Age | Commit message (Collapse) | Author |
|
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (100 commits)
ide: move hwif_register() call out of ide_probe_port()
ide: factor out code for tuning devices from ide_probe_port()
ide: move handling of I/O resources out of ide_probe_port()
ide: make probe_hwif() return an error value
ide: use ide_remove_port_from_hwgroup in init_irq()
ide: prepare init_irq() for using ide_remove_port_from_hwgroup()
ide: factor out code removing port from hwgroup from ide_unregister()
ide: I/O resources are released too early in ide_unregister()
ide: cleanup ide_system_bus_speed()
ide: remove needless zeroing of hwgroup fields from init_irq()
ide: remove unused ide_hwgroup_t fields
ide_platform: remove struct hwif_prop
ide: remove hwif->present manipulations from hwif_init()
ide: move wait_hwif_ready() documentation in the right place
ide: fix handling of busy I/O resources in probe_hwif()
<linux/hdsmart.h> is not used by kernel code
ide: don't include <linux/hdsmart.h>
ide-floppy: cleanup header
ide: update/add my Copyrights
ide: delete filenames/versions from comments
...
|
|
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
IDE doesn't need it.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
* Make ide_build_sglist() and ide_destroy_dmatable() available also when
CONFIG_BLK_DEV_IDEDMA_PCI=n.
* Use ide_build_sglist() and ide_destroy_dmatable() in {ics,au1xxx-}ide.c
and remove no longer needed {ics,au}ide_build_sglist().
There should be no functionality changes caused by this patch.
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
* Setup hwif->dev in au_ide_probe().
* Use hwif->dev instead of ahwif->dev in auide_build_sglist(),
auide_build_dmatable(), auide_dma_end() and auide_ddma_init().
* Remove no longer needed 'dev' field from _auide_hwif type.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
Keep pointer to struct device instead of struct pci_dev in ide_hwif_t.
While on it:
* Use *dev->dma_mask instead of pci_dev->dma_mask in ide_toggle_bounce().
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
* Add IDE_HFLAG_NO_DSC host flag for hosts that doesn't support DSC overlap.
* Set it in aec62xx (for ATP850UF only) and hpt34x host drivers.
* Convert ide-tape device driver to check for IDE_HFLAG_NO_DSC flag.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
* Rename 'simplex_stat' variable to 'dma_stat' in ide_get_or_set_dma_base().
* Factor out code for forcing host out of "simplex" mode from
ide_get_or_set_dma_base() to ide_pci_clear_simplex() helper.
* Add IDE_HFLAG_CLEAR_SIMPLEX host flag and set it in alim15x3 (for M5229),
amd74xx (for AMD 7409), cmd64x (for CMD643), generic (for Netcell) and
serverworks (for CSB5) host drivers.
* Make ide_get_or_set_dma_base() test for IDE_HFLAG_CLEAR_SIMPLEX host flag
instead of checking dev->device (BTW the code was buggy because it didn't
check for dev->vendor, luckily none of these PCI Device IDs was used by
some other vendor for PCI IDE controller).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
According to http://marc.info/?l=linux-ide&m=114346138611631, the drivers must
always register 8 DMA ports with ide_setup_dma(), so its last argument is not
needed. While at it, kill some useless parens in that function...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
* Add ide_dump_identify() debug helper for dumping raw identify data in
the hdparm friendly format (== the identify data can be extracted from
dmesg output and passed to hdparm --Istdin).
* Dump identify data in ide-probe.c::do_identify() if DEBUG is enabled.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
* Move lba_to_msf() and msf_to_lba() to <linux/cdrom.h>
(use 'u8' type instead of 'byte' while at it).
* Remove msf_to_lba() copy from drivers/cdrom/cdrom.c.
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
After commit 7267c3377443322588cddaf457cf106839a60463
wait_drive_not_busy() can become static again.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
- ide_scan_pcibus() can become static
- instead of ide_scan_pci() we can use ide_scan_pcibus() directly
in module_init()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b46' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[AUDIT] Add uid, gid fields to ANOM_PROMISCUOUS message
[AUDIT] ratelimit printk messages audit
[patch 2/2] audit: complement va_copy with va_end()
[patch 1/2] kernel/audit.c: warning fix
[AUDIT] create context if auditing was ever enabled
[AUDIT] clean up audit_receive_msg()
[AUDIT] make audit=0 really stop audit messages
[AUDIT] break large execve argument logging into smaller messages
[AUDIT] include audit type in audit message when using printk
[AUDIT] do not panic on exclude messages in audit_log_pid_context()
[AUDIT] Add End of Event record
[AUDIT] add session id to audit messages
[AUDIT] collect uid, loginuid, and comm in OBJ_PID records
[AUDIT] return EINTR not ERESTART*
[PATCH] get rid of loginuid races
[PATCH] switch audit_get_loginuid() to task_struct *
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
x86: avoid section mismatch involving arch_register_cpu
x86: fixes for lookup_address args
x86: fix sparse warnings in cpu/common.c
x86: make early_console static in early_printk.c
x86: remove unneeded round_up
x86: fix section mismatch warning in kernel/pci-calgary
x86: fix section mismatch warning in acpi/boot.c
x86: fix section mismatch warnings when referencing notifiers
x86: silence section mismatch warning in smpboot_64.c
x86: fix comments in vmlinux_64.lds
x86_64: make bootmap_start page align v6
x86_64: add debug name for early_res
|
|
execve arguments can be quite large. There is no limit on the number of
arguments and a 4G limit on the size of an argument.
this patch prints those aruguments in bite sized pieces. a userspace size
limitation of 8k was discovered so this keeps messages around 7.5k
single arguments larger than 7.5k in length are split into multiple records
and can be identified as aX[Y]=
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
This patch adds an end of event record type. It will be sent by the kernel as
the last record when a multi-record event is triggered. This will aid realtime
analysis programs since they will now reliably know they have the last record
to complete an event. The audit daemon filters this and will not write it to
disk.
Signed-off-by: Steve Grubb <sgrubb redhat com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
In order to correlate audit records to an individual login add a session
id. This is incremented every time a user logs in and is included in
almost all messages which currently output the auid. The field is
labeled ses= or oses=
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Keeping loginuid in audit_context is racy and results in messier
code. Taken to task_struct, out of the way of ->audit_context
changes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
all callers pass something->audit_context
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Avoid section mismatch involving arch_register_cpu.
Marking arch_register_cpu as __init and removing the export
for non-hotplug-cpu configurations makes the following warning
go away:
Section mismatch in reference from the function
arch_register_cpu() to the function .devinit.text:register_cpu()
The function arch_register_cpu() references
the function __devinit register_cpu().
This is often because arch_register_cpu lacks a __devinit
annotation or the annotation of register_cpu is wrong.
The only external user of arch_register_cpu in the tree is
in drivers/acpi/processor_core.c where it is guarded by
ACPI_HOTPLUG_CPU (which depends on HOTPLUG_CPU).
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
boot oopses when a system has 64 or 128 GB of RAM installed:
Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>] [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60 EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS: 0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack: ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
[<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
[<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
[<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
[<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
[<ffffffff8020ccf8>] ? child_rip+0xa/0x12
[<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
[<ffffffff8020ccee>] ? child_rip+0x0/0x12
Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!
it turns out some variables near end of bss are corrupted already.
in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end
and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
NODE_DATA [000000000008a485 - 0000000000091484]
bootmap [0000000000d406f4 - 0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
NODE_DATA [0000000828000000 - 0000000828006fff]
bootmap [0000000828007000 - 0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
NODE_DATA [0000001028000000 - 0000001028006fff]
bootmap [0000001028007000 - 0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
NODE_DATA [0000001828000000 - 0000001828006fff]
bootmap [0000001828007000 - 0000001828106fff] pages 100
setup_node_bootmem() makes NODE_DATA cacheline aligned,
and bootmap is page-aligned.
the patch updates find_e820_area() to make sure we can meet
the alignment constraints.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
helps debugging problems in this rather murky area of code.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
To allow the implementation of optimized rw-locks in user space, glibc
needs a possibility to select waiters for wakeup depending on a bitset
mask.
This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS
These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
additional argument - a bitset. Further the FUTEX_WAIT_BITS OP is
expecting an absolute timeout value instead of the relative one, which
is used for the FUTEX_WAIT OP.
FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
stored in the futex_q structure, which is used to enqueue the waiter
into the hashed futex waitqueue.
FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
function logically ANDs the bitset with the bitset stored in each
waiters futex_q structure. If the result is zero (i.e. none of the set
bits in the bitsets is matching), then the waiter is not woken up. If
the result is not zero (i.e. one of the set bits in the bitsets is
matching), then the waiter is woken.
The bitset provided by the caller must be non zero. In case the
provided bitset is zero the kernel returns EINVAL.
Internaly the new OPs are only extensions to the existing FUTEX_WAIT
and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
set into the futex_wait() and futex_wake() functions.
Signed-off-by: Thomas Gleixner <tgxl@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The exception fixup for the futex macros __futex_atomic_op1/2 and
futex_atomic_cmpxchg_inatomic() is missing an entry when the lock
prefix is replaced by a NOP via SMP alternatives.
Chuck Ebert tracked this down from the information provided in:
https://bugzilla.redhat.com/show_bug.cgi?id=429412
A possible solution would be to add another fixup after the
LOCK_PREFIX, so both the LOCK and NOP case have their own entry in the
exception table, but it's not really worth the trouble.
Simply replace LOCK_PREFIX with lock and keep those untouched by SMP
alternatives.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
To allow better diagnosis of tick-sched related, especially NOHZ
related problems, we need to know when the last wakeup via an irq
happened and when the CPU left the idle state.
Add two fields (idle_waketime, idle_exittime) to the tick_sched
structure and add them to the timer_list output.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
xtime_cache needs to be updated whenever xtime and or wall_to_monotic
are changed. Otherwise users of xtime_cache might see a stale (and in
the case of timezone changes utterly wrong) value until the next
update happens.
Fixup the obvious places, which miss this update.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: kill swap_io_context()
as-iosched: fix inconsistent ioc->lock context
ide-cd: fix leftover data BUG
block: make elevator lib checkpatch compliant
cfq-iosched: make checkpatch compliant
block: make core bits checkpatch compliant
block: new end request handling interface should take unsigned byte counts
unexport add_disk_randomness
block/sunvdc.c:print_version() must be __devinit
splice: always updated atime in direct splice
|
|
It blindly copies everything in the io_context, including the lock.
That doesn't work so well for either lock ordering or lockdep.
There seems zero point in swapping io contexts on a request to request
merge, so the best point of action is to just remove it.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (173 commits)
[NETNS]: Lookup in FIB semantic hashes taking into account the namespace.
[NETNS]: Add a namespace mark to fib_info.
[IPV4]: fib_sync_down rework.
[NETNS]: Process interface address manipulation routines in the namespace.
[IPV4]: Small style cleanup of the error path in rtm_to_ifaddr.
[IPV4]: Fix memory leak on error path during FIB initialization.
[NETFILTER]: Ipv6-related xt_hashlimit compilation fix.
[NET_SCHED]: Add flow classifier
[NET_SCHED]: sch_sfq: make internal queues visible as classes
[NET_SCHED]: sch_sfq: add support for external classifiers
[NET_SCHED]: Constify struct tcf_ext_map
[BLUETOOTH]: Fix bugs in previous conn add/del workqueue changes.
[TCP]: Unexport sysctl_tcp_tso_win_divisor
[IPV4]: Make struct ipv4_devconf static.
[TR] net/802/tr.c: sysctl_tr_rif_timeout static
[XFRM]: Fix statistics.
[XFRM]: Remove unused exports.
[PKT_SCHED] sch_teql.c: Duplicate IFF_BROADCAST in FMASK, remove 2nd.
[BNX2]: Fix ASYM PAUSE advertisement for remote PHY.
[IPV4] route cache: Introduce rt_genid for smooth cache invalidation
...
|
|
Remove unused CONFIG_DISKtel define.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix problems with the 528x ColdFire CPU cache setup.
Do not cache the flash region (if present), and make the runtime
settings consistent with the init setting.
Problems pointed out by Bernd Buttner <b.buettner@mkc-gmbh.de>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
No point in passing signed integers as the byte count, they can never
be negative.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
The namespace is not available in the fib_sync_down_addr, add it as a
parameter.
Looking up a device by the pointer to it is OK. Looking up using a
result from fib_trie/fib_hash table lookup is also safe. No need to
fix that at all. So, just fix lookup by address and insertion to the
hash table path.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is required to make fib_info lookups namespace aware. In the
other case initial namespace devices are marked as dead in the local
routing table during other namespace stop.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
fib_sync_down can be called with an address and with a device. In
reality it is called either with address OR with a device. The
codepath inside is completely different, so lets separate it into two
calls for these two cases.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add new "flow" classifier, which is meant to extend the SFQ hashing
capabilities without hard-coding new hash functions and also allows
deterministic mappings of keys to classes, replacing some out of tree
iptables patches like IPCLASSIFY (maps IPs to classes), IPMARK (maps
IPs to marks, with fw filters to classes), ...
Some examples:
- Classic SFQ hash:
tc filter add ... flow hash \
keys src,dst,proto,proto-src,proto-dst divisor 1024
- Classic SFQ hash, but using information from conntrack to work properly in
combination with NAT:
tc filter add ... flow hash \
keys nfct-src,nfct-dst,proto,nfct-proto-src,nfct-proto-dst divisor 1024
- Map destination IPs of 192.168.0.0/24 to classids 1-257:
tc filter add ... flow map \
key dst addend -192.168.0.0 divisor 256
- alternatively:
tc filter add ... flow map \
key dst and 0xff
- similar, but reverse ordered:
tc filter add ... flow map \
key dst and 0xff xor 0xff
Perturbation is currently not supported because we can't reliable kill the
timer on destruction.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for dumping statistics and make internal queues visible as
classes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
struct ipv4_devconf can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
o Outbound sequence number overflow error status
is counted as XfrmOutStateSeqError.
o Additionaly, it changes inbound sequence number replay
error name from XfrmInSeqOutOfWindow to XfrmInStateSeqError
to apply name scheme above.
o Inbound IPv4 UDP encapsuling type mismatch error is wrongly
mapped to XfrmInStateInvalid then this patch fiex the error
to XfrmInStateMismatch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current ip route cache implementation is not suited to large caches.
We can consume a lot of CPU when cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is
sometimes asynchronous. min_delay & max_delay can somewhat control this
asynchronism behavior, but whole thing is a kludge, regularly triggering
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of ram, filling dst_garbage.list.
A better scheme is to use a generation identifier on each entry,
so that cache invalidation can be performed by changing the table
identifier, without having to scan all entries.
No more delayed flushing, no more stalling when secret_interval expires.
Invalidated entries will then be freed at GC time (controled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found
in a chain when an insert is done.
Thus we keep a normal equilibrium.
This patch :
- renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places :
|
|
Reuse the existing logic for multicast list synchronization for the
unicast address list. The core of dev_mc_sync/unsync are split out as
__dev_addr_sync/unsync and moved from dev_mcast.c to dev.c. These are
then used to implement dev_unicast_sync/unsync as well.
I'm working on cleaning up Intel's FCoE stack, which generates new MAC
addresses from the fibre channel device id assigned by the fabric as
per the current draft specification in T11. When using such a
protocol in a VLAN environment it would be nice to not always be
forced into promiscuous mode, assuming the underlying Ethernet driver
supports multiple unicast addresses as well.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Normally during a dump the key of the last dumped entry is used for
continuation, but since lock is dropped it might be lost. In that case
fallback to the old counter based N^2 behaviour. This means the dump
will end up skipping some routes which matches what FIB_HASH does.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a net argument to inet6_lookup and propagate it further.
Actually, this is tcp-v6 implementation of what was done for
tcp-v4 sockets in a previous patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a net argument to inet_lookup and propagate it further
into lookup calls. Plus tune the __inet_check_established.
The dccp and inet_diag, which use that lookup functions
pass the init_net into them.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This tags the inet_bind_bucket struct with net pointer,
initializes it during creation and makes a filtering
during lookup.
A better hashfn, that takes the net into account is to
be done in the future, but currently all bind buckets
with similar port will be in one hash chain.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These two functions are the same except for what they call
to "check_established" and "hash" for a socket.
This saves half-a-kilo for ipv4 and ipv6.
add/remove: 1/0 grow/shrink: 1/4 up/down: 582/-1128 (-546)
function old new delta
__inet_hash_connect - 577 +577
arp_ignore 108 113 +5
static.hint 8 4 -4
rt_worker_func 376 372 -4
inet6_hash_connect 584 25 -559
inet_hash_connect 586 25 -561
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|