summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2011-03-27NFS: Ensure that rpc_release_resources_task() can be called twice.OGAWA Hirofumi
BUG: atomic_dec_and_test(): -1: atomic counter underflow at: Pid: 2827, comm: mount.nfs Not tainted 2.6.38 #1 Call Trace: [<ffffffffa02223a0>] ? put_rpccred+0x44/0x14e [sunrpc] [<ffffffffa021bbe9>] ? rpc_ping+0x4e/0x58 [sunrpc] [<ffffffffa021c4a5>] ? rpc_create+0x481/0x4fc [sunrpc] [<ffffffffa022298a>] ? rpcauth_lookup_credcache+0xab/0x22d [sunrpc] [<ffffffffa028be8c>] ? nfs_create_rpc_client+0xa6/0xeb [nfs] [<ffffffffa028c660>] ? nfs4_set_client+0xc2/0x1f9 [nfs] [<ffffffffa028cd3c>] ? nfs4_create_server+0xf2/0x2a6 [nfs] [<ffffffffa0295d07>] ? nfs4_remote_mount+0x4e/0x14a [nfs] [<ffffffff810dd570>] ? vfs_kern_mount+0x6e/0x133 [<ffffffffa029605a>] ? nfs_do_root_mount+0x76/0x95 [nfs] [<ffffffffa029643d>] ? nfs4_try_mount+0x56/0xaf [nfs] [<ffffffffa0297434>] ? nfs_get_sb+0x435/0x73c [nfs] [<ffffffff810dd59b>] ? vfs_kern_mount+0x99/0x133 [<ffffffff810dd693>] ? do_kern_mount+0x48/0xd8 [<ffffffff810f5b75>] ? do_mount+0x6da/0x741 [<ffffffff810f5c5f>] ? sys_mount+0x83/0xc0 [<ffffffff8100293b>] ? system_call_fastpath+0x16/0x1b Well, so, I think this is real bug of nfs codes somewhere. With some review, the code rpc_call_sync() rpc_run_task rpc_execute() __rpc_execute() rpc_release_task() rpc_release_resources_task() put_rpccred() <= release cred rpc_put_task rpc_do_put_task() rpc_release_resources_task() put_rpccred() <= release cred again seems to be release cred unintendedly. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits) route: Take the right src and dst addresses in ip_route_newports ipv4: Fix nexthop caching wrt. scoping. ipv4: Invalidate nexthop cache nh_saddr more correctly. net: fix pch_gbe section mismatch warning ipv4: fix fib metrics mlx4_en: Removing HW info from ethtool -i report. net_sched: fix THROTTLED/RUNNING race drivers/net/a2065.c: Convert release_resource to release_region/release_mem_region drivers/net/ariadne.c: Convert release_resource to release_region/release_mem_region bonding: fix rx_handler locking myri10ge: fix rmmod crash mlx4_en: updated driver version to 1.5.4.1 mlx4_en: Using blue flame support mlx4_core: reserve UARs for userspace consumers mlx4_core: maintain available field in bitmap allocator mlx4: Add blue flame support for kernel consumers mlx4_en: Enabling new steering mlx4: Add support for promiscuous mode in the new steering model. mlx4: generalization of multicast steering. mlx4_en: Reporting HW revision in ethtool -i ...
2011-03-25Merge branch 'nfs-for-2.6.39' of ↵Linus Torvalds
git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (28 commits) Cleanup XDR parsing for LAYOUTGET, GETDEVICEINFO NFSv4.1 convert layoutcommit sync to boolean NFSv4.1 pnfs_layoutcommit_inode fixes NFS: Determine initial mount security NFS: use secinfo when crossing mountpoints NFS: Add secinfo procedure NFS: lookup supports alternate client NFS: convert call_sync() to a function NFSv4.1 remove temp code that prevented ds commits NFSv4.1: layoutcommit NFSv4.1: filelayout driver specific code for COMMIT NFSv4.1: remove GETATTR from ds commits NFSv4.1: add generic layer hooks for pnfs COMMIT NFSv4.1: alloc and free commit_buckets NFSv4.1: shift filelayout_free_lseg NFSv4.1: pull out code from nfs_commit_release NFSv4.1: pull error handling out of nfs_commit_list NFSv4.1: add callback to nfs4_commit_done NFSv4.1: rearrange nfs_commit_rpcsetup NFSv4.1: don't send COMMIT to ds for data sync writes ...
2011-03-24ipv4: Fix nexthop caching wrt. scoping.David S. Miller
Move the scope value out of the fib alias entries and into fib_info, so that we always use the correct scope when recomputing the nexthop cached source address. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24ipv4: Invalidate nexthop cache nh_saddr more correctly.David S. Miller
Any operation that: 1) Brings up an interface 2) Adds an IP address to an interface 3) Deletes an IP address from an interface can potentially invalidate the nh_saddr value, requiring it to be recomputed. Perform the recomputation lazily using a generation ID. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24Merge branch 'nfs-for-2.6.39' into nfs-for-nextTrond Myklebust
2011-03-24ipv4: fix fib metricsEric Dumazet
Alessandro Suardi reported that we could not change route metrics : ip ro change default .... advmss 1400 This regression came with commit 9c150e82ac50 (Allocate fib metrics dynamically). fib_metrics is no longer an array, but a pointer to an array. Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Alessandro Suardi <alessandro.suardi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24NFS: Determine initial mount securityBryan Schumaker
When sec=<something> is not presented as a mount option, we should attempt to determine what security flavor the server is using. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24NFS: use secinfo when crossing mountpointsBryan Schumaker
A submount may use different security than the parent mount does. We should figure out what sec flavor the submount uses at mount time. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.39' of git://linux-nfs.org/~bfields/linux: SUNRPC: Remove resource leak in svc_rdma_send_error() nfsd: wrong index used in inner loop nfsd4: fix comment and remove unused nfsd4_file fields nfs41: make sure nfs server return right ca_maxresponsesize_cached nfsd: fix compile error svcrpc: fix bad argument in unix_domain_find nfsd4: fix struct file leak nfsd4: minor nfs4state.c reshuffling svcrpc: fix rare race on unix_domain creation nfsd41: modify the members value of nfsd4_op_flags nfsd: add proc file listing kernel's gss_krb5 enctypes gss:krb5 only include enctype numbers in gm_upcall_enctypes NFSD, VFS: Remove dead code in nfsd_rename() nfsd: kill unused macro definition locks: use assign_type()
2011-03-23rds: use little-endian bitopsAkinobu Mita
As a preparation for removing ext2 non-atomic bit operations from asm/bitops.h. This converts ext2 non-atomic bit operations to little-endian bit operations. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Andy Grover <andy.grover@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23rds: stop including asm-generic/bitops/le.h directlyAkinobu Mita
asm-generic/bitops/le.h is only intended to be included directly from asm-generic/bitops/ext2-non-atomic.h or asm-generic/bitops/minix-le.h which implements generic ext2 or minix bit operations. This stops including asm-generic/bitops/le.h directly and use ext2 non-atomic bit operations instead. It seems odd to use ext2_*_bit() on rds, but it will replaced with __{set,clear,test}_bit_le() after introducing little endian bit operations for all architectures. This indirect step is necessary to maintain bisectability for some architectures which have their own little-endian bit operations. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Andy Grover <andy.grover@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23ipv4: fix ip_rt_update_pmtu()Eric Dumazet
commit 2c8cec5c10bc (Cache learned PMTU information in inetpeer) added an extra inet_putpeer() call in ip_rt_update_pmtu(). This results in various problems, since we can free one inetpeer, while it is still in use. Ref: http://www.spinics.net/lists/netdev/msg159121.html Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23ipv4: Fallback to FIB local table in __ip_dev_find().David S. Miller
In commit 9435eb1cf0b76b323019cebf8d16762a50a12a19 ("ipv4: Implement __ip_dev_find using new interface address hash.") we reimplemented __ip_dev_find() so that it doesn't have to do a full FIB table lookup. Instead, it consults a hash table of addresses configured to interfaces. This works identically to the old code in all except one case, and that is for loopback subnets. The old code would match the loopback device for any IP address that falls within a subnet configured to the loopback device. Handle this corner case by doing the FIB lookup. We could implement this via inet_addr_onlink() but: 1) Someone could configure many addresses to loopback and inet_addr_onlink() is a simple list traversal. 2) We know the old code works. Reported-by: Julian Anastasov <ja@ssi.bg> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22tcp: Make undo_ssthresh arg to tcp_undo_cwr() a bool.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22tcp: avoid cwnd moderation in undoYuchung Cheng
In the current undo logic, cwnd is moderated after it was restored to the value prior entering fast-recovery. It was moderated first in tcp_try_undo_recovery then again in tcp_complete_cwr. Since the undo indicates recovery was false, these moderations are not necessary. If the undo is triggered when most of the outstanding data have been acknowledged, the (restored) cwnd is falsely pulled down to a small value. This patch removes these cwnd moderations if cwnd is undone a) during fast-recovery b) by receiving DSACKs past fast-recovery Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22bridge: Fix possibly wrong MLD queries' ethernet source addressLinus Lüssing
The ipv6_dev_get_saddr() is currently called with an uninitialized destination address. Although in tests it usually seemed to nevertheless always fetch the right source address, there seems to be a possible race condition. Therefore this commit changes this, first setting the destination address and only after that fetching the source address. Reported-by: Jan Beulich <JBeulich@novell.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22ipv6: ip6_route_output does not modify sk parameter, so make it constFlorian Westphal
This avoids explicit cast to avoid 'discards qualifiers' compiler warning in a netfilter patch that i've been working on. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22kthread: use kthread_create_on_node()Eric Dumazet
ksoftirqd, kworker, migration, and pktgend kthreads can be created with kthread_create_on_node(), to get proper NUMA affinities for their stack and task_struct. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Tejun Heo <tj@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: [net/9p]: Introduce basic flow-control for VirtIO transport. 9p: use the updated offset given by generic_write_checks [net/9p] Don't re-pin pages on retrying virtqueue_add_buf(). [net/9p] Set the condition just before waking up. [net/9p] unconditional wake_up to proc waiting for space on VirtIO ring fs/9p: Add v9fs_dentry2v9ses fs/9p: Attach writeback_fid on first open with WR flag fs/9p: Open writeback fid in O_SYNC mode fs/9p: Use truncate_setsize instead of vmtruncate net/9p: Fix compile warning net/9p: Convert the in the 9p rpc call path to GFP_NOFS fs/9p: Fix race in initializing writeback fid
2011-03-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: use watch/notify for changes in rbd header libceph: add lingering request and watch/notify event framework rbd: update email address in Documentation ceph: rename dentry_release -> d_release, fix comment ceph: add request to the tail of unsafe write list ceph: remove request from unsafe list if it is canceled/timed out ceph: move readahead default to fs/ceph from libceph ceph: add ino32 mount option ceph: update common header files ceph: remove debugfs debug cruft libceph: fix osd request queuing on osdmap updates ceph: preserve I_COMPLETE across rename libceph: Fix base64-decoding when input ends in newline.
2011-03-22SUNRPC: Never reuse the socket port after an xs_close()Trond Myklebust
If we call xs_close(), we're in one of two situations: - Autoclose, which means we don't expect to resend a request - bind+connect failed, which probably means the port is in use Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2011-03-22Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2011-03-22[net/9p]: Introduce basic flow-control for VirtIO transport.Venkateswararao Jujjuri (JV)
Recent zerocopy work in the 9P VirtIO transport maps and pins user buffers into kernel memory for the server to work on them. Since the user process can initiate this kind of pinning with a simple read/write call, thousands of IO threads initiated by the user process can hog the system resources and could result into denial of service. This patch introduces flow control to avoid that extreme scenario. The ceiling limit to avoid denial of service attacks is set to relatively high (nr_free_pagecache_pages()/4) so that it won't interfere with regular usage, but can step in extreme cases to limit the total system hang. Since we don't have a global structure to accommodate this variable, I choose the virtio_chan as the home for this. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22[net/9p] Don't re-pin pages on retrying virtqueue_add_buf().Venkateswararao Jujjuri (JV)
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22[net/9p] Set the condition just before waking up.Venkateswararao Jujjuri (JV)
Given that the sprious wake-ups are common, we need to move the condition setting right next to the wake_up(). After setting the condition to req->status = REQ_STATUS_RCVD, sprious wakeups may cause the virtqueue back on the free list for someone else to use. This may result in kernel panic while relasing the pinned pages in p9_release_req_pages(). Also rearranged the while loop in req_done() for better redability. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22[net/9p] unconditional wake_up to proc waiting for space on VirtIO ringVenkateswararao Jujjuri (JV)
Process may wait to get space on VirtIO ring to send a transaction to VirtFS server. Current code just does a conditional wake_up() which means only one process will be woken up even if multiple processes are waiting. This fix makes the wake_up unconditional. Hence we won't have any processes waiting for-ever. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22net/9p: Fix compile warningAneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22net/9p: Convert the in the 9p rpc call path to GFP_NOFSAneesh Kumar K.V
Without this we can cause reclaim allocation in writepage. [ 3433.448430] ================================= [ 3433.449117] [ INFO: inconsistent lock state ] [ 3433.449117] 2.6.38-rc5+ #84 [ 3433.449117] --------------------------------- [ 3433.449117] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage. [ 3433.449117] kswapd0/505 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 3433.449117] (iprune_sem){+++++-}, at: [<ffffffff810ebbab>] shrink_icache_memory+0x45/0x2b1 [ 3433.449117] {RECLAIM_FS-ON-W} state was registered at: [ 3433.449117] [<ffffffff8107fe5f>] mark_held_locks+0x52/0x70 [ 3433.449117] [<ffffffff8107ff02>] lockdep_trace_alloc+0x85/0x9f [ 3433.449117] [<ffffffff810d353d>] slab_pre_alloc_hook+0x18/0x3c [ 3433.449117] [<ffffffff810d3fd5>] kmem_cache_alloc+0x23/0xa2 [ 3433.449117] [<ffffffff8127be77>] idr_pre_get+0x2d/0x6f [ 3433.449117] [<ffffffff815434eb>] p9_idpool_get+0x30/0xae [ 3433.449117] [<ffffffff81540123>] p9_client_rpc+0xd7/0x9b0 [ 3433.449117] [<ffffffff815427b0>] p9_client_clunk+0x88/0xdb [ 3433.449117] [<ffffffff811d56e5>] v9fs_evict_inode+0x3c/0x48 [ 3433.449117] [<ffffffff810eb511>] evict+0x1f/0x87 [ 3433.449117] [<ffffffff810eb5c0>] dispose_list+0x47/0xe3 [ 3433.449117] [<ffffffff810eb8da>] evict_inodes+0x138/0x14f [ 3433.449117] [<ffffffff810d90e2>] generic_shutdown_super+0x57/0xe8 [ 3433.449117] [<ffffffff810d91e8>] kill_anon_super+0x11/0x50 [ 3433.449117] [<ffffffff811d4951>] v9fs_kill_super+0x49/0xab [ 3433.449117] [<ffffffff810d926e>] deactivate_locked_super+0x21/0x46 [ 3433.449117] [<ffffffff810d9e84>] deactivate_super+0x40/0x44 [ 3433.449117] [<ffffffff810ef848>] mntput_no_expire+0x100/0x109 [ 3433.449117] [<ffffffff810f0aeb>] sys_umount+0x2f1/0x31c [ 3433.449117] [<ffffffff8102c87b>] system_call_fastpath+0x16/0x1b [ 3433.449117] irq event stamp: 192941 [ 3433.449117] hardirqs last enabled at (192941): [<ffffffff81568dcf>] _raw_spin_unlock_irq+0x2b/0x30 [ 3433.449117] hardirqs last disabled at (192940): [<ffffffff810b5f97>] shrink_inactive_list+0x290/0x2f5 [ 3433.449117] softirqs last enabled at (188470): [<ffffffff8105fd65>] __do_softirq+0x133/0x152 [ 3433.449117] softirqs last disabled at (188455): [<ffffffff8102d7cc>] call_softirq+0x1c/0x28 [ 3433.449117] [ 3433.449117] other info that might help us debug this: [ 3433.449117] 1 lock held by kswapd0/505: [ 3433.449117] #0: (shrinker_rwsem){++++..}, at: [<ffffffff810b52e2>] shrink_slab+0x38/0x15f [ 3433.449117] [ 3433.449117] stack backtrace: [ 3433.449117] Pid: 505, comm: kswapd0 Not tainted 2.6.38-rc5+ #84 [ 3433.449117] Call Trace: [ 3433.449117] [<ffffffff8107fbce>] ? valid_state+0x17e/0x191 [ 3433.449117] [<ffffffff81036896>] ? save_stack_trace+0x28/0x45 [ 3433.449117] [<ffffffff81080426>] ? check_usage_forwards+0x0/0x87 [ 3433.449117] [<ffffffff8107fcf4>] ? mark_lock+0x113/0x22c [ 3433.449117] [<ffffffff8108105f>] ? __lock_acquire+0x37a/0xcf7 [ 3433.449117] [<ffffffff8107fc0e>] ? mark_lock+0x2d/0x22c [ 3433.449117] [<ffffffff81081077>] ? __lock_acquire+0x392/0xcf7 [ 3433.449117] [<ffffffff810b14d2>] ? determine_dirtyable_memory+0x15/0x28 [ 3433.449117] [<ffffffff81081a33>] ? lock_acquire+0x57/0x6d [ 3433.449117] [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1 [ 3433.449117] [<ffffffff81567d85>] ? down_read+0x47/0x5c [ 3433.449117] [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1 [ 3433.449117] [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1 [ 3433.449117] [<ffffffff810b5385>] ? shrink_slab+0xdb/0x15f [ 3433.449117] [<ffffffff810b69bc>] ? kswapd+0x574/0x96a [ 3433.449117] [<ffffffff810b6448>] ? kswapd+0x0/0x96a [ 3433.449117] [<ffffffff810714e2>] ? kthread+0x7d/0x85 [ 3433.449117] [<ffffffff8102d6d4>] ? kernel_thread_helper+0x4/0x10 [ 3433.449117] [<ffffffff81569200>] ? restore_args+0x0/0x30 [ 3433.449117] [<ffffffff81071465>] ? kthread+0x0/0x85 [ 3433.449117] [<ffffffff8102d6d0>] ? kernel_thread_helper+0x0/0x10 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22libceph: add lingering request and watch/notify event frameworkYehuda Sadeh
Lingering requests are requests that are sent to the OSD normally but tracked also after we get a successful request. This keeps the OSD connection open and resends the original request if the object moves to another OSD. The OSD can then send notification messages back to us if another client initiates a notify. This framework will be used by RBD so that the client gets notification when a snapshot is created by another node or tool. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-22ipv4: optimize route adding on secondary promotionJulian Anastasov
Optimize the calling of fib_add_ifaddr for all secondary addresses after the promoted one to start from their place, not from the new place of the promoted secondary. It will save some CPU cycles because we are sure the promoted secondary was first for the subnet and all next secondaries do not change their place. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22ipv4: remove the routes on secondary promotionJulian Anastasov
The secondary address promotion relies on fib_sync_down_addr to remove all routes created for the secondary addresses when the old primary address is deleted. It does not happen for cases when the primary address is also in another subnet. Fix that by deleting local and broadcast routes for all secondaries while they are on device list and by faking that all addresses from this subnet are to be deleted. It relies on fib_del_ifaddr being able to ignore the IPs from the concerned subnet while checking for duplication. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22ipv4: fix route deletion for IPs on many subnetsJulian Anastasov
Alex Sidorenko reported for problems with local routes left after IP addresses are deleted. It happens when same IPs are used in more than one subnet for the device. Fix fib_del_ifaddr to restrict the checks for duplicate local and broadcast addresses only to the IFAs that use our primary IFA or another primary IFA with same address. And we expect the prefsrc to be matched when the routes are deleted because it is possible they to differ only by prefsrc. This patch prevents local and broadcast routes to be leaked until their primary IP is deleted finally from the box. As the secondary address promotion needs to delete the routes for all secondaries that used the old primary IFA, add option to ignore these secondaries from the checks and to assume they are already deleted, so that we can safely delete the route while these IFAs are still on the device list. Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22ipv4: match prefsrc when deleting routesJulian Anastasov
fib_table_delete forgets to match the routes by prefsrc. Callers can specify known IP in fc_prefsrc and we should remove the exact route. This is needed for cases when same local or broadcast addresses are used in different subnets and the routes differ only in prefsrc. All callers that do not provide fc_prefsrc will ignore the route prefsrc as before and will delete the first occurence. That is how the ip route del default magic works. Current callers are: - ip_rt_ioctl where rtentry_to_fib_config provides fc_prefsrc only when the provided device name matches IP label with colon. - inet_rtm_delroute where RTA_PREFSRC is optional too - fib_magic which deals with routes when deleting addresses and where the fc_prefsrc is always set with the primary IP for the concerned IFA. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22net: implement dev_disable_lro() hw_features compatibilityMichał Mirosław
Implement compatibility with new hw_features for dev_disable_lro(). This is a transition path - dev_disable_lro() should be later integrated into netdev_fix_features() after all drivers are converted. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21IPVS: Use global mutex in ip_vs_app.cSimon Horman
As part of the work to make IPVS network namespace aware __ip_vs_app_mutex was replaced by a per-namespace lock, ipvs->app_mutex. ipvs->app_key is also supplied for debugging purposes. Unfortunately this implementation results in ipvs->app_key residing in non-static storage which at the very least causes a lockdep warning. This patch takes the rather heavy-handed approach of reinstating __ip_vs_app_mutex which will cover access to the ipvs->list_head of all network namespaces. [ 12.610000] IPVS: Creating netns size=2456 id=0 [ 12.630000] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP) [ 12.640000] BUG: key ffff880003bbf1a0 not in .data! [ 12.640000] ------------[ cut here ]------------ [ 12.640000] WARNING: at kernel/lockdep.c:2701 lockdep_init_map+0x37b/0x570() [ 12.640000] Hardware name: Bochs [ 12.640000] Pid: 1, comm: swapper Tainted: G W 2.6.38-kexec-06330-g69b7efe-dirty #122 [ 12.650000] Call Trace: [ 12.650000] [<ffffffff8102e685>] warn_slowpath_common+0x75/0xb0 [ 12.650000] [<ffffffff8102e6d5>] warn_slowpath_null+0x15/0x20 [ 12.650000] [<ffffffff8105967b>] lockdep_init_map+0x37b/0x570 [ 12.650000] [<ffffffff8105829d>] ? trace_hardirqs_on+0xd/0x10 [ 12.650000] [<ffffffff81055ad8>] debug_mutex_init+0x38/0x50 [ 12.650000] [<ffffffff8104bc4c>] __mutex_init+0x5c/0x70 [ 12.650000] [<ffffffff81685ee7>] __ip_vs_app_init+0x64/0x86 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff811b1c33>] T.620+0x43/0x170 [ 12.660000] [<ffffffff811b1e9a>] ? register_pernet_subsys+0x1a/0x40 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff811b1db7>] register_pernet_operations+0x57/0xb0 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.670000] [<ffffffff811b1ea9>] register_pernet_subsys+0x29/0x40 [ 12.670000] [<ffffffff81685f19>] ip_vs_app_init+0x10/0x12 [ 12.670000] [<ffffffff81685a87>] ip_vs_init+0x4c/0xff [ 12.670000] [<ffffffff8166562c>] do_one_initcall+0x7a/0x12e [ 12.670000] [<ffffffff8166583e>] kernel_init+0x13e/0x1c2 [ 12.670000] [<ffffffff8128c134>] kernel_thread_helper+0x4/0x10 [ 12.670000] [<ffffffff8128ad40>] ? restore_args+0x0/0x30 [ 12.680000] [<ffffffff81665700>] ? kernel_init+0x0/0x1c2 [ 12.680000] [<ffffffff8128c130>] ? kernel_thread_helper+0x0/0x1global0 Signed-off-by: Simon Horman <horms@verge.net.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21ipvs: fix a typo in __ip_vs_control_init()Eric Dumazet
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21net ipv6: Fix duplicate /proc/sys/net/ipv6/neigh directory entries.Eric W. Biederman
When I was fixing issues with unregisgtering tables under /proc/sys/net/ipv6/neigh by adding a mount point it appears I missed a critical ordering issue, in the ipv6 initialization. I had not realized that ipv6_sysctl_register is called at the very end of the ipv6 initialization and in particular after we call neigh_sysctl_register from ndisc_init. "neigh" needs to be initialized in ipv6_static_sysctl_register which is the first ipv6 table to initialized, and definitely before ndisc_init. This removes the weirdness of duplicate tables while still providing a "neigh" mount point which prevents races in sysctl unregistering. This was initially reported at https://bugzilla.kernel.org/show_bug.cgi?id=31232 Reported-by: sunkan@zappa.cx Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21net: fix incorrect spelling in drop monitor protocolNeil Horman
It was pointed out to me recently that my spelling could be better :) Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21net/appletalk: fix atalk_release use after freeArnd Bergmann
The BKL removal in appletalk introduced a use-after-free problem, where atalk_destroy_socket frees a sock, but we still release the socket lock on it. An easy fix is to take an extra reference on the sock and sock_put it when returning from atalk_release. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21ipx: fix ipx_release()Eric Dumazet
Commit b0d0d915d1d1a0 (remove the BKL) added a regression, because sock_put() can free memory while we are going to use it later. Fix is to delay sock_put() _after_ release_sock(). Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21l2tp: fix possible oops on l2tp_eth module unloadJames Chapman
A struct used in the l2tp_eth driver for registering network namespace ops was incorrectly marked as __net_initdata, leading to oops when module unloaded. BUG: unable to handle kernel paging request at ffffffffa00ec098 IP: [<ffffffff8123dbd8>] ops_exit_list+0x7/0x4b PGD 142d067 PUD 1431063 PMD 195da8067 PTE 0 Oops: 0000 [#1] SMP last sysfs file: /sys/module/l2tp_eth/refcnt Call Trace: [<ffffffff8123dc94>] ? unregister_pernet_operations+0x32/0x93 [<ffffffff8123dd20>] ? unregister_pernet_device+0x2b/0x38 [<ffffffff81068b6e>] ? sys_delete_module+0x1b8/0x222 [<ffffffff810c7300>] ? do_munmap+0x254/0x318 [<ffffffff812c64e5>] ? page_fault+0x25/0x30 [<ffffffff812c6952>] ? system_call_fastpath+0x16/0x1b Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21xfrm: Fix initialize repl field of struct xfrm_stateWei Yongjun
Commit 'xfrm: Move IPsec replay detection functions to a separate file' (9fdc4883d92d20842c5acea77a4a21bb1574b495) introduce repl field to struct xfrm_state, and only initialize it under SA's netlink create path, the other path, such as pf_key, ipcomp/ipcomp6 etc, the repl field remaining uninitialize. So if the SA is created by pf_key, any input packet with SA's encryption algorithm will cause panic. int xfrm_input() { ... x->repl->advance(x, seq); ... } This patch fixed it by introduce new function __xfrm_init_state(). Pid: 0, comm: swapper Not tainted 2.6.38-next+ #14 Bochs Bochs EIP: 0060:[<c078e5d5>] EFLAGS: 00010206 CPU: 0 EIP is at xfrm_input+0x31c/0x4cc EAX: dd839c00 EBX: 00000084 ECX: 00000000 EDX: 01000000 ESI: dd839c00 EDI: de3a0780 EBP: dec1de88 ESP: dec1de64 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process swapper (pid: 0, ti=dec1c000 task=c09c0f20 task.ti=c0992000) Stack: 00000000 00000000 00000002 c0ba27c0 00100000 01000000 de3a0798 c0ba27c0 00000033 dec1de98 c0786848 00000000 de3a0780 dec1dea4 c0786868 00000000 dec1debc c074ee56 e1da6b8c de3a0780 c074ed44 de3a07a8 dec1decc c074ef32 Call Trace: [<c0786848>] xfrm4_rcv_encap+0x22/0x27 [<c0786868>] xfrm4_rcv+0x1b/0x1d [<c074ee56>] ip_local_deliver_finish+0x112/0x1b1 [<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1 [<c074ef32>] NF_HOOK.clone.1+0x3d/0x44 [<c074ef77>] ip_local_deliver+0x3e/0x44 [<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1 [<c074ec03>] ip_rcv_finish+0x30a/0x332 [<c074e8f9>] ? ip_rcv_finish+0x0/0x332 [<c074ef32>] NF_HOOK.clone.1+0x3d/0x44 [<c074f188>] ip_rcv+0x20b/0x247 [<c074e8f9>] ? ip_rcv_finish+0x0/0x332 [<c072797d>] __netif_receive_skb+0x373/0x399 [<c0727bc1>] netif_receive_skb+0x4b/0x51 [<e0817e2a>] cp_rx_poll+0x210/0x2c4 [8139cp] [<c072818f>] net_rx_action+0x9a/0x17d [<c0445b5c>] __do_softirq+0xa1/0x149 [<c0445abb>] ? __do_softirq+0x0/0x149 Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21libceph: fix osd request queuing on osdmap updatesSage Weil
If we send a request to osd A, and the request's pg remaps to osd B and then back to A in quick succession, we need to resend the request to A. The old code was only calling kick_requests after processing all incremental maps in a message, so it was very possible to not resend a request that needed to be resent. This would make the osd eventually time out (at least with the current default of osd timeouts enabled). The correct approach is to scan requests on every map incremental. This patch refactors the kick code in a few ways: - all requests are either on req_lru (in flight), req_unsent (ready to send), or req_notarget (currently map to no up osd) - mapping always done by map_request (previous map_osds) - if the mapping changes, we requeue. requests are resent only after all map incrementals are processed. - some osd reset code is moved out of kick_requests into a separate function - the "kick this osd" functionality is moved to kick_osd_requests, as it is unrelated to scanning for request->pg->osd mapping changes Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21mac80211: initialize sta->last_rx in sta_info_allocFelix Fietkau
This field is used to determine the inactivity time. When in AP mode, hostapd uses it for kicking out inactive clients after a while. Without this patch, hostapd immediately deauthenticates a new client if it checks the inactivity time before the client sends its first data frame. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-20netfilter: ipt_CLUSTERIP: fix buffer overflowVasiliy Kulikov
'buffer' string is copied from userspace. It is not checked whether it is zero terminated. This may lead to overflow inside of simple_strtoul(). Changli Gao suggested to copy not more than user supplied 'size' bytes. It was introduced before the git epoch. Files "ipt_CLUSTERIP/*" are root writable only by default, however, on some setups permissions might be relaxed to e.g. network admin user. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-20netfilter: xtables: fix reentrancyEric Dumazet
commit f3c5c1bfd4308 (make ip_tables reentrant) introduced a race in handling the stackptr restore, at the end of ipt_do_table() We should do it before the call to xt_info_rdunlock_bh(), or we allow cpu preemption and another cpu overwrites stackptr of original one. A second fix is to change the underflow test to check the origptr value instead of 0 to detect underflow, or else we allow a jump from different hooks. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-20netfilter: ipset: fix checking the type revision at create commandJozsef Kadlecsik
The revision of the set type was not checked at the create command: if the userspace sent a valid set type but with not supported revision number, it'd create a loop. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-20netfilter: ipset: fix address ranges at hash:*port* typesJozsef Kadlecsik
The hash:*port* types with IPv4 silently ignored when address ranges with non TCP/UDP were added/deleted from the set and used the first address from the range only. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-18bridge: Reset IPCB when entering IP stack on NF_FORWARDHerbert Xu
Whenever we enter the IP stack proper from bridge netfilter we need to ensure that the skb is in a form the IP stack expects it to be in. The entry point on NF_FORWARD did not meet the requirements of the IP stack, therefore leading to potential crashes/panics. This patch fixes the problem. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>