summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2008-10-07tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd.Daniele Lacamera
Because of rounding, in certain conditions, i.e. when in congestion avoidance state rho is smaller than 1/128 of the current cwnd, TCP Hybla congestion control starves and the cwnd is kept constant forever. This patch forces an increment by one segment after #send_cwnd calls without increments(newreno behavior). Signed-off-by: Daniele Lacamera <root@danielinux.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07net: Fix netdev_run_todo dead-lockHerbert Xu
Benjamin Thery tracked down a bug that explains many instances of the error unregister_netdevice: waiting for %s to become free. Usage count = %d It turns out that netdev_run_todo can dead-lock with itself if a second instance of it is run in a thread that will then free a reference to the device waited on by the first instance. The problem is really quite silly. We were trying to create parallelism where none was required. As netdev_run_todo always follows a RTNL section, and that todo tasks can only be added with the RTNL held, by definition you should only need to wait for the very ones that you've added and be done with it. There is no need for a second mutex or spinlock. This is exactly what the following patch does. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07tcp: Fix possible double-ack w/ user dmaAli Saidi
From: Ali Saidi <saidi@engin.umich.edu> When TCP receive copy offload is enabled it's possible that tcp_rcv_established() will cause two acks to be sent for a single packet. In the case that a tcp_dma_early_copy() is successful, copied_early is set to true which causes tcp_cleanup_rbuf() to be called early which can send an ack. Further along in tcp_rcv_established(), __tcp_ack_snd_check() is called and will schedule a delayed ACK. If no packets are processed before the delayed ack timer expires the packet will be acked twice. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07net: only invoke dev->change_rx_flags when device is UPPatrick McHardy
Jesper Dangaard Brouer <hawk@comx.dk> reported a bug when setting a VLAN device down that is in promiscous mode: When the VLAN device is set down, the promiscous count on the real device is decremented by one by vlan_dev_stop(). When removing the promiscous flag from the VLAN device afterwards, the promiscous count on the real device is decremented a second time by the vlan_change_rx_flags() callback. The root cause for this is that the ->change_rx_flags() callback is invoked while the device is down. The synchronization is meant to mirror the behaviour of the ->set_rx_mode callbacks, meaning the ->open function is responsible for doing a full sync on open, the ->close() function is responsible for doing full cleanup on ->stop() and ->change_rx_flags() is meant to do incremental changes while the device is UP. Only invoke ->change_rx_flags() while the device is UP to provide the intended behaviour. Tested-by: Jesper Dangaard Brouer <jdb@comx.dk> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-06netrom: Fix sock_orphan() use in nr_releaseJarek Poplawski
While debugging another bug it was found that NetRom socks are sometimes seen unorphaned in sk_free(). This patch moves sock_orphan() in nr_release() to the beginning (like in ax25, or rose). Reported-and-tested-by: Bernard Pidoux f6bvp <f6bvp@free.fr> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-06ax25: Quick fix for making sure unaccepted sockets get destroyed.David S. Miller
Since we reverted 30902dc3cb0ea1cfc7ac2b17bcf478ff98420d74 ("ax25: Fix std timer socket destroy handling.") we have to put some kind of fix in to cure the issue whereby unaccepted connections do not get destroyed. The approach used here is from Tihomir Heidelberg - 9a4gl Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-06Revert "ax25: Fix std timer socket destroy handling."David S. Miller
This reverts commit 30902dc3cb0ea1cfc7ac2b17bcf478ff98420d74. It causes all kinds of problems, based upon a report by Bernard (f6bvp) and analysis by Jarek Poplawski. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01af_key: Free dumping state on socket closeTimo Teras
Fix a xfrm_{state,policy}_walk leak if pfkey socket is closed while dumping is on-going. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01XFRM,IPv6: initialize ip6_dst_blackhole_ops.kmem_cachepArnaud Ebalard
ip6_dst_blackhole_ops.kmem_cachep is not expected to be NULL (i.e. to be initialized) when dst_alloc() is called from ip6_dst_blackhole(). Otherwise, it results in the following (xfrm_larval_drop is now set to 1 by default): [ 78.697642] Unable to handle kernel paging request for data at address 0x0000004c [ 78.703449] Faulting instruction address: 0xc0097f54 [ 78.786896] Oops: Kernel access of bad area, sig: 11 [#1] [ 78.792791] PowerMac [ 78.798383] Modules linked in: btusb usbhid bluetooth b43 mac80211 cfg80211 ehci_hcd ohci_hcd sungem sungem_phy usbcore ssb [ 78.804263] NIP: c0097f54 LR: c0334a28 CTR: c002d430 [ 78.809997] REGS: eef19ad0 TRAP: 0300 Not tainted (2.6.27-rc5) [ 78.815743] MSR: 00001032 <ME,IR,DR> CR: 22242482 XER: 20000000 [ 78.821550] DAR: 0000004c, DSISR: 40000000 [ 78.827278] TASK = eef0df40[3035] 'mip6d' THREAD: eef18000 [ 78.827408] GPR00: 00001032 eef19b80 eef0df40 00000000 00008020 eef19c30 00000001 00000000 [ 78.833249] GPR08: eee5101c c05a5c10 ef9ad500 00000000 24242422 1005787c 00000000 1004f960 [ 78.839151] GPR16: 00000000 10024e90 10050040 48030018 0fe44150 00000000 00000000 eef19c30 [ 78.845046] GPR24: eef19e44 00000000 eef19bf8 efb37c14 eef19bf8 00008020 00009032 c0596064 [ 78.856671] NIP [c0097f54] kmem_cache_alloc+0x20/0x94 [ 78.862581] LR [c0334a28] dst_alloc+0x40/0xc4 [ 78.868451] Call Trace: [ 78.874252] [eef19b80] [c03c1810] ip6_dst_lookup_tail+0x1c8/0x1dc (unreliable) [ 78.880222] [eef19ba0] [c0334a28] dst_alloc+0x40/0xc4 [ 78.886164] [eef19bb0] [c03cd698] ip6_dst_blackhole+0x28/0x1cc [ 78.892090] [eef19be0] [c03d9be8] rawv6_sendmsg+0x75c/0xc88 [ 78.897999] [eef19cb0] [c038bca4] inet_sendmsg+0x4c/0x78 [ 78.903907] [eef19cd0] [c03207c8] sock_sendmsg+0xac/0xe4 [ 78.909734] [eef19db0] [c03209e4] sys_sendmsg+0x1e4/0x2a0 [ 78.915540] [eef19f00] [c03220a8] sys_socketcall+0xfc/0x210 [ 78.921406] [eef19f40] [c0014b3c] ret_from_syscall+0x0/0x38 [ 78.927295] --- Exception: c01 at 0xfe2d730 [ 78.927297] LR = 0xfe2d71c [ 78.939019] Instruction dump: [ 78.944835] 91640018 9144001c 900a0000 4bffff44 9421ffe0 7c0802a6 bf810010 7c9d2378 [ 78.950694] 90010024 7fc000a6 57c0045e 7c000124 <83e3004c> 8383005c 2f9f0000 419e0050 [ 78.956464] ---[ end trace 05fa1ed7972487a1 ]--- As commented by Benjamin Thery, the bug was introduced by f2fc6a54585a1be6669613a31fbaba2ecbadcd36, while adding network namespaces support to ipv6 routes. Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01ipv6: NULL pointer dereferrence in tcp_v6_send_ackDenis V. Lunev
The following actions are possible: tcp_v6_rcv skb->dev = NULL; tcp_v6_do_rcv tcp_v6_hnd_req tcp_check_req req->rsk_ops->send_ack == tcp_v6_send_ack So, skb->dev can be NULL in tcp_v6_send_ack. We must obtain namespace from dst entry. Thanks to Vitaliy Gusev <vgusev@openvz.org> for initial problem finding in IPv4 code. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01tcp: Fix NULL dereference in tcp_4_send_ack()Vitaliy Gusev
Fix NULL dereference in tcp_4_send_ack(). As skb->dev is reset to NULL in tcp_v4_rcv() thus OOPS occurs: BUG: unable to handle kernel NULL pointer dereference at 00000000000004d0 IP: [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250 Stack: ffff810005dbb000 ffff810015c8acc0 e77b2c6e5f861600 a01610802e90cb6d 0a08010100000000 88afffff88afffff 0000000080762be8 0000000115c872e8 0004122000000000 0000000000000001 ffffffff80762b88 0000000000000020 Call Trace: <IRQ> [<ffffffff80499c33>] tcp_v4_reqsk_send_ack+0x20/0x22 [<ffffffff8049bce5>] tcp_check_req+0x108/0x14c [<ffffffff8047aaf7>] ? rt_intern_hash+0x322/0x33c [<ffffffff80499846>] tcp_v4_do_rcv+0x399/0x4ec [<ffffffff8045ce4b>] ? skb_checksum+0x4f/0x272 [<ffffffff80485b74>] ? __inet_lookup_listener+0x14a/0x15c [<ffffffff8049babc>] tcp_v4_rcv+0x6a1/0x701 [<ffffffff8047e739>] ip_local_deliver_finish+0x157/0x24a [<ffffffff8047ec9a>] ip_local_deliver+0x72/0x7c [<ffffffff8047e5bd>] ip_rcv_finish+0x38d/0x3b2 [<ffffffff803d3548>] ? scsi_io_completion+0x19d/0x39e [<ffffffff8047ebe5>] ip_rcv+0x2a2/0x2e5 [<ffffffff80462faa>] netif_receive_skb+0x293/0x303 [<ffffffff80465a9b>] process_backlog+0x80/0xd0 [<ffffffff802630b4>] ? __rcu_process_callbacks+0x125/0x1b4 [<ffffffff8046560e>] net_rx_action+0xb9/0x17f [<ffffffff80234cc5>] __do_softirq+0xa3/0x164 [<ffffffff8020c52c>] call_softirq+0x1c/0x28 <EOI> [<ffffffff8020de1c>] do_softirq+0x34/0x72 [<ffffffff80234b8e>] local_bh_enable_ip+0x3f/0x50 [<ffffffff804d43ca>] _spin_unlock_bh+0x12/0x14 [<ffffffff804599cd>] release_sock+0xb8/0xc1 [<ffffffff804a6f9a>] inet_stream_connect+0x146/0x25c [<ffffffff80243078>] ? autoremove_wake_function+0x0/0x38 [<ffffffff8045751f>] sys_connect+0x68/0x8e [<ffffffff80291818>] ? fd_install+0x5f/0x68 [<ffffffff80457784>] ? sock_map_fd+0x55/0x62 [<ffffffff8020b39b>] system_call_after_swapgs+0x7b/0x80 Code: 41 10 11 d0 83 d0 00 4d 85 ed 89 45 c0 c7 45 c4 08 00 00 00 74 07 41 8b 45 04 89 45 c8 48 8b 43 20 8b 4d b8 48 8d 55 b0 48 89 de <48> 8b 80 d0 04 00 00 48 8b b8 60 01 00 00 e8 20 ae fe ff 65 48 RIP [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250 RSP <ffffffff80762b78> CR2: 00000000000004d0 Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-30sctp: Fix kernel panic while process protocol violation parameterWei Yongjun
Since call to function sctp_sf_abort_violation() need paramter 'arg' with 'struct sctp_chunk' type, it will read the chunk type and chunk length from the chunk_hdr member of chunk. But call to sctp_sf_violation_paramlen() always with 'struct sctp_paramhdr' type's parameter, it will be passed to sctp_sf_abort_violation(). This may cause kernel panic. sctp_sf_violation_paramlen() |-- sctp_sf_abort_violation() |-- sctp_make_abort_violation() This patch fixed this problem. This patch also fix two place which called sctp_sf_violation_paramlen() with wrong paramter type. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-30iucv: Fix mismerge again.Heiko Carstens
fb65a7c091529bfffb1262515252c0d0f6241c5c ("iucv: Fix bad merging.") fixed a merge error, but in a wrong way. We now end up with the bug below. This patch corrects the mismerge like it was intended. BUG: scheduling while atomic: swapper/1/0x00000000 Modules linked in: CPU: 1 Not tainted 2.6.27-rc7-00094-gc0f4d6d #9 Process swapper (pid: 1, task: 000000003fe7d988, ksp: 000000003fe838c0) 0000000000000000 000000003fe839b8 0000000000000002 0000000000000000 000000003fe83a58 000000003fe839d0 000000003fe839d0 0000000000390de6 000000000058acd8 00000000000000d0 000000003fe7dcd8 0000000000000000 000000000000000c 000000000000000d 0000000000000000 000000003fe83a28 000000000039c5b8 0000000000015e5e 000000003fe839b8 000000003fe83a00 Call Trace: ([<0000000000015d6a>] show_trace+0xe6/0x134) [<0000000000039656>] __schedule_bug+0xa2/0xa8 [<0000000000391744>] schedule+0x49c/0x910 [<0000000000391f64>] schedule_timeout+0xc4/0x114 [<00000000003910d4>] wait_for_common+0xe8/0x1b4 [<00000000000549ae>] call_usermodehelper_exec+0xa6/0xec [<00000000001af7b8>] kobject_uevent_env+0x418/0x438 [<00000000001d08fc>] bus_add_driver+0x1e4/0x298 [<00000000001d1ee4>] driver_register+0x90/0x18c [<0000000000566848>] netiucv_init+0x168/0x2c8 [<00000000000120be>] do_one_initcall+0x3e/0x17c [<000000000054a31a>] kernel_init+0x1ce/0x248 [<000000000001a97a>] kernel_thread_starter+0x6/0xc [<000000000001a974>] kernel_thread_starter+0x0/0xc iucv: NETIUCV driver initialized initcall netiucv_init+0x0/0x2c8 returned with preemption imbalance Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-30ipsec: Fix pskb_expand_head corruption in xfrm_state_check_spaceHerbert Xu
We're never supposed to shrink the headroom or tailroom. In fact, shrinking the headroom is a fatal action. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netfilter: ip6t_{hbh,dst}: Rejects not-strict mode on rule insertion ath9k: disable MIB interrupts to fix interrupt storm [Bluetooth] Fix USB disconnect handling of btusb driver [Bluetooth] Fix wrong URB handling of btusb driver [Bluetooth] Fix I/O errors on MacBooks with Broadcom chips
2008-09-24netfilter: ip6t_{hbh,dst}: Rejects not-strict mode on rule insertionYasuyuki Kozakai
The current code ignores rules for internal options in HBH/DST options header in packet processing if 'Not strict' mode is specified (which is not implemented). Clearly it is not expected by user. Kernel should reject HBH/DST rule insertion with 'Not strict' mode in the first place. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: fix put_data error handling 9p: use an IS_ERR test rather than a NULL test 9p: introduce missing kfree 9p-trans_fd: fix and clean up module init/exit paths 9p-trans_fd: don't do fs segment mangling in p9_fd_poll() 9p-trans_fd: clean up p9_conn_create() 9p-trans_fd: fix trans_fd::p9_conn_destroy() 9p: implement proper trans module refcounting and unregistration
2008-09-249p: fix put_data error handlingEric Van Hensbergen
Abhishek Kulkarni pointed out an inconsistency in the way errors are returned from p9_put_data. On deeper exploration it seems the error handling for this path was completely wrong. This patch adds checks for allocation problems and propagates errors correctly. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-249p: introduce missing kfreeJulia Lawall
Error handling code following a kmalloc should free the allocated data. The semantic match that finds the problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; statement S; expression E; identifier f,l; position p1,p2; expression *ptr != NULL; @@ ( if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S | x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...); ... if (x == NULL) S ) <... when != x when != if (...) { <+...x...+> } x->f = E ...> ( return \(0\|<+...x...+>\|ptr\); | return@p2 ...; ) @script:python@ p1 << r.p1; p2 << r.p2; @@ print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-09-249p-trans_fd: fix and clean up module init/exit pathsTejun Heo
trans_fd leaked p9_mux_wq on module unload. Fix it. While at it, collapse p9_mux_global_init() into p9_trans_fd_init(). It's easier to follow this way and the global poll_tasks array is about to removed anyway. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-249p-trans_fd: don't do fs segment mangling in p9_fd_poll()Tejun Heo
p9_fd_poll() is never called with user pointers and f_op->poll() doesn't expect its arguments to be from userland. There's no need to set kernel ds before calling f_op->poll() from p9_fd_poll(). Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-249p-trans_fd: clean up p9_conn_create()Tejun Heo
* Use kzalloc() to allocate p9_conn and remove 0/NULL initializations. * Clean up error return paths. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-249p-trans_fd: fix trans_fd::p9_conn_destroy()Tejun Heo
p9_conn_destroy() first kills all current requests by calling p9_conn_cancel(), then waits for the request list to be cleared by waiting on p9_conn->equeue. After that, polling is stopped and the trans is destroyed. This sequence has a few problems. * Read and write works were never cancelled and the p9_conn can be destroyed while the works are running as r/w works remove requests from the list and dereference the p9_conn from them. * The list emptiness wait using p9_conn->equeue wouldn't trigger because p9_conn_cancel() always clears all the lists and the only way the wait can be triggered is to have another task to issue a request between the slim window between p9_conn_cancel() and the wait, which isn't safe under the current implementation with or without the wait. This patch fixes the problem by first stopping poll, which can schedule r/w works, first and cancle r/w works which guarantees that r/w works are not and will not run from that point and then calling p9_conn_cancel() and do the rest of destruction. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-249p: implement proper trans module refcounting and unregistrationTejun Heo
9p trans modules aren't refcounted nor were they unregistered properly. Fix it. * Add 9p_trans_module->owner and reference the module on each trans instance creation and put it on destruction. * Protect v9fs_trans_list with a spinlock. This isn't strictly necessary as the list is manipulated only during module loading / unloading but it's a good idea to make the API safe. * Unregister trans modules when the corresponding module is being unloaded. * While at it, kill unnecessary EXPORT_SYMBOL on p9_trans_fd_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-23sys_paccept: disable paccept() until API design is resolvedMichael Kerrisk
The reasons for disabling paccept() are as follows: * The API is more complex than needed. There is AFAICS no demonstrated use case that the sigset argument of this syscall serves that couldn't equally be served by the use of pselect/ppoll/epoll_pwait + traditional accept(). Roland seems to concur with this opinion (http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255). I have (more than once) asked Ulrich to explain otherwise (http://thread.gmane.org/gmane.linux.kernel/723952/focus=731018), but he does not respond, so one is left to assume that he doesn't know of such a case. * The use of a sigset argument is not consistent with other I/O APIs that can block on a single file descriptor (e.g., read(), recv(), connect()). * The behavior of paccept() when interrupted by a signal is IMO strange: the kernel restarts the system call if SA_RESTART was set for the handler. I think that it should not do this -- that it should behave consistently with paccept()/ppoll()/epoll_pwait(), which never restart, regardless of SA_RESTART. The reasoning here is that the very purpose of paccept() is to wait for a connection or a signal, and that restarting in the latter case is probably never useful. (Note: Roland disagrees on this point, believing that rather paccept() should be consistent with accept() in its behavior wrt EINTR (http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255).) I believe that instead, a simpler API, consistent with Ulrich's other recent additions, is preferable: accept4(int fd, struct sockaddr *sa, socklen_t *salen, ind flags); (This simpler API was originally proposed by Ulrich: http://thread.gmane.org/gmane.linux.network/92072) If this simpler API is added, then if we later decide that the sigset argument really is required, then a suitable bit in 'flags' could be added to indicate the presence of the sigset argument. At this point, I am hoping we either will get a counter-argument from Ulrich about why we really do need paccept()'s sigset argument, or that he will resubmit the original accept4() patch. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Alan Cox <alan@redhat.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netdev: simple_tx_hash shouldn't hash inside fragments
2008-09-20netdev: simple_tx_hash shouldn't hash inside fragmentsAlexander Duyck
Currently simple_tx_hash is hashing inside of udp fragments. As a result packets are getting getting sent to all queues when they shouldn't be. This causes a serious performance regression which can be seen by sending UDP frames larger than mtu on multiqueue devices. This change will make it so that fragments are hashed only as IP datagrams w/o any protocol information. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: e100: Use pci_pme_active to clear PME_Status and disable PME# e1000: prevent corruption of EEPROM/NVM forcedeth: call restore mac addr in nv_shutdown path bnx2: Promote vector field in bnx2_irq structure from u16 to unsigned int sctp: Fix oops when INIT-ACK indicates that peer doesn't support AUTH sctp: do not enable peer features if we can't do them. sctp: set the skb->ip_summed correctly when sending over loopback. udp: Fix rcv socket locking
2008-09-18sctp: Fix oops when INIT-ACK indicates that peer doesn't support AUTHVlad Yasevich
If INIT-ACK is received with SupportedExtensions parameter which indicates that the peer does not support AUTH, the packet will be silently ignore, and sctp_process_init() do cleanup all of the transports in the association. When T1-Init timer is expires, OOPS happen while we try to choose a different init transport. The solution is to only clean up the non-active transports, i.e the ones that the peer added. However, that introduces a problem with sctp_connectx(), because we don't mark the proper state for the transports provided by the user. So, we'll simply mark user-provided transports as ACTIVE. That will allow INIT retransmissions to work properly in the sctp_connectx() context and prevent the crash. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-18sctp: do not enable peer features if we can't do them.Vlad Yasevich
Do not enable peer features like addip and auth, if they are administratively disabled localy. If the peer resports that he supports something that we don't, neither end can use it so enabling it is pointless. This solves a problem when talking to a peer that has auth and addip enabled while we do not. Found by Andrei Pelinescu-Onciul <andrei@iptel.org>. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-18sctp: set the skb->ip_summed correctly when sending over loopback.Vlad Yasevich
Loopback used to clobber the ip_summed filed which sctp then used to figure out if it needed to do checksumming or not. Now that loopback doesn't do that any more, sctp needs to set the ip_summed field correctly. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-16warn: Turn the netdev timeout WARN_ON() into a WARN()Arjan van de Ven
this patch turns the netdev timeout WARN_ON_ONCE() into a WARN_ONCE(), so that the device and driver names are inside the warning message. This helps automated tools like kerneloops.org to collect the data and do statistics, as well as making it more likely that humans cut-n-paste the important message as part of a bugreport. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-15udp: Fix rcv socket lockingHerbert Xu
The previous patch in response to the recursive locking on IPsec reception is broken as it tries to drop the BH socket lock while in user context. This patch fixes it by shrinking the section protected by the socket lock to sock_queue_rcv_skb only. The only reason we added the lock is for the accounting which happens in that function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-12[Bluetooth] Fix regression from using default link policyMarcel Holtmann
To speed up the Simple Pairing connection setup, the support for the default link policy has been enabled. This is in contrast to settings the link policy on every connection setup. Using the default link policy is the preferred way since there is no need to dynamically change it for every connection. For backward compatibility reason and to support old userspace the HCISETLINKPOL ioctl has been switched over to using hci_request() to issue the HCI command for setting the default link policy instead of just storing it in the HCI device structure. However the hci_request() can only be issued when the device is brought up. If used on a device that is registered, but still down it will timeout and fail. This is problematic since the command is put on the TX queue and the Bluetooth core tries to submit it to hardware that is not ready yet. The timeout for these requests is 10 seconds and this causes a significant regression when setting up a new device. The userspace can perfectly handle a failure of the HCISETLINKPOL ioctl and will re-submit it later, but the 10 seconds delay causes a problem. So in case hci_request() is called on a device that is still down, just fail it with ENETDOWN to indicate what happens. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-09-09ipv6: Fix OOPS in ip6_dst_lookup_tail().Neil Horman
This fixes kernel bugzilla 11469: "TUN with 1024 neighbours: ip6_dst_lookup_tail NULL crash" dst->neighbour is not necessarily hooked up at this point in the processing path, so blindly dereferencing it is the wrong thing to do. This NULL check exists in other similar paths and this case was just an oversight. Also fix the completely wrong and confusing indentation here while we're at it. Based upon a patch by Evgeniy Polyakov. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-09ipsec: Restore larval states and socket policies in dumpHerbert Xu
The commit commit 4c563f7669c10a12354b72b518c2287ffc6ebfb3 ("[XFRM]: Speed up xfrm_policy and xfrm_state walking") inadvertently removed larval states and socket policies from netlink dumps. This patch restores them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-09Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6
2008-09-09[Bluetooth] Reject L2CAP connections on an insecure ACL linkMarcel Holtmann
The Security Mode 4 of the Bluetooth 2.1 specification has strict authentication and encryption requirements. It is the initiators job to create a secure ACL link. However in case of malicious devices, the acceptor has to make sure that the ACL is encrypted before allowing any kind of L2CAP connection. The only exception here is the PSM 1 for the service discovery protocol, because that is allowed to run on an insecure ACL link. Previously it was enough to reject a L2CAP connection during the connection setup phase, but with Bluetooth 2.1 it is forbidden to do any L2CAP protocol exchange on an insecure link (except SDP). The new hci_conn_check_link_mode() function can be used to check the integrity of an ACL link. This functions also takes care of the cases where Security Mode 4 is disabled or one of the devices is based on an older specification. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-09-09[Bluetooth] Enforce correct authentication requirementsMarcel Holtmann
With the introduction of Security Mode 4 and Simple Pairing from the Bluetooth 2.1 specification it became mandatory that the initiator requires authentication and encryption before any L2CAP channel can be established. The only exception here is PSM 1 for the service discovery protocol (SDP). It is meant to be used without any encryption since it contains only public information. This is how Bluetooth 2.0 and before handle connections on PSM 1. For Bluetooth 2.1 devices the pairing procedure differentiates between no bonding, general bonding and dedicated bonding. The L2CAP layer wrongly uses always general bonding when creating new connections, but it should not do this for SDP connections. In this case the authentication requirement should be no bonding and the just-works model should be used, but in case of non-SDP connection it is required to use general bonding. If the new connection requires man-in-the-middle (MITM) protection, it also first wrongly creates an unauthenticated link key and then later on requests an upgrade to an authenticated link key to provide full MITM protection. With Simple Pairing the link key generation is an expensive operation (compared to Bluetooth 2.0 and before) and doing this twice during a connection setup causes a noticeable delay when establishing a new connection. This should be avoided to not regress from the expected Bluetooth 2.0 connection times. The authentication requirements are known up-front and so enforce them. To fulfill these requirements the hci_connect() function has been extended with an authentication requirement parameter that will be stored inside the connection information and can be retrieved by userspace at any time. This allows the correct IO capabilities exchange and results in the expected behavior. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-09-09[Bluetooth] Fix reference counting during ACL config stageMarcel Holtmann
The ACL config stage keeps holding a reference count on incoming connections when requesting the extended features. This results in keeping an ACL link up without any users. The problem here is that the Bluetooth specification doesn't define an ownership of the ACL link and thus it can happen that the implementation on the initiator side doesn't care about disconnecting unused links. In this case the acceptor needs to take care of this. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-09-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: bridge: don't allow setting hello time to zero netns : fix kernel panic in timewait socket destruction pkt_sched: Fix qdisc state in net_tx_action() netfilter: nf_conntrack_irc: make sure string is terminated before calling simple_strtoul netfilter: nf_conntrack_gre: nf_ct_gre_keymap_flush() fixlet netfilter: nf_conntrack_gre: more locking around keymap list netfilter: nf_conntrack_sip: de-static helper pointers
2008-09-08bridge: don't allow setting hello time to zeroStephen Hemminger
Dushan Tcholich reports that on his system ksoftirqd can consume between %6 to %10 of cpu time, and cause ~200 context switches per second. He then correlated this with a report by bdupree@techfinesse.com: http://marc.info/?l=linux-kernel&m=119613299024398&w=2 and the culprit cause seems to be starting the bridge interface. In particular, when starting the bridge interface, his scripts are specifying a hello timer interval of "0". The bridge hello time can't be safely set to values less than 1 second, otherwise it is possible to end up with a runaway timer. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-08netns : fix kernel panic in timewait socket destructionDaniel Lezcano
How to reproduce ? - create a network namespace - use tcp protocol and get timewait socket - exit the network namespace - after a moment (when the timewait socket is destroyed), the kernel panics. # BUG: unable to handle kernel NULL pointer dereference at 0000000000000007 IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 PGD 119985067 PUD 11c5c0067 PMD 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table] Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3 RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246 RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30 RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00 RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200 R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00 R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff8800bff9e000, task ffff88011ff76690) Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a 0000000000000008 0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7 ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108 Call Trace: <IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e [<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e [<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193 [<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd [<ffffffff8200d08c>] ? call_softirq+0x1c/0x28 [<ffffffff8200e611>] ? do_softirq+0x2c/0x68 [<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9 [<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70 <EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b [<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7 65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0 48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00 RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 RSP <ffff88011ff7fed0> CR2: 0000000000000007 This patch provides a function to purge all timewait sockets related to a network namespace. The timewait sockets life cycle is not tied with the network namespace, that means the timewait sockets stay alive while the network namespace dies. The timewait sockets are for avoiding to receive a duplicate packet from the network, if the network namespace is freed, the network stack is removed, so no chance to receive any packets from the outside world. Furthermore, having a pending destruction timer on these sockets with a network namespace freed is not safe and will lead to an oops if the timer callback which try to access data belonging to the namespace like for example in: inet_twdr_do_twkill_work -> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED); Purging the timewait sockets at the network namespace destruction will: 1) speed up memory freeing for the namespace 2) fix kernel panic on asynchronous timewait destruction Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Denis V. Lunev <den@openvz.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-07pkt_sched: Fix qdisc state in net_tx_action()Jarek Poplawski
net_tx_action() can skip __QDISC_STATE_SCHED bit clearing while qdisc is neither ran nor rescheduled, which may cause endless loop in dev_deactivate(). Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Tested-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-07netfilter: nf_conntrack_irc: make sure string is terminated before calling ↵Patrick McHardy
simple_strtoul Alexey Dobriyan points out: 1. simple_strtoul() silently accepts all characters for given base even if result won't fit into unsigned long. This is amazing stupidity in itself, but 2. nf_conntrack_irc helper use simple_strtoul() for DCC request parsing. Data first copied into 64KB buffer, so theoretically nothing prevents reading past the end of it, since data comes from network given 1). This is not actually a problem currently since we're guaranteed to have a 0 byte in skb_shared_info or in the buffer the data is copied to, but to make this more robust, make sure the string is actually terminated. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-07netfilter: nf_conntrack_gre: nf_ct_gre_keymap_flush() fixletAlexey Dobriyan
It does "kfree(list_head)" which looks wrong because entity that was allocated is definitely not list_head. However, this all works because list_head is first item in struct nf_ct_gre_keymap. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-07netfilter: nf_conntrack_gre: more locking around keymap listAlexey Dobriyan
gre_keymap_list should be protected in all places. (unless I'm misreading something) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-07netfilter: nf_conntrack_sip: de-static helper pointersAlexey Dobriyan
Helper's ->help hook can run concurrently with itself, so iterating over SIP helpers with static pointer won't work reliably. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-05Revert "mac80211: Use IWEVASSOCREQIE instead of IWEVCUSTOM"Linus Torvalds
This reverts commit 087d833e5a9f67ba933cb32eaf5a2279c1a5b47c, which was reported to break wireless at least in some combinations with 32bit user space and a 64bit kernel. Alex Williamnson bisected it to this commit. Reported-and-bisected-by: Alex Williamson <alex.williamson@hp.com> Acked-by: John W. Linville <linville@tuxdriver.com> Cc: David Miller <davem@davemloft.net> Cc: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: bnx2x: Accessing un-mapped page ath9k: Fix TX control flag use for no ACK and RTS/CTS ath9k: Fix TX status reporting iwlwifi: fix STATUS_EXIT_PENDING is not set on pci_remove iwlwifi: call apm stop on exit iwlwifi: fix Tx cmd memory allocation failure handling iwlwifi: fix rx_chain computation iwlwifi: fix station mimo power save values iwlwifi: remove false rxon if rx chain changes iwlwifi: fix hidden ssid discovery in passive channels iwlwifi: W/A for the TSF correction in IBSS netxen: Remove workaround for chipset quirk pcnet-cs, axnet_cs: add new IDs, remove dup ID with less info ixgbe: initialize interrupt throttle rate net/usb/pegasus: avoid hundreds of diagnostics tipc: Don't use structure names which easily globally conflict.