summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2010-10-04netfilter: remove duplicated includeNicolas Kaiser
Remove duplicated include. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-04Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/Kconfig net/ipv4/tcp_timer.c
2010-10-04netfilter: ipt_LOG: add bufferisation to call printk() onceEric Dumazet
ipt_LOG & ip6t_LOG use lot of calls to printk() and use a lock in a hope several cpus wont mix their output in syslog. printk() being very expensive [1], its better to call it once, on a prebuilt and complete line. Also, with mixed IPv4 and IPv6 trafic, separate IPv4/IPv6 locks dont avoid garbage. I used an allocation of a 1024 bytes structure, sort of seq_printf() but with a fixed size limit. Use a static buffer if dynamic allocation failed. Emit a once time alert if buffer size happens to be too short. [1]: printk() has various features like printk_delay()... Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-04netfilter: nf_nat: make find/put staticStephen Hemminger
The functions nf_nat_proto_find_get and nf_nat_proto_put are only used internally in nf_nat_core. This might break some out of tree NAT module. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: vlan: dont drop packets from unknown vlans in promiscuous mode Phonet: Correct header retrieval after pskb_may_pull um: Proper Fix for f25c80a4: remove duplicate structure field initialization ip_gre: Fix dependencies wrt. ipv6. net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out() iwl3945: queue the right work if the scan needs to be aborted mac80211: fix use-after-free
2010-10-04IPVS: sip persistence engineSimon Horman
Add the SIP callid as a key for persistence. This allows multiple connections from the same IP address to be differentiated on the basis of the callid. When used in conjunction with the persistence mask, it allows connections from different IP addresses to be aggregated on the basis of the callid. It is envisaged that a persistence mask of 0.0.0.0 will be a useful setting. That is, ignore the source IP address when checking for persistence. It is envisaged that this option will be used in conjunction with one-packet scheduling. This only works with UDP and cannot be made to work with TCP within the current framework. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04IPVS: Fallback if persistence engine failsSimon Horman
Fall back to normal persistence handling if the persistence engine fails to recognise a packet. This way, at least the packet will go somewhere. It is envisaged that iptables could be used to block packets such if this is not desired although nf_conntrack_sip would likely need to be enhanced first. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04IPVS: Allow configuration of persistence enginesSimon Horman
Allow the persistence engine of a virtual service to be set, edited and unset. This feature only works with the netlink user-space interface. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04IPVS: management of persistence engine modulesSimon Horman
This is based heavily on the scheduler management code Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04IPVS: Add persistence engine data to /proc/net/ip_vs_connSimon Horman
This shouldn't break compatibility with userspace as the new data is at the end of the line. I have confirmed that this doesn't break ipvsadm, the main (only?) user-space user of this data. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04IPVS: Add struct ip_vs_peSimon Horman
Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04IPVS: ip_vs_{un,}bind_scheduler NULL argumentsSimon Horman
In general NULL arguments aren't passed by the few callers that exist, so don't test for them. The exception is to make passing NULL to ip_vs_unbind_scheduler() a noop. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04IPVS: Allow null argument to ip_vs_scheduler_put()Simon Horman
This simplifies caller logic sightly. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04IPVS: Add struct ip_vs_conn_paramSimon Horman
Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04IPVS: compact ip_vs_sched_persist()Simon Horman
Compact ip_vs_sched_persist() by setting up parameters and calling functions once. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04netfilter: nf_conntrack_sip: Add callid parserSimon Horman
Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04netfilter: nf_conntrack_sip: Allow ct_sip_get_header() to be called with a ↵Simon Horman
null ct argument Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-03net: introduce DST_NOCACHE flagEric Dumazet
While doing stress tests with IP route cache disabled, and multi queue devices, I noticed a very high contention on one rwlock used in neighbour code. When many cpus are trying to send frames (possibly using a high performance multiqueue device) to the same neighbour, they fight for the neigh->lock rwlock in order to call neigh_hh_init(), and fight on hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test()) But we dont need to call neigh_hh_init() for dst that are used only once. It costs four atomic operations at least, on two contended cache lines, plus the high contention on neigh->lock rwlock. Introduce a new dst flag, DST_NOCACHE, that is set when dst was not inserted in route cache. With the stress test bench, sending 160000000 frames on one neighbour, results are : Before patch: real 2m28.406s user 0m11.781s sys 36m17.964s After patch: real 1m26.532s user 0m12.185s sys 20m3.903s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03sctp: Fix break indentation in sctp_ioctl().David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2010-10-03sctp: Fix out-of-bounds reading in sctp_asoc_get_hmac()Dan Rosenberg
The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids array and attempts to ensure that only a supported hmac entry is returned. The current code fails to do this properly - if the last id in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the id integer remains set after exiting the loop, and the address of an out-of-bounds entry will be returned and subsequently used in the parent function, causing potentially ugly memory corruption. This patch resets the id integer to 0 on encountering an invalid id so that NULL will be returned after finishing the loop if no valid ids are found. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03sctp: prevent reading out-of-bounds memoryDan Rosenberg
Two user-controlled allocations in SCTP are subsequently dereferenced as sockaddr structs, without checking if the dereferenced struct members fall beyond the end of the allocated chunk. There doesn't appear to be any information leakage here based on how these members are used and additional checking, but it's still worth fixing. [akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion] Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03ipv4: correct IGMP behavior on v3 query during v2-compatibility modeDavid Stevens
A recent patch to allow IGMPv2 responses to IGMPv3 queries bypasses length checks for valid query lengths, incorrectly resets the v2_seen timer, and does not support IGMPv1. The following patch responds with a v2 report as required by IGMPv2 while correcting the other problems introduced by the patch. Signed-Off-By: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03ipmr: cleanupsEric Dumazet
Various code style cleanups Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03ipmr: RCU protection for mfc_cache_arrayEric Dumazet
Use RCU & RTNL protection for mfc_cache_array[] ipmr_cache_find() is called under rcu_read_lock(); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03ipmr: RCU conversion of mroute_skEric Dumazet
Use RCU and RTNL to protect (struct mr_table)->mroute_sk Readers use RCU, writers use RTNL. ip_ra_control() already use an RCU grace period before ip_ra_destroy_rcu(), so we dont need synchronize_rcu() in mrtsock_destruct() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03ipmr: __pim_rcv() is called under rcu_read_lockEric Dumazet
No need to get a reference on reg_dev and release it, we are in a rcu_read_lock() protected section. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03gre: protocol table can be staticstephen hemminger
This table is only used in gre.c Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03Revert "ipv4: Make INET_LRO a bool instead of tristate."Ben Hutchings
This reverts commit e81963b180ac502fda0326edf059b1e29cdef1a2. LRO is now deprecated in favour of GRO, and only a few drivers use it, so it is desirable to build it as a module in distribution kernels. The original change to prevent building it as a module was made in an attempt to avoid the case where some dependents are set to y and some to m, and INET_LRO can be set to m rather than y. However, the Kconfig system will reliably set INET_LRO=y in this case. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03net: Fix the condition passed to sk_wait_event()Nagendra Tomar
This patch fixes the condition (3rd arg) passed to sk_wait_event() in sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory() causes the following soft lockup in tcp_sendmsg() when the global tcp memory pool has exhausted. >>> snip <<< localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429] localhost kernel: CPU 3: localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200 localhost kernel: localhost kernel: Call Trace: localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200 localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40 localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0 localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140 localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130 localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40 localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170 localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190 localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90 localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83 >>> snip <<< What is happening is, that the sk_wait_event() condition passed from sk_stream_wait_memory() evaluates to true for the case of tcp global memory exhaustion. This is because both sk_stream_memory_free() and vm_wait are true which causes sk_wait_event() to *not* call schedule_timeout(). Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping. This causes the caller to again try allocation, which again fails and again calls sk_stream_wait_memory(), and so on. [ Bug introduced by commit c1cbe4b7ad0bc4b1d98ea708a3fecb7362aa4088 ("[NET]: Avoid atomic xchg() for non-error case") -DaveM ] Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03net: Fix IPv6 PMTU disc. w/ asymmetric routesMaciej Żenczykowski
Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-01nfsd: provide callbacks on svc_xprt deletionJ. Bruce Fields
NFSv4.1 needs warning when a client tcp connection goes down, if that connection is being used as a backchannel, so that it can warn the client that it has lost the backchannel connection. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-10-01nfsd4: remove spkm3J. Bruce Fields
Unfortunately, spkm3 never got very far; while interoperability with one other implementation was demonstrated at some point, problems were found with the spec that were deemed not worth fixing. The kernel code is useless on its own without nfs-utils patches which were never merged into nfs-utils, and were only ever available from citi.umich.edu. They appear not to have been updated since 2005. Therefore it seems safe to assume that this code has no users, and never will. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: fix race in new cache_wait code.NeilBrown
If we set up to wait for a cache item to be filled in, and then find that it is no longer pending, it could be that some other thread is in 'cache_revisit_request' and has moved our request to its 'pending' list. So when our setup_deferral calls cache_revisit_request it will find nothing to put on the pending list, and do nothing. We then return from cache_wait_req, thus leaving the 'sleeper' on-stack structure open to being corrupted by subsequent stack usage. However that 'sleeper' could still be on the 'pending' list that the other thread is looking at and so any corruption could cause it to behave badly. To avoid this race we simply take the same path as if the 'wait_for_completion_interruptible_timeout' was interrupted and if the sleeper is no longer on the list (which it won't be) we wait on the completion - which will ensure that any other cache_revisit_request will have let go of the sleeper. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: Create sockets in net namespacesPavel Emelyanov
The context is already known in all the sock_create callers. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01net: Export __sock_createPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: Tag rpc_xprt with netPavel Emelyanov
The net is known from the xprt_create and this tagging will also give un the context in the conntection workers where real sockets are created. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: Add net to xprt_createPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: Add net to rpc_create_argsPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: Pull net argument downto svc_create_socketPavel Emelyanov
After this the socket creation in it knows the context. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: Add net argument to svc_create_xprtPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: Factor out rpc_xprt freeingPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: Factor out rpc_xprt allocationPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
2010-09-30ipv4: rcu conversion in ip_route_output_slowEric Dumazet
ip_route_output_slow() is enclosed in an rcu_read_lock() protected section, so that no references are taken/released on device, thanks to __ip_dev_find() & dev_get_by_index_rcu() Tested with ip route cache disabled, and a stress test : Before patch: elapsed time : real 1m38.347s user 0m11.909s sys 23m51.501s Profile: 13788.00 22.7% ip_route_output_slow [kernel] 7875.00 13.0% dst_destroy [kernel] 3925.00 6.5% fib_semantic_match [kernel] 3144.00 5.2% fib_rules_lookup [kernel] 3061.00 5.0% dst_alloc [kernel] 2276.00 3.7% rt_set_nexthop [kernel] 1762.00 2.9% fib_table_lookup [kernel] 1538.00 2.5% _raw_read_lock [kernel] 1358.00 2.2% ip_output [kernel] After patch: real 1m28.808s user 0m13.245s sys 20m37.293s 10950.00 17.2% ip_route_output_slow [kernel] 10726.00 16.9% dst_destroy [kernel] 5170.00 8.1% fib_semantic_match [kernel] 3937.00 6.2% dst_alloc [kernel] 3635.00 5.7% rt_set_nexthop [kernel] 2900.00 4.6% fib_rules_lookup [kernel] 2240.00 3.5% fib_table_lookup [kernel] 1427.00 2.2% _raw_read_lock [kernel] 1157.00 1.8% kmem_cache_alloc [kernel] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-30ipv4: introduce __ip_dev_find()Eric Dumazet
ip_dev_find(net, addr) finds a device given an IPv4 source address and takes a reference on it. Introduce __ip_dev_find(), taking a third argument, to optionally take the device reference. Callers not asking the reference to be taken should be in an rcu_read_lock() protected section. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-30vlan: dont drop packets from unknown vlans in promiscuous modeEric Dumazet
Roger Luethi noticed packets for unknown VLANs getting silently dropped even in promiscuous mode. Check for promiscuous mode in __vlan_hwaccel_rx() and vlan_gro_common() before drops. As suggested by Patrick, mark such packets to have skb->pkt_type set to PACKET_OTHERHOST to make sure they are dropped by IP stack. Reported-by: Roger Luethi <rl@hellgate.ch> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-30ipv4: __mkroute_output() speedupEric Dumazet
While doing stress tests with a disabled IP route cache, I found __mkroute_output() was touching three times in_device atomic refcount. Use RCU to touch it once to reduce cache line ping pongs. Before patch time to perform the test real 1m42.009s user 0m12.545s sys 25m0.726s Profile : 16109.00 26.4% ip_route_output_slow vmlinux 7434.00 12.2% dst_destroy vmlinux 3280.00 5.4% fib_rules_lookup vmlinux 3252.00 5.3% fib_semantic_match vmlinux 2622.00 4.3% fib_table_lookup vmlinux 2535.00 4.1% dst_alloc vmlinux 1750.00 2.9% _raw_read_lock vmlinux 1532.00 2.5% rt_set_nexthop vmlinux After patch real 1m36.503s user 0m12.977s sys 23m25.608s 14234.00 22.4% ip_route_output_slow vmlinux 8717.00 13.7% dst_destroy vmlinux 4052.00 6.4% fib_rules_lookup vmlinux 3951.00 6.2% fib_semantic_match vmlinux 3191.00 5.0% dst_alloc vmlinux 1764.00 2.8% fib_table_lookup vmlinux 1692.00 2.7% _raw_read_lock vmlinux 1605.00 2.5% rt_set_nexthop vmlinux Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-30Phonet: restore flow control credits when sending failsRémi Denis-Courmont
This patch restores the below flow control patch submitted by Rémi Denis-Courmont, which accidentaly got lost due to Pipe controller patch on Phonet. commit 1a98214feef2221cd7c24b17cd688a5a9d85b2ea Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Date: Mon Aug 30 12:57:03 2010 +0000 Phonet: restore flow control credits when sending fails Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com> Acked-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-30Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6