summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2009-11-11skbuff: Do not allow skb recycling with disabled IRQsAnton Vorontsov
NAPI drivers try to recycle SKBs in their polling routine, but we generally don't know the context in which the polling will be called, and the skb recycling itself may require IRQs to be enabled. This patch adds irqs_disabled() test to the skb_recycle_check() routine, so that we'll not let the drivers hit the skb recycling path with IRQs disabled. As a side effect, this patch actually disables skb recycling for some [broken] drivers. E.g. gianfar driver grabs an irqsave spinlock during TX ring processing, and then tries to recycle an skb, and that caused the following badness: nf_conntrack version 0.5.0 (1008 buckets, 4032 max) ------------[ cut here ]------------ Badness at kernel/softirq.c:143 NIP: c003e3c4 LR: c423a528 CTR: c003e344 ... NIP [c003e3c4] local_bh_enable+0x80/0xc4 LR [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack] Call Trace: [c15d1b60] [c003e32c] local_bh_disable+0x1c/0x34 (unreliable) [c15d1b70] [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack] [c15d1b80] [c02c6370] nf_conntrack_destroy+0x3c/0x70 Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-11ipv6: Remove unused var in inet6_dump_ifinfo()David S. Miller
Reported by Stephen Rothwell: -------------------- Today's linux-next build (x86_64 allmodconfig) produced this warning: net/ipv6/addrconf.c: In function 'inet6_dump_ifinfo': net/ipv6/addrconf.c:3833: warning: unused variable 'err' Introduced by commit 84d2697d9649339215675551eae28ba04068dea1 ("ipv6: speedup inet6_dump_ifinfo()"). -------------------- Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10CAN: use dev_get_by_index_rcustephen hemminger
Use new function to avoid doing read_lock(). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10IPV4: use rcu to walk list of devices in IGMPstephen hemminger
This also needs to be optimized for large number of devices. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10decnet: use RCU to find network devicesstephen hemminger
When showing device statistics use RCU rather than read_lock(&dev_base_lock) Compile tested only. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10net: use rcu for network scheduler APIstephen hemminger
Use RCU to walk list of network devices in qdisc dump. This could be optimized for large number of devices. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10vlan: eliminate use of dev_base_lockstephen hemminger
Do not need to use read_lock(&dev_base_lock), use RCU instead. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10IPv6: use ipv6_addr_v4mapped()Brian Haley
Change udp6_portaddr_hash() to use ipv6_addr_v4mapped() inline instead of ipv6_addr_type(). Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10sit: Clean up DF code by copying from IPIPHerbert Xu
This patch rearranges the SIT DF bit handling using the new IPIP DF code. The only externally visible effect should be the case where PMTU is enabled and the MTU is exactly 1280 bytes. In this case the previous code would send packets out with DF off while the new code would set the DF bit. This is inline with RFC 4213. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Thanks, Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10ipv6: Allow inet6_dump_addr() to handle more than 64 addressesEric Dumazet
Apparently, inet6_dump_addr() is not able to handle more than 64 ipv6 addresses per device. We must break from inner loops in case skb is full, or else cursor is put at the end of list. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10ipv6: speedup inet6_dump_ifinfo()Eric Dumazet
When handling large number of netdevice, inet6_dump_ifinfo() is very slow because it has O(N^2) complexity. Instead of scanning one single list, we can use the 256 sub lists of the dev_index hash table, and RCU lookups. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10net: netlink_getname, packet_getname -- use DECLARE_SOCKADDR guardCyrill Gorcunov
Use guard DECLARE_SOCKADDR in a few more places which allow us to catch if the structure copied back is too big. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10udp: bind() optimisationEric Dumazet
UDP bind() can be O(N^2) in some pathological cases. Thanks to secondary hash tables, we can make it O(N) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10Phonet: allocate and copy for pipe TX without sock lockRémi Denis-Courmont
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10Phonet: put sockets in a hash tableRémi Denis-Courmont
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-09Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2009-11-08Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/can/usb/ems_usb.c
2009-11-08xfrm: SAD entries do not expire correctly after suspend-resumeYury Polyanskiy
This fixes the following bug in the current implementation of net/xfrm: SAD entries timeouts do not count the time spent by the machine in the suspended state. This leads to the connectivity problems because after resuming local machine thinks that the SAD entry is still valid, while it has already been expired on the remote server. The cause of this is very simple: the timeouts in the net/xfrm are bound to the old mod_timer() timers. This patch reassigns them to the CLOCK_REALTIME hrtimer. I have been using this version of the patch for a few months on my machines without any problems. Also run a few stress tests w/o any issues. This version of the patch uses tasklet_hrtimer by Peter Zijlstra (commit 9ba5f0). This patch is against 2.6.31.4. Please CC me. Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08net/compat_ioctl: support SIOCWANDEVArnd Bergmann
This adds compat_ioctl support for SIOCWANDEV, which has always been missing. The definition of struct compat_ifreq was missing an ifru_settings fields that is needed to support SIOCWANDEV, so add that and clean up the whitespace damage in the struct definition. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08net, compat_ioctl: fix SIOCGMII ioctlsArnd Bergmann
SIOCGMIIPHY and SIOCGMIIREG return data through ifreq, so it needs to be converted on the way out as well. SIOCGIFPFLAGS is unused, but has the same problem in theory. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08udp: multicast RX should increment SNMP/sk_drops counter in allocation failuresEric Dumazet
When skb_clone() fails, we should increment sk_drops and SNMP counters. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08ipv6: udp: Optimise multicast receptionEric Dumazet
IPV6 UDP multicast rx path is a bit complex and can hold a spinlock for a long time. Using a small (32 or 64 entries) stack of socket pointers can help to perform expensive operations (skb_clone(), udp_queue_rcv_skb()) outside of the lock, in most cases. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08ipv4: udp: Optimise multicast receptionEric Dumazet
UDP multicast rx path is a bit complex and can hold a spinlock for a long time. Using a small (32 or 64 entries) stack of socket pointers can help to perform expensive operations (skb_clone(), udp_queue_rcv_skb()) outside of the lock, in most cases. It's also a base for a future RCU conversion of multicast recption. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08ipv6: udp: optimize unicast RX pathEric Dumazet
We first locate the (local port) hash chain head If few sockets are in this chain, we proceed with previous lookup algo. If too many sockets are listed, we take a look at the secondary (port, address) hash chain. We choose the shortest chain and proceed with a RCU lookup on the elected chain. But, if we chose (port, address) chain, and fail to find a socket on given address, we must try another lookup on (port, in6addr_any) chain to find sockets not bound to a particular IP. -> No extra cost for typical setups, where the first lookup will probabbly be performed. RCU lookups everywhere, we dont acquire spinlock. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08ipv4: udp: optimize unicast RX pathEric Dumazet
We first locate the (local port) hash chain head If few sockets are in this chain, we proceed with previous lookup algo. If too many sockets are listed, we take a look at the secondary (port, address) hash chain we added in previous patch. We choose the shortest chain and proceed with a RCU lookup on the elected chain. But, if we chose (port, address) chain, and fail to find a socket on given address, we must try another lookup on (port, INADDR_ANY) chain to find socket not bound to a particular IP. -> No extra cost for typical setups, where the first lookup will probabbly be performed. RCU lookups everywhere, we dont acquire spinlock. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08udp: secondary hash on (local port, local address)Eric Dumazet
Extends udp_table to contain a secondary hash table. socket anchor for this second hash is free, because UDP doesnt use skc_bind_node : We define an union to hold both skc_bind_node & a new hlist_nulls_node udp_portaddr_node udp_lib_get_port() inserts sockets into second hash chain (additional cost of one atomic op) udp_lib_unhash() deletes socket from second hash chain (additional cost of one atomic op) Note : No spinlock lockdep annotation is needed, because lock for the secondary hash chain is always get after lock for primary hash chain. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08udp: split sk_hash into two u16 hashesEric Dumazet
Union sk_hash with two u16 hashes for udp (no extra memory taken) One 16 bits hash on (local port) value (the previous udp 'hash') One 16 bits hash on (local address, local port) values, initialized but not yet used. This second hash is using jenkin hash for better distribution. Because the 'port' is xored later, a partial hash is performed on local address + net_hash_mix(net) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08udp: add a counter into udp_hslotEric Dumazet
Adds a counter in udp_hslot to keep an accurate count of sockets present in chain. This will permit to upcoming UDP lookup algo to chose the shortest chain when secondary hash is added. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08net/appletalk: using compat_ptr needs inclusion of linux/compat.hStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08net: Support specifying the network namespace upon device creation.Eric W. Biederman
There is no good reason to not support userspace specifying the network namespace during device creation, and it makes it easier to create a network device and pass it to a child network namespace with a well known name. We have to be careful to ensure that the target network namespace for the new device exists through the life of the call. To keep that logic clear I have factored out the network namespace grabbing logic into rtnl_link_get_net. In addtion we need to continue to pass the source network namespace to the rtnl_link_ops.newlink method so that we can find the base device source network namespace. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2009-11-08appletalk/ddp.c: Neaten checksum functionJoe Perches
atalk_sum_partial can now use the rol16 function in bitops.h Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08ipv6: avoid dev_hold()/dev_put() in rawv6_bind()Eric Dumazet
Using RCU helps not touching device refcount in rawv6_bind() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08can: should not use __dev_get_by_index() without locksEric Dumazet
bcm_proc_getifname() is called with RTNL and dev_base_lock not held. It calls __dev_get_by_index() without locks, and this is illegal (might crash) Close the race by holding dev_base_lock and copying dev->name in the protected section. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07rtnetlink: CleanupsEric Dumazet
Pure cleanups patch Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07net/x25: push BKL usage into x25_protoArnd Bergmann
The x25 driver uses lock_kernel() implicitly through its proto_ops wrapper. The makes the usage explicit in order to get rid of that wrapper and to better document the usage of the BKL. The next step should be to get rid of the usage of the BKL in x25 entirely, which requires understanding what data structures need serialized accesses. Cc: Henner Eisen <eis@baty.hanse.de> Cc: David S. Miller <davem@davemloft.net> Cc: linux-x25@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07net/irda: push BKL into proto_opsArnd Bergmann
The irda driver uses the BKL implicitly in its protocol operations. Replace the wrapped proto_ops with explicit lock_kernel() calls makes the usage more obvious and shrinks the size of the object code. The calls t lock_kernel() should eventually all be replaced by other serialization methods, which requires finding out The calls t lock_kernel() should eventually all be replaced by other serialization methods, which requires finding out which data actually needs protection. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07net/ipx: push down BKL into a ipx_dgram_opsArnd Bergmann
Making the BKL usage explicit in ipx makes it more obvious where it is used, reduces code size and helps getting rid of the BKL in common code. I did not analyse how to kill lock_kernel from ipx entirely, this will involve either proving that it's not needed, or replacing with a proper mutex or spinlock, after finding out which data structures are protected by the lock. Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: David S. Miller <davem@davemloft.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: netdev@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07net/appletalk: push down BKL into a atalk_dgram_opsArnd Bergmann
Making the BKL usage explicit in appletalk makes it more obvious where it is used, reduces code size and helps getting rid of the BKL in common code. I did not analyse how to kill lock_kernel from appletalk entirely, this will involve either proving that it's not needed, or replacing with a proper mutex or spinlock, after finding out which data structures are protected by the lock. Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: David S. Miller <davem@davemloft.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: netdev@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07net: Replace old style lock initializerThomas Gleixner
SPIN_LOCK_UNLOCKED is deprecated. Use DEFINE_SPINLOCK instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06net, compat_ioctl: handle more ioctls correctlyArnd Bergmann
The MII ioctls and SIOCSIFNAME need to go through ifsioc conversion, which they never did so far. Some others are not implemented in the native path, so we can just return -EINVAL directly. Add IFSLAVE ioctls to the EINVAL list and move it to the end to optimize the code path for the common case. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06compat: move sockios handling to net/socket.cArnd Bergmann
This removes the original socket compat_ioctl code from fs/compat_ioctl.c and converts the code from the copy in net/socket.c into a single function. We add a few cycles of runtime to compat_sock_ioctl() with the long switch() statement, but gain some cycles in return by simplifying the call chain to get there. Due to better inlining, save 1.5kb of object size in the process, and enable further savings: before: text data bss dec hex filename 13540 18008 2080 33628 835c obj/fs/compat_ioctl.o 14565 636 40 15241 3b89 obj/net/socket.o after: text data bss dec hex filename 8916 15176 2080 26172 663c obj/fs/compat_ioctl.o 20725 636 40 21401 5399 obj/net/socket.o Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06appletalk: handle SIOCATALKDIFADDR compat ioctlArnd Bergmann
We must not have a compat ioctl handler for SIOCATALKDIFADDR in common code, because the same number is used in other protocols with different data structures. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06net: copy socket ioctl code to net/socket.hArnd Bergmann
This makes an identical copy of the socket compat_ioctl code from fs/compat_ioctl.c to net/socket.c, as a preparation for moving the functionality in a way that can be easily reviewed. The code is hidden inside of #if 0 and gets activated in the patch that will make it work. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06ipip: Fix handling of DF packets when pmtudisc is OFFHerbert Xu
RFC 2003 requires the outer header to have DF set if DF is set on the inner header, even when PMTU discovery is off for the tunnel. Our implementation does exactly that. For this to work properly the IPIP gateway also needs to engate in PMTU when the inner DF bit is set. As otherwise the original host would not be able to carry out its PMTU successfully since part of the path is only visible to the gateway. Unfortunately when the tunnel PMTU discovery setting is off, we do not collect the necessary soft state, resulting in blackholes when the original host tries to perform PMTU discovery. This problem is not reproducible on the IPIP gateway itself as the inner packet usually has skb->local_df set. This is not correctly cleared (an unrelated bug) when the packet passes through the tunnel, which allows fragmentation to occur. For hosts behind the IPIP gateway it is readily visible with a simple ping. This patch fixes the problem by performing PMTU discovery for all packets with the inner DF bit set, regardless of the PMTU discovery setting on the tunnel itself. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06netfilter: xt_connlimit: fix regression caused by zero family valueJan Engelhardt
Commit v2.6.28-rc1~717^2~109^2~2 was slightly incomplete; not all instances of par->match->family were changed to par->family. References: http://bugzilla.netfilter.org/show_bug.cgi?id=610 Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06Merge branch 'for-next' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan
2009-11-06mac80211: async station powersave handlingJohannes Berg
Some devices require that all frames to a station are flushed when that station goes into powersave mode before being able to send frames to that station again when it wakes up or polls -- all in order to avoid reordering and too many or too few frames being sent to the station when it polls. Normally, this is the case unless the station goes to sleep and wakes up very quickly again. But in that case, frames for it may be pending on the hardware queues, and thus races could happen in the case of multiple hardware queues used for QoS/WMM. Normally this isn't a problem, but with the iwlwifi mechanism we need to make sure the race doesn't happen. This makes mac80211 able to cope with the race with driver help by a new WLAN_STA_PS_DRIVER per-station flag that can be controlled by the driver and tells mac80211 whether it can transmit frames or not. This flag must be set according to very specific rules outlined in the documentation for the function that controls it. When we buffer new frames for the station, we normally set the TIM bit right away, but while the driver has blocked transmission to that sta we need to avoid that as well since we cannot respond to the station if it wakes up due to the TIM bit. Once the driver unblocks, we can set the TIM bit. Similarly, when the station just wakes up, we need to wait until all other frames are flushed before we can transmit frames to that station, so the same applies here, we need to wait for the driver to give the OK. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-06Merge branch 'linux-2.6.33.y' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax
2009-11-06ieee802154: add support for creation/removal of logic interfacesDmitry Eremin-Solenikov
Add support for two more NL802154 commands: ADD_IFACE and DEL_IFACE, thus allowing creation and removal of logic WPAN interfaces on the top of wpan-phy. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06ieee802154: add PHY_NAME to LIST_IFACE command resultsDmitry Eremin-Solenikov
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>