summaryrefslogtreecommitdiffstats
path: root/drivers/net/bonding
AgeCommit message (Collapse)Author
2013-04-19bonding: in bond_mc_swap() bond's mc addr list is walked without locknikolay@redhat.com
Use netif_addr_lock_bh() to acquire the appropriate lock before walking. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19bonding: disable netpoll on enslave failurenikolay@redhat.com
slave_disable_netpoll() is not called upon enslave failure which would lead to a memory leak. Call slave_disable_netpoll() after err_detach as that's the first error path after enabling netpoll on that slave. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19bonding: primary_slave & curr_active_slave are not cleaned on enslave failurenikolay@redhat.com
On enslave failure primary_slave can point to new_slave which is to be freed, and the same applies to curr_active_slave. So check if this is the case and clean up properly after err_detach because that's the first error code path after they're set. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19bonding: vlans don't get deleted on enslave failurenikolay@redhat.com
The main problem is with vid refcount which only gets bumped up. Delete the vlans after err_detach as that's the first error path after the vlans are added. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19bonding: mc addresses don't get deleted on enslave failurenikolay@redhat.com
Add bond_mc_list_flush() after err_detach as that's the first error path after the addresses are added. The main issue is the mc addresses' refcount which only gets bumped up. v2: update log message and don't move code unnecessarily Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18bonding: fix l23 and l34 load balancing in forwarding pathEric Dumazet
Since commit 6b923cb7188d46 (bonding: support for IPv6 transmit hashing) bonding doesn't properly hash traffic in forwarding setups. Vitaly V. Bursov diagnosed that skb_network_header_len() returned 0 in this case. More generally, the transport header might not be in the skb head. Use pskb_may_pull() & skb_header_pointer() to get it right, and use proto_ports_offset() in bond_xmit_hash_policy_l34() to get support for more protocols than TCP and UDP. Reported-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: John Eaglesham <linux@8192.net> Tested-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11bonding: IFF_BONDING is not stripped on enslave failurenikolay@redhat.com
While enslaving a new device and after IFF_BONDING flag is set, in case of failure it is not stripped from the device's priv_flags while cleaning up, which could lead to other problems. Cleaning at err_close because the flag is set after dev_open(). v2: no change Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11bonding: fix netdev event NULL pointer dereferencenikolay@redhat.com
In commit 471cb5a33dcbd7c529684a2ac7ba4451414ee4a7 ("bonding: remove usage of dev->master") a bug was introduced which causes a NULL pointer dereference. If a bond device is in mode 6 (ALB) and a slave is added it will dereference a NULL pointer in bond_slave_netdev_event(). This is because in bond_enslave we have bond_alb_init_slave() which changes the MAC address of the slave and causes a NETDEV_CHANGEADDR. Then we have in bond_slave_netdev_event(): struct slave *slave = bond_slave_get_rtnl(slave_dev); struct bonding *bond = slave->bond; bond_slave_get_rtnl() dereferences slave_dev->rx_handler_data which at that time is NULL since netdev_rx_handler_register() is called later. This is fixed by checking if slave is NULL before dereferencing it. v2: Comment style changed. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08bonding: fix bonding_masters race condition in bond unloadingnikolay@redhat.com
While the bonding module is unloading, it is considered that after rtnl_link_unregister all bond devices are destroyed but since no synchronization mechanism exists, a new bond device can be created via bonding_masters before unregister_pernet_subsys which would lead to multiple problems (e.g. NULL pointer dereference, wrong RIP, list corruption). This patch fixes the issue by removing any bond devices left in the netns after bonding_masters is removed from sysfs. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08Revert "bonding: remove sysfs before removing devices"nikolay@redhat.com
This reverts commit 4de79c737b200492195ebc54a887075327e1ec1d. This patch introduces a new bug which causes access to freed memory. In bond_uninit: list_del(&bond->bond_list); bond_list is linked in bond_net's dev_list which is freed by unregister_pernet_subsys. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05bonding: remove sysfs before removing devicesVeaceslav Falico
We have a race condition if we try to rmmod bonding and simultaneously add a bond master through sysfs. In bonding_exit() we first remove the devices (through rtnl_link_unregister() ) and only after that we remove the sysfs. If we manage to add a device through sysfs after that the devices were removed - we'll end up with that device/sysfs structure and with the module unloaded. Fix this by first removing the sysfs and only after that calling rtnl_link_unregister(). Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02bonding: get netdev_rx_handler_unregister out of locksVeaceslav Falico
Now that netdev_rx_handler_unregister contains synchronize_net(), we need to call it outside of bond->lock, cause it might sleep. Also, remove the already unneded synchronize_net(). Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29bonding: fix disabling of arp_interval and miimonnikolay@redhat.com
Currently if either arp_interval or miimon is disabled, they both get disabled, and upon disabling they get executed once more which is not the proper behaviour. Also when doing a no-op and disabling an already disabled one, the other again gets disabled. Also fix the error messages with the proper valid ranges, and a small typo fix in the up delay error message (outputting "down delay", instead of "up delay"). Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26bonding: remove already created master sysfs link on failureVeaceslav Falico
If slave sysfs symlink failes to be created - we end up without removing the master sysfs symlink. Remove it in case of failure. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-13bonding: don't call update_speed_duplex() under spinlocksVeaceslav Falico
bond_update_speed_duplex() might sleep while calling underlying slave's routines. Move it out of atomic context in bond_enslave() and remove it from bond_miimon_commit() - it was introduced by commit 546add79, however when the slave interfaces go up/change state it's their responsibility to fire NETDEV_UP/NETDEV_CHANGE events so that bonding can properly update their speed. I've tested it on all combinations of ifup/ifdown, autoneg/speed/duplex changes, remote-controlled and local, on (not) MII-based cards. All changes are visible. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07bonding: fire NETDEV_RELEASE event only on 0 slavesVeaceslav Falico
Currently, if we set up netconsole over bonding and release a slave, netconsole will stop logging on the whole bonding device. Change the behavior to stop the netconsole only when the last slave is released. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-26bond: check if slave count is 0 in case when deciding to take slave's macJiri Pirko
in bond_enslave(), check slave_cnt before actually using slave address. introduced by: commit 409cc1f8a41 (bond: have random dev address by default instead of zeroes) Reported-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-19bonding: set sysfs device_type to 'bond'Doug Goldstein
Sets the sysfs device_type to 'bond' for udev. This allows udev rules to be created for bond devices. This is similar to how other network devices set their device_type. Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-19bonding: fix bond_release_all inconsistenciesnikolay@redhat.com
This patch fixes the following inconsistencies in bond_release_all: - IFF_BONDING flag is not stripped from slaves - MTU is not restored - no netdev notifiers are sent Instead of trying to keep bond_release and bond_release_all in sync I think we can re-use bond_release as the environment for calling it is correct (RTNL is held). I have been running tests for the past week and they came out successful. The only way for bond_release to fail is for the slave to be attached in a different bond or to not be a slave but that cannot happen as RTNL is held and no slave manipulations can be achieved. V2: As suggested bond_release is renamed to __bond_release_one with a new parameter "all" introduced so to avoid calling unnecessary code while destroying a bond, and a wrapper for it called bond_release is created because of ndo_del_link. bond_release_all() is removed. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-19bonding: Fix initialize after use for 3ad machine state spinlocknikolay@redhat.com
The 3ad machine state spinlock can be used before it is inititialized while doing bond_enslave() (and the port is being initialized) since port->slave is set before the lock is prepared, thus causing soft lock-ups and a multitude of other nasty bugs. [ Rename __initialize_port_locks() variable name to 'slave' -DaveM ] Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-19bonding: Fix race condition between bond_enslave() and ↵nikolay@redhat.com
bond_3ad_update_lacp_rate() port->slave can be NULL since it's being initialized in bond_enslave thus dereferencing a NULL pointer in bond_3ad_update_lacp_rate() Also fix a minor bug, which could cause a port not to have AD_STATE_LACP_TIMEOUT since there's no sync between bond_3ad_update_lacp_rate() and bond_3ad_bind_slave(), by changing the read_lock to a write_lock_bh in bond_3ad_update_lacp_rate(). Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-11netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lockNeil Horman
__netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is already held. The mechanism is used to asynchronously call __netpoll_cleanup outside of the holding of the rtnl_lock, so as to avoid deadlock. Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the rtnl_lock must be held while calling it. Further, it cannot be held, because rcu callbacks may be issued in softirq contexts, which cannot sleep. Fix this by converting the rcu callback to a work queue that is guaranteed to get scheduled in process context, so that we can hold the rtnl properly while calling __netpoll_cleanup Tested successfully by myself. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> CC: Cong Wang <amwang@redhat.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/intel/e1000e/ethtool.c drivers/net/vmxnet3/vmxnet3_drv.c drivers/net/wireless/iwlwifi/dvm/tx.c net/ipv6/route.c The ipv6 route.c conflict is simple, just ignore the 'net' side change as we fixed the same problem in 'net-next' by eliminating cached neighbours from ipv6 routes. The e1000e conflict is an addition of a new statistic in the ethtool code, trivial. The vmxnet3 conflict is about one change in 'net' removing a guarding conditional, whilst in 'net-next' we had a netdev_info() conversion. The iwlwifi conflict is dealing with a WARN_ON() conversion in 'net-next' vs. a revert happening in 'net'. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-04netns: bond: allow unprivileged users to control bond deviceGao feng
reduce the permission check of bond device's ioctl. allow the userns root to control the bond device. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30bond: have random dev address by default instead of zeroesJiri Pirko
Makes more sense to have randomly generated address by default than to have all zeroes. It also allows user to for example put the bond into bridge without need to have any slaves in it. Also note that this changes only behaviour of bonds with no slaves. Once the first slave device is enslaved, its address will be used (no change here). Also, fix dev_assign_type values on the way. Reported-by: Pavel Šimerda <psimerda@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29bonding: unset primary slave via sysfsMilos Vyletel
When bonding module is loaded with primary parameter and one decides to unset primary slave using sysfs these settings are not preserved during bond device restart. Primary slave is only unset once and it's not remembered in bond->params structure. Below is example of recreation. grep OPTS /etc/sysconfig/network-scripts/ifcfg-bond0 BONDING_OPTS="mode=active-backup miimon=100 primary=eth01" grep "Primary Slave" /proc/net/bonding/bond0 Primary Slave: eth01 (primary_reselect always) echo "" > /sys/class/net/bond0/bonding/primary grep "Primary Slave" /proc/net/bonding/bond0 Primary Slave: None sed -i -e 's/primary=eth01//' /etc/sysconfig/network-scripts/ifcfg-bond0 grep OPTS /etc/sysconfig/network-scripts/ifcfg-bond BONDING_OPTS="mode=active-backup miimon=100 " ifdown bond0 && ifup bond0 without patch: grep "Primary Slave" /proc/net/bonding/bond0 Primary Slave: eth01 (primary_reselect always) with patch: grep "Primary Slave" /proc/net/bonding/bond0 Primary Slave: None Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Milos Vyletel <milos.vyletel@sde.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-06ethtool: fix drvinfo strings set in driversJiri Pirko
Use strlcpy where possible to ensure the string is \0 terminated. Use always sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN and custom defines. Use snprintf instead of sprint. Remove unnecessary inits of ->fw_version Remove unnecessary inits of drvinfo struct. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-04bonding: remove usage of dev->masterJiri Pirko
Benefit from new upper dev list and free bonding from dev->master usage. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14bonding: do not cancel works in bond_uninit()Konstantin Khlebnikov
Bonding initializes these works in bond_open() and cancels in bond_close(), thus in bond_uninit() they are already canceled but may be unitialized yet. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Nikolay Aleksandrov <nikolay@redhat.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial branch from Jiri Kosina: "Usual stuff -- comment/printk typo fixes, documentation updates, dead code elimination." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) HOWTO: fix double words typo x86 mtrr: fix comment typo in mtrr_bp_init propagate name change to comments in kernel source doc: Update the name of profiling based on sysfs treewide: Fix typos in various drivers treewide: Fix typos in various Kconfig wireless: mwifiex: Fix typo in wireless/mwifiex driver messages: i2o: Fix typo in messages/i2o scripts/kernel-doc: check that non-void fcts describe their return value Kernel-doc: Convention: Use a "Return" section to describe return values radeon: Fix typo and copy/paste error in comments doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c various: Fix spelling of "asynchronous" in comments. Fix misspellings of "whether" in comments. eisa: Fix spelling of "asynchronous". various: Fix spelling of "registered" in comments. doc: fix quite a few typos within Documentation target: iscsi: fix comment typos in target/iscsi drivers treewide: fix typo of "suport" in various comments and Kconfig treewide: fix typo of "suppport" in various comments ...
2012-12-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking changes from David Miller: 1) Allow to dump, monitor, and change the bridge multicast database using netlink. From Cong Wang. 2) RFC 5961 TCP blind data injection attack mitigation, from Eric Dumazet. 3) Networking user namespace support from Eric W. Biederman. 4) tuntap/virtio-net multiqueue support by Jason Wang. 5) Support for checksum offload of encapsulated packets (basically, tunneled traffic can still be checksummed by HW). From Joseph Gasparakis. 6) Allow BPF filter access to VLAN tags, from Eric Dumazet and Daniel Borkmann. 7) Bridge port parameters over netlink and BPDU blocking support from Stephen Hemminger. 8) Improve data access patterns during inet socket demux by rearranging socket layout, from Eric Dumazet. 9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and Jon Maloy. 10) Update TCP socket hash sizing to be more in line with current day realities. The existing heurstics were choosen a decade ago. From Eric Dumazet. 11) Fix races, queue bloat, and excessive wakeups in ATM and associated drivers, from Krzysztof Mazur and David Woodhouse. 12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions in VXLAN driver, from David Stevens. 13) Add "oops_only" mode to netconsole, from Amerigo Wang. 14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also allow DCB netlink to work on namespaces other than the initial namespace. From John Fastabend. 15) Support PTP in the Tigon3 driver, from Matt Carlson. 16) tun/vhost zero copy fixes and improvements, plus turn it on by default, from Michael S. Tsirkin. 17) Support per-association statistics in SCTP, from Michele Baldessari. And many, many, driver updates, cleanups, and improvements. Too numerous to mention individually. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) net/mlx4_en: Add support for destination MAC in steering rules net/mlx4_en: Use generic etherdevice.h functions. net: ethtool: Add destination MAC address to flow steering API bridge: add support of adding and deleting mdb entries bridge: notify mdb changes via netlink ndisc: Unexport ndisc_{build,send}_skb(). uapi: add missing netconf.h to export list pkt_sched: avoid requeues if possible solos-pci: fix double-free of TX skb in DMA mode bnx2: Fix accidental reversions. bna: Driver Version Updated to 3.1.2.1 bna: Firmware update bna: Add RX State bna: Rx Page Based Allocation bna: TX Intr Coalescing Fix bna: Tx and Rx Optimizations bna: Code Cleanup and Enhancements ath9k: check pdata variable before dereferencing it ath5k: RX timestamp is reported at end of frame ath9k_htc: RX timestamp is reported at end of frame ...
2012-12-07bonding: Fix check for ethtool get_link operation supportBen Hutchings
Since commit 2c60db037034 ('net: provide a default dev->ethtool_ops') all devices have a non-null ethtool_ops. Test only dev->ethtool_ops->get_link in both places where we care. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30bonding: delete migrated IP addresses from the rlb hash tableJiri Bohac
Bonding in balance-alb mode records information from ARP packets passing through the bond in a hash table (rx_hashtbl). At certain situations (e.g. link change of a slave), rlb_update_rx_clients() will send out ARP packets to update ARP caches of other hosts on the network to achieve RX load balancing. The problem is that once an IP address is recorded in the hash table, it stays there indefinitely. If this IP address is migrated to a different host in the network, bonding still sends out ARP packets that poison other systems' ARP caches with invalid information. This patch solves this by looking at all incoming ARP packets, and checking if the source IP address is one of the source addresses stored in the rx_hashtbl. If it is, but the MAC addresses differ, the corresponding hash table entries are removed. Thus, when an IP address is migrated, the first ARP broadcast by its new owner will purge the offending entries of rx_hashtbl. The hash table is hashed by ip_dst. To be able to do the above check efficiently (not walking the whole hash table), we need a reverse mapping (by ip_src). I added three new members in struct rlb_client_info: rx_hashtbl[x].src_first will point to the start of a list of entries for which hash(ip_src) == x. The list is linked with src_next and src_prev. When an incoming ARP packet arrives at rlb_arp_recv() rlb_purge_src_ip() can quickly walk only the entries on the corresponding lists, i.e. the entries that are likely to contain the offending IP address. To avoid confusion, I renamed these existing fields of struct rlb_client_info: next -> used_next prev -> used_prev rx_hashtbl_head -> rx_hashtbl_used_head (The current linked list is _not_ a list of hash table entries with colliding ip_dst. It's a list of entries that are being used; its purpose is to avoid walking the whole hash table when looking for used entries.) Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30bonding: rlb mode of bond should not alter ARP originating via bridgezheng.li
Do not modify or load balance ARP packets passing through balance-alb mode (wherein the ARP did not originate locally, and arrived via a bridge). Modifying pass-through ARP replies causes an incorrect MAC address to be placed into the ARP packet, rendering peers unable to communicate with the actual destination from which the ARP reply originated. Load balancing pass-through ARP requests causes an entry to be created for the peer in the rlb table, and bond_alb_monitor will occasionally issue ARP updates to all peers in the table instrucing them as to which MAC address they should communicate with; this occurs when some event sets rx_ntt. In the bridged case, however, the MAC address used for the update would be the MAC of the slave, not the actual source MAC of the originating destination. This would render peers unable to communicate with the destinations beyond the bridge. Signed-off-by: Zheng Li <zheng.x.li@oracle.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29bonding: fix race condition in bonding_store_slaves_activenikolay@redhat.com
Race between bonding_store_slaves_active() and slave manipulation functions. The bond_for_each_slave use in bonding_store_slaves_active() is not protected by any synchronization mechanism. NULL pointer dereference is easy to reach. Fixed by acquiring the bond->lock for the slave walk. v2: Make description text < 75 columns Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29bonding: make arp_ip_target parameter checks consistent with sysfsnikolay@redhat.com
The module can be loaded with arp_ip_target="255.255.255.255" which makes it impossible to remove as the function in sysfs checks for that value, so we make the parameter checks consistent with sysfs. v2: Fix formatting v3: Make description text < 75 columns Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29bonding: fix miimon and arp_interval delayed work race conditionsnikolay@redhat.com
First I would give three observations which will be used later. Observation 1: if (delayed_work_pending(wq)) cancel_delayed_work(wq) This usage is wrong because the pending bit is cleared just before the work's fn is executed and if the function re-arms itself we might end up with the work still running. It's safe to call cancel_delayed_work_sync() even if the work is not queued at all. Observation 2: Use of INIT_DELAYED_WORK() Work needs to be initialized only once prior to (de/en)queueing. Observation 3: IFF_UP is set only after ndo_open is called Related race conditions: 1. Race between bonding_store_miimon() and bonding_store_arp_interval() Because of Obs.1 we can end up having both works enqueued. 2. Multiple races with INIT_DELAYED_WORK() Since the works are not protected by anything between INIT_DELAYED_WORK() and calls to (en/de)queue it is possible for races between the following functions: (races are also possible between the calls to INIT_DELAYED_WORK() and workqueue code) bonding_store_miimon() - bonding_store_arp_interval(), bond_close(), bond_open(), enqueued functions bonding_store_arp_interval() - bonding_store_miimon(), bond_close(), bond_open(), enqueued functions 3. By Obs.1 we need to change bond_cancel_all() Bugs 1 and 2 are fixed by moving all work initializations in bond_open which by Obs. 2 and Obs. 3 and the fact that we make sure that all works are cancelled in bond_close(), is guaranteed not to have any work enqueued. Also RTNL lock is now acquired in bonding_store_miimon/arp_interval so they can't race with bond_close and bond_open. The opposing work is cancelled only if the IFF_UP flag is set and it is cancelled unconditionally. The opposing work is already cancelled if the interface is down so no need to cancel it again. This way we don't need new synchronizations for the bonding workqueue. These bugs (and fixes) are tied together and belong in the same patch. Note: I have left 1 line intentionally over 80 characters (84) because I didn't like how it looks broken down. If you'd prefer it otherwise, then simply break it. v2: Make description text < 75 columns Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28bonding: in balance-rr mode, set curr_active_slave only if it is upMichal Kubeček
If all slaves of a balance-rr bond with ARP monitor are enslaved with down link state, bond keeps down state even after slaves go up. This is caused by bond_enslave() setting curr_active_slave to first slave not taking into account its link state. As bond_loadbalance_arp_mon() uses curr_active_slave to identify whether slave's down->up transition should update bond's link state, bond stays down even if slaves are up (until first slave goes from up to down at least once). Before commit f31c7937 "bonding: start slaves with link down for ARP monitor", this was masked by slaves always starting in UP state with ARP monitor (and MII monitor not relying on curr_active_slave being NULL if there is no slave up). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-21bonding: Bonding driver does not consider the gso_max_size/gso_max_segs ↵Sarveshwar Bandi
setting of slave devices. Patch sets the lowest gso_max_size and gso_max_segs values of the slave devices during enslave and detach. Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19treewide: fix typo of "suport" in various comments and KconfigMasanari Iida
Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-01bonding: fix second off-by-one errornikolay@redhat.com
Fix off-by-one error because IFNAMSIZ == 16 and when this code gets executed we stick a NULL byte where we should not. How to reproduce: with CONFIG_CC_STACKPROTECTOR=y (otherwise it may pass by silently) modprobe bonding; echo 1 > /sys/class/net/bond0/bonding/mode; echo "AAAAAAAAAAAAAAAA" > /sys/class/net/bond0/bonding/active_slave; Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Note: Sorry for the second patch but I missed this one while checking the file. You can squash them into one patch. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01bonding: fix off-by-one errornikolay@redhat.com
Fix off-by-one error because IFNAMSIZ == 16 and when this code gets executed we stick a NULL byte where we should not. How to reproduce: with CONFIG_CC_STACKPROTECTOR=y (otherwise it may pass by silently) modprobe bonding; echo 1 > /sys/class/net/bond0/bonding/mode; echo "AAAAAAAAAAAAAAAA" > /sys/class/net/bond0/bonding/primary; Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-16vlan: fix bond/team enslave of vlan challenged slave/portJiri Pirko
In vlan_uses_dev() check for number of vlan devs rather than existence of vlan_info. The reason is that vlan id 0 is there without appropriate vlan dev on it by default which prevented from enslaving vlan challenged dev. Reported-by: Jon Stanley <jstanley@rmrf.net> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04bonding: set qdisc_tx_busylock to avoid LOCKDEP splatEric Dumazet
If a qdisc is installed on a bonding device, its possible to get following lockdep splat under stress : ============================================= [ INFO: possible recursive locking detected ] 3.6.0+ #211 Not tainted --------------------------------------------- ping/4876 is trying to acquire lock: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 but task is already holding lock: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock); lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock); *** DEADLOCK *** May be due to missing lock nesting notation 6 locks held by ping/4876: #0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e5030>] raw_sendmsg+0x600/0xc30 #1: (rcu_read_lock_bh){.+....}, at: [<ffffffff815ba4bd>] ip_finish_output+0x12d/0x870 #2: (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830 #3: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 #4: (&bond->lock){++.?..}, at: [<ffffffffa02128c1>] bond_start_xmit+0x31/0x4b0 [bonding] #5: (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830 stack backtrace: Pid: 4876, comm: ping Not tainted 3.6.0+ #211 Call Trace: [<ffffffff810a0145>] __lock_acquire+0x715/0x1b80 [<ffffffff810a256b>] ? mark_held_locks+0x9b/0x100 [<ffffffff810a1bf2>] lock_acquire+0x92/0x1d0 [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830 [<ffffffff81726b7c>] _raw_spin_lock+0x3c/0x50 [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830 [<ffffffff8106264d>] ? rcu_read_lock_bh_held+0x5d/0x90 [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570 [<ffffffffa0212a6a>] bond_start_xmit+0x1da/0x4b0 [bonding] [<ffffffff815796d0>] dev_hard_start_xmit+0x240/0x6b0 [<ffffffff81597c6e>] sch_direct_xmit+0xfe/0x2a0 [<ffffffff8157a249>] dev_queue_xmit+0x199/0x830 [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570 [<ffffffff815ba96f>] ip_finish_output+0x5df/0x870 [<ffffffff815ba4bd>] ? ip_finish_output+0x12d/0x870 [<ffffffff815bb964>] ip_output+0x54/0xf0 [<ffffffff815bad48>] ip_local_out+0x28/0x90 [<ffffffff815bc444>] ip_send_skb+0x14/0x50 [<ffffffff815bc4b2>] ip_push_pending_frames+0x32/0x40 [<ffffffff815e536a>] raw_sendmsg+0x93a/0xc30 [<ffffffff8128d570>] ? selinux_file_send_sigiotask+0x1f0/0x1f0 [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80 [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220 [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80 [<ffffffff815f6855>] inet_sendmsg+0x125/0x240 [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220 [<ffffffff8155cddb>] sock_sendmsg+0xab/0xe0 [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0 [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0 [<ffffffff8155d18c>] __sys_sendmsg+0x37c/0x390 [<ffffffff81195b2a>] ? fsnotify+0x2ca/0x7e0 [<ffffffff811958e8>] ? fsnotify+0x88/0x7e0 [<ffffffff81361f36>] ? put_ldisc+0x56/0xd0 [<ffffffff8116f98a>] ? fget_light+0x3da/0x510 [<ffffffff8155f6c4>] sys_sendmsg+0x44/0x80 [<ffffffff8172fc22>] system_call_fastpath+0x16/0x1b Avoid this problem using a distinct lock_class_key for bonding devices. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31bonding: add some slack to arp monitoring time limitsJiri Bohac
Currently, all the time limits in the bonding ARP monitor are in multiples of arp_interval -- the time interval at which the ARP monitor is periodically scheduled. With a fast network round-trip and a little scheduling latency of the ARP monitor work, a limit of n*delta_in_ticks may effectively mean (n-1)*delta_in_ticks. This is fatal in case of n==1 (the link will stay down forever) and makes the behaviour non-deterministic in all the other cases. Add a delta_in_ticks/2 time slack to all the time limits. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22bonding: support for IPv6 transmit hashingJohn Eaglesham
Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. In addition, bounds checking has been added to all transmit hashing functions. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. Tested-by: John Eaglesham <linux@8192.net> Signed-off-by: John Eaglesham <linux@8192.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2012-08-14netpoll: check netpoll tx status on the right deviceAmerigo Wang
Although this doesn't matter actually, because netpoll_tx_running() doesn't use the parameter, the code will be more readable. For team_dev_queue_xmit() we have to move it down to avoid compile errors. Cc: David Miller <davem@davemloft.net> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14netpoll: make __netpoll_cleanup non-blockAmerigo Wang
Like the previous patch, slave_disable_netpoll() and __netpoll_cleanup() may be called with read_lock() held too, so we should make them non-block, by moving the cleanup and kfree() to call_rcu_bh() callbacks. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()Amerigo Wang
slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>