summaryrefslogtreecommitdiffstats
path: root/drivers/net/vxlan.c
AgeCommit message (Collapse)Author
2013-09-04vxlan: Optimize vxlan rcvPravin B Shelar
vxlan-udp-recv function lookup vxlan_sock struct on every packet recv by using udp-port number. we can use sk->sk_user_data to store vxlan_sock and avoid lookup. I have open coded rcu-api to store and read vxlan_sock from sk_user_data to avoid sparse warning as sk_user_data is not __rcu pointer. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04tunnels: harmonize cleanup done on skb on xmit pathNicolas Dichtel
The goal of this patch is to harmonize cleanup done on a skbuff on xmit path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04vxlan: remove net arg from vxlan[6]_xmit_skb()Nicolas Dichtel
This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04iptunnels: remove net arg from iptunnel_xmit()Nicolas Dichtel
This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03drivers/net: Convert uses of compare_ether_addr to ether_addr_equalJoe Perches
Use the new bool function ether_addr_equal to add some clarity and reduce the likelihood for misuse of compare_ether_addr for sorting. Done via cocci script: (and a little typing) $ cat compare_ether_addr.cocci @@ expression a,b; @@ - !compare_ether_addr(a, b) + ether_addr_equal(a, b) @@ expression a,b; @@ - compare_ether_addr(a, b) + !ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) == 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) != 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) == 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) != 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !!ether_addr_equal(a, b) + ether_addr_equal(a, b) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-02vxlan: include net/ip6_checksum.h for csum_ipv6_magic()Cong Wang
Fengguang reported a compile warning: drivers/net/vxlan.c: In function 'vxlan6_xmit_skb': drivers/net/vxlan.c:1352:3: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors this patch fixes it. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-02vxlan: fix flowi6_proto valueCong Wang
It should be IPPROTO_UDP. Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31vxlan: add ipv6 proxy supportCong Wang
This patch adds the IPv6 version of "arp_reduce", ndisc_send_na() will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31vxlan: add ipv6 route short circuit supportCong Wang
route short circuit only has IPv4 part, this patch adds the IPv6 part. nd_tbl will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31vxlan: add ipv6 supportCong Wang
This patch adds IPv6 support to vxlan device, as the new version RFC already mentions it: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03 Cc: David Stevens <dlstevens@us.ibm.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: using kfree_rcu() to simplify the codeWei Yongjun
The callback function of call_rcu() just calls a kfree(), so we can use kfree_rcu() instead of call_rcu() + callback function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Add tx-vlan offload support.Pravin B Shelar
Following patch allows transmit side vlan offload for vxlan devices. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Improve vxlan headroom calculation.Pravin B Shelar
Rather than having static headroom calculation, adjust headroom according to target device. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Factor out vxlan send api.Pravin B Shelar
Following patch allows more code sharing between vxlan and ovs-vxlan. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Extend vxlan handlers for openvswitch.Pravin B Shelar
Following patch adds data field to vxlan socket and export vxlan handler api. vh->data is required to store private data per vxlan handler. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Add vxlan recv demux.Pravin B Shelar
Once we have ovs-vxlan functionality, one UDP port can be assigned to kernel-vxlan or ovs-vxlan port. Therefore following patch adds vxlan demux functionality, so that vxlan or ovs module can register for particular port. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Restructure vxlan receive.Pravin B Shelar
Use iptunnel_pull_header() for better code sharing. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Restructure vxlan socket apis.Pravin B Shelar
Restructure vxlan-socket management APIs so that it can be shared between vxlan and ovs modules. This patch does not change any functionality. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> v6-v7: - get rid of zero refcnt vs from hashtable. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2013-08-09vxlan: fix a soft lockup in vxlan module removalCong Wang
This is a regression introduced by: commit fe5c3561e6f0ac7c9546209f01351113c1b77ec8 Author: stephen hemminger <stephen@networkplumber.org> Date: Sat Jul 13 10:18:18 2013 -0700 vxlan: add necessary locking on device removal The problem is that vxlan_dellink(), which is called with RTNL lock held, tries to flush the workqueue synchronously, but apparently igmp_join and igmp_leave work need to hold RTNL lock too, therefore we have a soft lockup! As suggested by Stephen, probably the flush_workqueue can just be removed and let the normal refcounting work. The workqueue has a reference to device and socket, therefore the cleanups should work correctly. Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Tested-by: Cong Wang <amwang@redhat.com> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09vxlan: fix a regression of igmp joinCong Wang
This is a regression introduced by: commit 3fc2de2faba387218bdf9dbc6b13f513ac3b060a Author: stephen hemminger <stephen@networkplumber.org> Date: Thu Jul 18 08:40:15 2013 -0700 vxlan: fix igmp races Before this commit, the old code was: if (vxlan_group_used(vn, vxlan->default_dst.remote_ip)) ip_mc_join_group(sk, &mreq); else ip_mc_leave_group(sk, &mreq); therefore we shoud check vxlan_group_used(), not its opposite, for igmp_join. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-04vxlan: fix rcu related warningstephen hemminger
Vxlan remote list is protected by RCU and guaranteed to be non-empty. Split out the rcu and non-rcu access to the list to fix warning Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-23vxlan fdb replace an existing entryThomas Richter
Add support to replace an existing entry found in the vxlan fdb database. The entry in question is identified by its unicast mac address and the destination information is changed. If the entry is not found, it is added in the forwarding database. This is similar to changing an entry in the neighbour table. Multicast mac addresses can not be changed with the replace option. This is useful for virtual machine migration when the destination of a target virtual machine changes. The replace feature can be used instead of delete followed by add. Resubmitted because net-next was closed last week. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-19vxlan: fix igmp racesstephen hemminger
There are two race conditions in existing code for doing IGMP management in workqueue in vxlan. First, the vxlan_group_used function checks the list of vxlan's without any protection, and it is possible for open followed by close to occur before the igmp work queue runs. To solve these move the check into vxlan_open/stop so it is protected by RTNL. And split into two work structures so that there is no racy reference to underlying device state. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-19vxlan: unregister on namespace exitstephen hemminger
Fix memory leaks and other badness from VXLAN network namespace teardown. When network namespace is removed, all the vxlan devices should be unregistered (not closed). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-17vxlan: add necessary locking on device removalstephen hemminger
The socket management is now done in workqueue (outside of RTNL) and protected by vn->sock_lock. There were two possible bugs, first the vxlan device was removed from the VNI hash table per socket without holding lock. And there was a race when device is created and the workqueue could run after deletion. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11vxlan: Fix kernel crash on rmmod.Pravin B Shelar
vxlan exit module unregisters vxlan net and then it unregisters rtnl ops which triggers vxlan_dellink() from __rtnl_kill_links(). vxlan_dellink() deletes vxlan-dev from vxlan_list which has list-head in vxlan-net-struct but that is already gone due to net-unregister. That is how we are getting following crash. Following commit fixes the crash by fixing module exit path. BUG: unable to handle kernel paging request at ffff8804102c8000 IP: [<ffffffff812cc5e9>] __list_del_entry+0x29/0xd0 PGD 2972067 PUD 83e019067 PMD 83df97067 PTE 80000004102c8060 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: --- CPU: 19 PID: 6712 Comm: rmmod Tainted: GF 3.10.0+ #95 Hardware name: Dell Inc. PowerEdge R620/0KCKR5, BIOS 1.4.8 10/25/2012 task: ffff88080c47c580 ti: ffff88080ac50000 task.ti: ffff88080ac50000 RIP: 0010:[<ffffffff812cc5e9>] [<ffffffff812cc5e9>] __list_del_entry+0x29/0xd0 RSP: 0018:ffff88080ac51e08 EFLAGS: 00010206 RAX: ffff8804102c8000 RBX: ffff88040f0d4b10 RCX: dead000000200200 RDX: ffff8804102c8000 RSI: ffff88080ac51e58 RDI: ffff88040f0d4b10 RBP: ffff88080ac51e08 R08: 0000000000000001 R09: 2222222222222222 R10: 2222222222222222 R11: 2222222222222222 R12: ffff88080ac51e58 R13: ffffffffa07b8840 R14: ffffffff81ae48c0 R15: ffff88080ac51e58 FS: 00007f9ef105c700(0000) GS:ffff88082a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8804102c8000 CR3: 00000008227e5000 CR4: 00000000000407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88080ac51e28 ffffffff812cc6a1 2222222222222222 ffff88040f0d4000 ffff88080ac51e48 ffffffffa07b3311 ffff88040f0d4000 ffffffff81ae49c8 ffff88080ac51e98 ffffffff81492fc2 ffff88080ac51e58 ffff88080ac51e58 Call Trace: [<ffffffff812cc6a1>] list_del+0x11/0x40 [<ffffffffa07b3311>] vxlan_dellink+0x51/0x70 [vxlan] [<ffffffff81492fc2>] __rtnl_link_unregister+0xa2/0xb0 [<ffffffff8149448e>] rtnl_link_unregister+0x1e/0x30 [<ffffffffa07b7b7c>] vxlan_cleanup_module+0x1c/0x2f [vxlan] [<ffffffff810c9b31>] SyS_delete_module+0x1d1/0x2c0 [<ffffffff812b8a0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81582f42>] system_call_fastpath+0x16/0x1b Code: eb 9f 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08 RIP [<ffffffff812cc5e9>] __list_del_entry+0x29/0xd0 RSP <ffff88080ac51e08> CR2: ffff8804102c8000 Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25vxlan: fix function name spellingStephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-25vxlan: fdb: allow specifying multiple destinations for zero MACMike Rapoport
The zero MAC entry in the fdb is used as default destination. With multiple default destinations it is possible to use vxlan in environments that disable multicast on the infrastructure level, e.g. public clouds. Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-25vxlan: allow removal of single destination from fdb entryMike Rapoport
When the last item is deleted from the remote destinations list, the fdb entry is destroyed. Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-25vxlan: introduce vxlan_fdb_parseMike Rapoport
which will be reused by vxlan_fdb_delete Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-25vxlan: introduce vxlan_fdb_find_rdstMike Rapoport
which will be reused by vxlan_fdb_delete Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-25vxlan: add implicit fdb entry for default destinationMike Rapoport
Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-25vxlan: Fix sparse warnings.Pravin B Shelar
Fix following sparse warnings. drivers/net/vxlan.c:238:44: warning: incorrect type in argument 3 (different base types) drivers/net/vxlan.c:238:44: expected restricted __be32 [usertype] value drivers/net/vxlan.c:238:44: got unsigned int const [unsigned] [usertype] remote_vni drivers/net/vxlan.c:1735:18: warning: incorrect type in initializer (different signedness) drivers/net/vxlan.c:1735:18: expected int *id drivers/net/vxlan.c:1735:18: got unsigned int static [toplevel] *<noident> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-24vxlan: cosmetic cleanup'sStephen Hemminger
Fix whitespace and spelling Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: David L Stevens <dlstevens@us.ibm.com>
2013-06-24vxlan: Use initializer for dummy structuresStephen Hemminger
For the notification code, a couple of places build fdb entries on the stack, use structure initialization instead and fix formatting. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-24vxlan: port module param should be ushortStephen Hemminger
UDP ports are limited to 16 bits. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-24vxlan: convert remotes list to list_rcuStephen Hemminger
Based on initial work by Mike Rapoport <mike.rapoport@ravellosystems.com> Use list macros and RCU for tracking multiple remotes. Note: this code assumes list always has at least one entry, because delete is not supported. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-24vxlan: make vxlan_xmit_one voidStephen Hemminger
The function vxlan_xmit_one always returns NETDEV_TX_OK, so there is no point in keeping track of return values etc. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: David L Stevens <dlstevens@us.ibm.com>
2013-06-24vxlan: move cleanup to uninitStephen Hemminger
Put destruction of per-cpu statistics removal in ndo_uninit since it is created by ndo_init. This also avoids any problems that might be cause by destructor being called after module removed. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-24vxlan: fix race caused by dropping rtnl_unlockStephen Hemminger
It is possible for two cpu's to race creating vxlan device. For most cases this is harmless, but the ability to assign "next avaliable vxlan device" relies on rtnl lock being held across the whole operation. Therfore two instances of calling: ip li add vxlan%d vxlan ... could collide and create two devices with same name. To fix this defer creation of socket to a work queue, and handle possible races there. Introduce a lock to ensure that changes to vxlan socket hash list is SMP safe. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-24vxlan: send notification when MAC migratesStephen Hemminger
When learned entry migrates to another IP send a notification that entry has changed. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-24vxlan: move IGMP join/leave to work queueStephen Hemminger
Do join/leave from work queue to avoid lock inversion problems between normal socket and RTNL. The code comes out cleaner as well. Uses Cong Wang's suggestion to turn refcnt into a real atomic since now need to handle case where last use of socket is IGMP worker. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-24vxlan: fix crash from work pending on module removalStephen Hemminger
Switch to using a per module work queue so that all the socket deletion callbacks are done when module is removed. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-24vxlan: fix out of order operation on module removalStephen Hemminger
If vxlan is removed with active vxlan's it would crash because rtnl_link_unregister (which calls vxlan_dellink), was invoked before unregister_pernet_device (which calls vxlan_stop). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-19ip_tunnels: extend iptunnel_xmit()Pravin B Shelar
Refactor various ip tunnels xmit functions and extend iptunnel_xmit() so that there is more code sharing. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/wireless/ath/ath9k/Kconfig drivers/net/xen-netback/netback.c net/batman-adv/bat_iv_ogm.c net/wireless/nl80211.c The ath9k Kconfig conflict was a change of a Kconfig option name right next to the deletion of another option. The xen-netback conflict was overlapping changes involving the handling of the notify list in xen_netbk_rx_action(). Batman conflict resolution provided by Antonio Quartulli, basically keep everything in both conflict hunks. The nl80211 conflict is a little more involved. In 'net' we added a dynamic memory allocation to nl80211_dump_wiphy() to fix a race that Linus reported. Meanwhile in 'net-next' the handlers were converted to use pre and post doit handlers which use a flag to determine whether to hold the RTNL mutex around the operation. However, the dump handlers to not use this logic. Instead they have to explicitly do the locking. There were apparent bugs in the conversion of nl80211_dump_wiphy() in that we were not dropping the RTNL mutex in all the return paths, and it seems we very much should be doing so. So I fixed that whilst handling the overlapping changes. To simplify the initial returns, I take the RTNL mutex after we try to allocate 'tb'. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19vxlan: fix check for migration of static entrystephen hemminger
The check introduced by: commit 26a41ae604381c5cc0caf1c3261ca6b298b5fe69 Author: stephen hemminger <stephen@networkplumber.org> Date: Mon Jun 17 12:09:58 2013 -0700 vxlan: only migrate dynamic FDB entries was not correct because it is checking flag about type of FDB entry, rather than the state (dynamic versus static). The confusion arises because vxlan is reusing values from bridge, and bridge is reusing values from neighbour table, and easy to get lost in translation. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17vxlan: handle skb_clone failurestephen hemminger
If skb_clone fails if out of memory then just skip the fanout. Problem was introduced in 3.10 with: commit 6681712d67eef14c4ce793561c3231659153a320 Author: David Stevens <dlstevens@us.ibm.com> Date: Fri Mar 15 04:35:51 2013 +0000 vxlan: generalize forwarding tables Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>