Age | Commit message (Collapse) | Author |
|
In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.
The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.
This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.
Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.
Feedback would be appreciated!
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Use of RCU api makes vxlan code easier to understand. It also
fixes bug due to missing ACCESS_ONCE() on sk_user_data dereference.
In rare case without ACCESS_ONCE() compiler might omit vs on
sk_user_data dereference.
Compiler can use vs as alias for sk->sk_user_data, resulting in
multiple sk_user_data dereference in rcu read context which
could change.
CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit afbd8bae9c798c5cdbe4439d3a50536b5438247c
vxlan: add implicit fdb entry for default destination
creates an implicit fdb entry for default destination. This results
in an invalid fdb entry if default destination is not specified.
For ex:
ip link add vxlan1 type vxlan id 100
creates the following fdb entry
00:00:00:00:00:00 dev vxlan1 dst 0.0.0.0 self permanent
This patch fixes this issue by creating an fdb entry only if a
valid default destination is specified.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes sparse warnings when incorrectly handling the port number
and using int instead of unsigned int iterating through &vn->sock_list[].
Keeping the port as __be16 also makes things clearer wrt endianess.
Also, it was pointed out that vxlan_get_rx_port() had unnecessary checks
which got removed.
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds two more ndo ops: ndo_add_rx_vxlan_port() and
ndo_del_rx_vxlan_port().
Drivers can get notifications through the above functions about changes
of the UDP listening port of VXLAN. Also, when physical ports come up,
now they can call vxlan_get_rx_port() in order to obtain the port number(s)
of the existing VXLAN interface in case they already up before them.
This information about the listening UDP port would be used for VXLAN
related offloads.
A big thank you to John Fastabend (john.r.fastabend@intel.com) for his
input and his suggestions on this patch set.
CC: John Fastabend <john.r.fastabend@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
vxlan-udp-recv function lookup vxlan_sock struct on every packet
recv by using udp-port number. we can use sk->sk_user_data to
store vxlan_sock and avoid lookup.
I have open coded rcu-api to store and read vxlan_sock from
sk_user_data to avoid sparse warning as sk_user_data is not
__rcu pointer.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The goal of this patch is to harmonize cleanup done on a skbuff on xmit path.
Before this patch, behaviors were different depending of the tunnel type.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This argument is not used, let's remove it.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This argument is not used, let's remove it.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script: (and a little typing)
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fengguang reported a compile warning:
drivers/net/vxlan.c: In function 'vxlan6_xmit_skb':
drivers/net/vxlan.c:1352:3: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
this patch fixes it.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It should be IPPROTO_UDP.
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the IPv6 version of "arp_reduce", ndisc_send_na()
will be needed.
Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
route short circuit only has IPv4 part, this patch adds
the IPv6 part. nd_tbl will be needed.
Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds IPv6 support to vxlan device, as the new version
RFC already mentions it:
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03
Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Following patch allows transmit side vlan offload for vxlan
devices.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rather than having static headroom calculation, adjust headroom
according to target device.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Following patch allows more code sharing between vxlan and ovs-vxlan.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Following patch adds data field to vxlan socket and export
vxlan handler api.
vh->data is required to store private data per vxlan handler.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Once we have ovs-vxlan functionality, one UDP port can be assigned
to kernel-vxlan or ovs-vxlan port. Therefore following patch adds
vxlan demux functionality, so that vxlan or ovs module can
register for particular port.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use iptunnel_pull_header() for better code sharing.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Restructure vxlan-socket management APIs so that it can be
shared between vxlan and ovs modules.
This patch does not change any functionality.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
v6-v7:
- get rid of zero refcnt vs from hashtable.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
This is a regression introduced by:
commit fe5c3561e6f0ac7c9546209f01351113c1b77ec8
Author: stephen hemminger <stephen@networkplumber.org>
Date: Sat Jul 13 10:18:18 2013 -0700
vxlan: add necessary locking on device removal
The problem is that vxlan_dellink(), which is called with RTNL lock
held, tries to flush the workqueue synchronously, but apparently
igmp_join and igmp_leave work need to hold RTNL lock too, therefore we
have a soft lockup!
As suggested by Stephen, probably the flush_workqueue can just be
removed and let the normal refcounting work. The workqueue has a
reference to device and socket, therefore the cleanups should work
correctly.
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is a regression introduced by:
commit 3fc2de2faba387218bdf9dbc6b13f513ac3b060a
Author: stephen hemminger <stephen@networkplumber.org>
Date: Thu Jul 18 08:40:15 2013 -0700
vxlan: fix igmp races
Before this commit, the old code was:
if (vxlan_group_used(vn, vxlan->default_dst.remote_ip))
ip_mc_join_group(sk, &mreq);
else
ip_mc_leave_group(sk, &mreq);
therefore we shoud check vxlan_group_used(), not its opposite,
for igmp_join.
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vxlan remote list is protected by RCU and guaranteed to be non-empty.
Split out the rcu and non-rcu access to the list to fix warning
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Merge net into net-next to setup some infrastructure Eric
Dumazet needs for usbnet changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support to replace an existing entry found in the
vxlan fdb database. The entry in question is identified
by its unicast mac address and the destination information
is changed. If the entry is not found, it is added in the
forwarding database. This is similar to changing an entry
in the neighbour table.
Multicast mac addresses can not be changed with the replace
option.
This is useful for virtual machine migration when the
destination of a target virtual machine changes. The replace
feature can be used instead of delete followed by add.
Resubmitted because net-next was closed last week.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are two race conditions in existing code for doing IGMP
management in workqueue in vxlan. First, the vxlan_group_used
function checks the list of vxlan's without any protection, and
it is possible for open followed by close to occur before the
igmp work queue runs.
To solve these move the check into vxlan_open/stop so it is
protected by RTNL. And split into two work structures so that
there is no racy reference to underlying device state.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix memory leaks and other badness from VXLAN network namespace
teardown. When network namespace is removed, all the vxlan devices should
be unregistered (not closed).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The socket management is now done in workqueue (outside of RTNL)
and protected by vn->sock_lock. There were two possible bugs, first
the vxlan device was removed from the VNI hash table per socket without
holding lock. And there was a race when device is created and the workqueue
could run after deletion.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
vxlan exit module unregisters vxlan net and then it unregisters
rtnl ops which triggers vxlan_dellink() from __rtnl_kill_links().
vxlan_dellink() deletes vxlan-dev from vxlan_list which has
list-head in vxlan-net-struct but that is already gone due to
net-unregister. That is how we are getting following crash.
Following commit fixes the crash by fixing module exit path.
BUG: unable to handle kernel paging request at ffff8804102c8000
IP: [<ffffffff812cc5e9>] __list_del_entry+0x29/0xd0
PGD 2972067 PUD 83e019067 PMD 83df97067 PTE 80000004102c8060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: ---
CPU: 19 PID: 6712 Comm: rmmod Tainted: GF 3.10.0+ #95
Hardware name: Dell Inc. PowerEdge R620/0KCKR5, BIOS 1.4.8 10/25/2012
task: ffff88080c47c580 ti: ffff88080ac50000 task.ti: ffff88080ac50000
RIP: 0010:[<ffffffff812cc5e9>] [<ffffffff812cc5e9>]
__list_del_entry+0x29/0xd0
RSP: 0018:ffff88080ac51e08 EFLAGS: 00010206
RAX: ffff8804102c8000 RBX: ffff88040f0d4b10 RCX: dead000000200200
RDX: ffff8804102c8000 RSI: ffff88080ac51e58 RDI: ffff88040f0d4b10
RBP: ffff88080ac51e08 R08: 0000000000000001 R09: 2222222222222222
R10: 2222222222222222 R11: 2222222222222222 R12: ffff88080ac51e58
R13: ffffffffa07b8840 R14: ffffffff81ae48c0 R15: ffff88080ac51e58
FS: 00007f9ef105c700(0000) GS:ffff88082a800000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8804102c8000 CR3: 00000008227e5000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88080ac51e28 ffffffff812cc6a1 2222222222222222 ffff88040f0d4000
ffff88080ac51e48 ffffffffa07b3311 ffff88040f0d4000 ffffffff81ae49c8
ffff88080ac51e98 ffffffff81492fc2 ffff88080ac51e58 ffff88080ac51e58
Call Trace:
[<ffffffff812cc6a1>] list_del+0x11/0x40
[<ffffffffa07b3311>] vxlan_dellink+0x51/0x70 [vxlan]
[<ffffffff81492fc2>] __rtnl_link_unregister+0xa2/0xb0
[<ffffffff8149448e>] rtnl_link_unregister+0x1e/0x30
[<ffffffffa07b7b7c>] vxlan_cleanup_module+0x1c/0x2f [vxlan]
[<ffffffff810c9b31>] SyS_delete_module+0x1d1/0x2c0
[<ffffffff812b8a0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81582f42>] system_call_fastpath+0x16/0x1b
Code: eb 9f 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89
e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b
00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
RIP [<ffffffff812cc5e9>] __list_del_entry+0x29/0xd0
RSP <ffff88080ac51e08>
CR2: ffff8804102c8000
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
The zero MAC entry in the fdb is used as default destination. With
multiple default destinations it is possible to use vxlan in
environments that disable multicast on the infrastructure level, e.g.
public clouds.
Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
When the last item is deleted from the remote destinations list, the
fdb entry is destroyed.
Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
which will be reused by vxlan_fdb_delete
Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
which will be reused by vxlan_fdb_delete
Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
Fix following sparse warnings.
drivers/net/vxlan.c:238:44: warning: incorrect type in argument 3 (different base types)
drivers/net/vxlan.c:238:44: expected restricted __be32 [usertype] value
drivers/net/vxlan.c:238:44: got unsigned int const [unsigned] [usertype] remote_vni
drivers/net/vxlan.c:1735:18: warning: incorrect type in initializer (different signedness)
drivers/net/vxlan.c:1735:18: expected int *id
drivers/net/vxlan.c:1735:18: got unsigned int static [toplevel] *<noident>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
Fix whitespace and spelling
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
|
|
For the notification code, a couple of places build fdb entries on
the stack, use structure initialization instead and fix formatting.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
UDP ports are limited to 16 bits.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
Based on initial work by Mike Rapoport <mike.rapoport@ravellosystems.com>
Use list macros and RCU for tracking multiple remotes.
Note: this code assumes list always has at least one entry,
because delete is not supported.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
The function vxlan_xmit_one always returns NETDEV_TX_OK, so there
is no point in keeping track of return values etc.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
|
|
Put destruction of per-cpu statistics removal in
ndo_uninit since it is created by ndo_init.
This also avoids any problems that might be cause by destructor
being called after module removed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
It is possible for two cpu's to race creating vxlan device.
For most cases this is harmless, but the ability to assign "next
avaliable vxlan device" relies on rtnl lock being held across the
whole operation. Therfore two instances of calling:
ip li add vxlan%d vxlan ...
could collide and create two devices with same name.
To fix this defer creation of socket to a work queue, and
handle possible races there. Introduce a lock to ensure that
changes to vxlan socket hash list is SMP safe.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
When learned entry migrates to another IP send a notification
that entry has changed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
Do join/leave from work queue to avoid lock inversion problems
between normal socket and RTNL. The code comes out cleaner
as well.
Uses Cong Wang's suggestion to turn refcnt into a real atomic
since now need to handle case where last use of socket is IGMP
worker.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
|
Switch to using a per module work queue so that all the socket
deletion callbacks are done when module is removed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|