summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2015-02-09ipv6: Fix fragment id assignment on LE arches.Vlad Yasevich
Recent commit: 0508c07f5e0c94f38afd5434e8b2a55b84553077 Author: Vlad Yasevich <vyasevich@gmail.com> Date: Tue Feb 3 16:36:15 2015 -0500 ipv6: Select fragment id during UFO segmentation if not set. Introduced a bug on LE in how ipv6 fragment id is assigned. This was cought by nightly sparce check: Resolve the following sparce error: net/ipv6/output_core.c:57:38: sparse: incorrect type in assignment (different base types) net/ipv6/output_core.c:57:38: expected restricted __be32 [usertype] ip6_frag_id net/ipv6/output_core.c:57:38: got unsigned int [unsigned] [assigned] [usertype] id Fixes: 0508c07f5e0c9 (ipv6: Select fragment id during UFO segmentation if not set.) Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09bridge: Fix inability to add non-vlan fdb entryToshiaki Makita
Bridge's default_pvid adds a vid by default, by which we cannot add a non-vlan fdb entry by default, because br_fdb_add() adds fdb entries for all vlans instead of a non-vlan one when any vlan is configured. # ip link add br0 type bridge # ip link set eth0 master br0 # bridge fdb add 12:34:56:78:90:ab dev eth0 master temp # bridge fdb show brport eth0 | grep 12:34:56:78:90:ab 12:34:56:78:90:ab dev eth0 vlan 1 static We expect a non-vlan fdb entry as well as vlan 1: 12:34:56:78:90:ab dev eth0 static To fix this, we need to insert a non-vlan fdb entry if vlan is not specified, even when any vlan is configured. Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09net: dsa: Remove redundant phy_attach()Andrew Lunn
dsa_slave_phy_setup() finds the phy for the port via device tree and using of_phy_connect(), or it uses the fall back of taking a phy from the switch internal mdio bus and calling phy_connect_direct(). Either way, if a phy is found, phy_attach_direct() is called to attach the phy to the slave device. In dsa_slave_create(), a second call to phy_attach() is made. This results in the warning "PHY already attached". Remove this second, redundant attaching of the phy. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: remove tipc_snprintfRichard Alpe
tipc_snprintf() was heavily utilized by the old netlink API which no longer exists (now netlink compat). In this patch we swap tipc_snprintf() to the identical scnprintf() in the only remaining occurrence. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: nl compat add noop and remove legacy nl frameworkRichard Alpe
Add TIPC_CMD_NOOP to compat layer and remove the old framework. All legacy nl commands are now converted to the compat layer in netlink_compat.c. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl stats show to nl compatRichard Alpe
Convert TIPC_CMD_SHOW_STATS to compat layer. This command does not have any counterpart in the new API, meaning it now solely exists as a function in the compat layer. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl net id get to nl compatRichard Alpe
Convert TIPC_CMD_GET_NETID to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl net id set to nl compatRichard Alpe
Convert TIPC_CMD_SET_NETID to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl node addr set to nl compatRichard Alpe
Convert TIPC_CMD_SET_NODE_ADDR to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl node dump to nl compatRichard Alpe
Convert TIPC_CMD_GET_NODES to compat dumpit and remove global node counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl media dump to nl compatRichard Alpe
Convert TIPC_CMD_GET_MEDIA_NAMES to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl socket dump to nl compatRichard Alpe
Convert socket (port) listing to compat dumpit call. If a socket (port) has publications a second dumpit call is issued to collect them and format then into the legacy buffer before continuing to process the sockets (ports). Command converted in this patch: TIPC_CMD_SHOW_PORTS Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl name table dump to nl compatRichard Alpe
Add functionality for printing a dump header and convert TIPC_CMD_SHOW_NAME_TABLE to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl link stat reset to nl compatRichard Alpe
Convert TIPC_CMD_RESET_LINK_STATS to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl link prop set to nl compatRichard Alpe
Convert setting of link proprieties to compat doit calls. Commands converted in this patch: TIPC_CMD_SET_LINK_TOL TIPC_CMD_SET_LINK_PRI TIPC_CMD_SET_LINK_WINDOW Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl link dump to nl compatRichard Alpe
Convert TIPC_CMD_GET_LINKS to compat dumpit and remove global link counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl link stat to nl compatRichard Alpe
Add functionality for safely appending string data to a TLV without keeping write count in the caller. Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl bearer enable/disable to nl compatRichard Alpe
Introduce a framework for transcoding legacy nl action into actions (.doit) calls from the new nl API. This is done by converting the incoming TLV data into netlink data with nested netlink attributes. Unfortunately due to the randomness of the legacy API we can't do this generically so each legacy netlink command requires a specific transcoding recipe. In this case for bearer enable and bearer disable. Convert TIPC_CMD_ENABLE_BEARER and TIPC_CMD_DISABLE_BEARER into doit compat calls. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl bearer dump to nl compatRichard Alpe
Introduce a framework for dumping netlink data from the new netlink API and formatting it to the old legacy API format. This is done by looping the dump data and calling a format handler for each entity, in this case a bearer. We dump until either all data is dumped or we reach the limited buffer size of the legacy API. Remember, the legacy API doesn't scale. In this commit we convert TIPC_CMD_GET_BEARER_NAMES to use the compat layer. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: move and rename the legacy nl api to "nl compat"Richard Alpe
The new netlink API is no longer "v2" but rather the standard API and the legacy API is now "nl compat". We split them into separate start/stop and put them in different files in order to further distinguish them. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUGTrond Myklebust
Now that the linger code is gone, the xs_tcp_fin_timeout variable has no real function. Keep it for now, since it is part of the /proc interface, but only define it if that /proc interface is enabled. Suggested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Handle connection reset more efficiently.Trond Myklebust
If the connection reset is due to an active call on our side, then the state change is sometimes not reported. Catch those instances using xs_error_report() instead. Also remove the xs_tcp_shutdown() call in xs_tcp_send_request() as the change in behaviour makes it redundant. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flagTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_releaseTrond Myklebust
Use of socket shutdown() means that we monitor the shutdown process through the xs_tcp_state_change() callback, so it is preferable to a full close in all cases unless we're destroying the transport. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connectionTrond Myklebust
The previous behaviour left the connection half-open in order to try to scrape the last replies from the socket. Now that we have more reliable reconnection, change the behaviour to close down the socket faster. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORTTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Remove TCP socket linger codeTrond Myklebust
Now that we no longer use the partial shutdown code when closing the socket, we no longer need to worry about the TCP linger2 state. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08net:rfs: adjust table size checkingEric Dumazet
Make sure root user does not try something stupid. Also make sure mask field in struct rps_sock_flow_table does not share a cache line with the potentially often dirtied flow table. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 567e4b79731c ("net: rfs: add hash collision detection") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09ipvs: fix inability to remove a mixed-family RSAlexey Andriyanov
The current code prevents any operation with a mixed-family dest unless IP_VS_CONN_F_TUNNEL flag is set. The problem is that it's impossible for the client to follow this rule, because ip_vs_genl_parse_dest does not even read the destination conn_flags when cmd = IPVS_CMD_DEL_DEST (need_full_dest = 0). Also, not every client can pass this flag when removing a dest. ipvsadm, for example, does not support the "-i" command line option together with the "-d" option. This change disables any checks for mixed-family on IPVS_CMD_DEL_DEST command. Signed-off-by: Alexey Andriyanov <alan@al-an.info> Fixes: bc18d37f676f ("ipvs: Allow heterogeneous pools now that we support them") Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-08SUNRPC: Remove TCP client connection reset hackTrond Myklebust
Instead we rely on SO_REUSEPORT to provide the reconnection semantics that we need for NFSv2/v3. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: TCP/UDP always close the old socket before reconnectingTrond Myklebust
It is not safe to call xs_reset_transport() from inside xs_udp_setup_socket() or xs_tcp_setup_socket(), since they do not own the correct locks. Instead, do it in xs_connect(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: Add helpers to prevent socket create from racingTrond Myklebust
The socket lock is currently held by the task that is requesting the connection be established. While that is efficient in the case where the connection happens quickly, it is racy in the case where it doesn't. What we really want is for the connect helper to be able to block access to the socket while it is being set up. This patch does so by arranging to transfer the socket lock from the task that is requesting the connect attempt, and then releasing that lock once everything is done. This scheme also gives us automatic protection against collisions with the RPC close code, so we can kill the cancel_delayed_work_sync() call in xs_close(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: Ensure xs_reset_transport() resets the close connection flagsTrond Myklebust
Otherwise, we may end up looping. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: Do not clear the source port in xs_reset_transportTrond Myklebust
Now that we can reuse bound ports after a close, we never really want to clear the transport's source port after it has been set. Doing so really messes up the NFSv3 DRC on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: Handle EADDRINUSE on connectTrond Myklebust
Now that we're setting SO_REUSEPORT, we still need to handle the case where a connect() is attempted, but the old socket is still lingering. Essentially, all we want to do here is handle the error by waiting a few seconds and then retrying. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08net: rfs: add hash collision detectionEric Dumazet
Receive Flow Steering is a nice solution but suffers from hash collisions when a mix of connected and unconnected traffic is received on the host, when flow hash table is populated. Also, clearing flow in inet_release() makes RFS not very good for short lived flows, as many packets can follow close(). (FIN , ACK packets, ...) This patch extends the information stored into global hash table to not only include cpu number, but upper part of the hash value. I use a 32bit value, and dynamically split it in two parts. For host with less than 64 possible cpus, this gives 6 bits for the cpu number, and 26 (32-6) bits for the upper part of the hash. Since hash bucket selection use low order bits of the hash, we have a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big enough. If the hash found in flow table does not match, we fallback to RPS (if it is enabled for the rxqueue). This means that a packet for an non connected flow can avoid the IPI through a unrelated/victim CPU. This also means we no longer have to clear the table at socket close time, and this helps short lived flows performance. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08gre/ipip: use be16 variants of netlink functionsSabrina Dubroca
encap.sport and encap.dport are __be16, use nla_{get,put}_be16 instead of nla_{get,put}_u16. Fixes the sparse warnings: warning: incorrect type in assignment (different base types) expected restricted __be32 [addressable] [usertype] o_key got restricted __be16 [addressable] [usertype] i_flags warning: incorrect type in assignment (different base types) expected restricted __be16 [usertype] sport got unsigned short warning: incorrect type in assignment (different base types) expected restricted __be16 [usertype] dport got unsigned short warning: incorrect type in argument 3 (different base types) expected unsigned short [unsigned] [usertype] value got restricted __be16 [usertype] sport warning: incorrect type in argument 3 (different base types) expected unsigned short [unsigned] [usertype] value got restricted __be16 [usertype] dport Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08SUNRPC: Set SO_REUSEPORT socket option for TCP connectionsTrond Myklebust
When using TCP, we need the ability to reuse port numbers after a disconnection, so that the NFSv3 server knows that we're the same client. Currently we use a hack to work around the TCP socket's TIME_WAIT: we send an RST instead of closing, which doesn't always work... The SO_REUSEPORT option added in Linux 3.9 allows us to bind multiple TCP connections to the same source address+port combination, and thus to use ordinary TCP close() instead of the current hack. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08tipc: fix bug in socket reception functionJon Paul Maloy
In commit c637c1035534867b85b78b453c38c495b58e2c5a ("tipc: resolve race problem at unicast message reception") we introduced a time limit for how long the function tipc_sk_eneque() would be allowed to execute its loop. Unfortunately, the test for when this limit is passed was put in the wrong place, resulting in a lost message when the test is true. We fix this by moving the test to before we dequeue the next buffer from the input queue. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08rt6_probe_deferred: Do not depend on struct orderingMichael Büsch
rt6_probe allocates a struct __rt6_probe_work and schedules a work handler rt6_probe_deferred. But rt6_probe_deferred kfree's the struct work_struct instead of struct __rt6_probe_work. This works, because struct work_struct is the first element of struct __rt6_probe_work. Change it to kfree struct __rt6_probe_work to not implicitly depend on struct work_struct being the first element. This does not affect the generated code. Signed-off-by: Michael Buesch <m@bues.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08Merge tag 'nfs-rdma-for-3.20-part-2' of ↵Trond Myklebust
git://git.linux-nfs.org/projects/anna/nfs-rdma NFS: RDMA Client Sparse Fixes This patch fixes a sparse warning in the initial submission. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-for-3.20-part-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma: xprtrdma: Address sparse complaint in rpcr_to_rdmar()
2015-02-08tcp: mitigate ACK loops for connections as tcp_timewait_sockNeal Cardwell
Ensure that in state FIN_WAIT2 or TIME_WAIT, where the connection is represented by a tcp_timewait_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08tcp: mitigate ACK loops for connections as tcp_sockNeal Cardwell
Ensure that in state ESTABLISHED, where the connection is represented by a tcp_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers or ACK numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. There is already a similar (although global) rate-limiting mechanism for "challenge ACKs". When deciding whether to send a challence ACK, we first consult the new per-connection rate limit, and then the global rate limit. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08tcp: mitigate ACK loops for connections as tcp_request_sockNeal Cardwell
In the SYN_RECV state, where the TCP connection is represented by tcp_request_sock, we now rate-limit SYNACKs in response to a client's retransmitted SYNs: we do not send a SYNACK in response to client SYN if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a SYNACK in response to a client's retransmitted SYN. This allows the vast majority of legitimate client connections to proceed unimpeded, even for the most aggressive platforms, iOS and MacOS, which actually retransmit SYNs 1-second intervals for several times in a row. They use SYN RTO timeouts following the progression: 1,1,1,1,1,2,4,8,16,32. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacksNeal Cardwell
Helpers for mitigating ACK loops by rate-limiting dupacks sent in response to incoming out-of-window packets. This patch includes: - rate-limiting logic - sysctl to control how often we allow dupacks to out-of-window packets - SNMP counter for cases where we rate-limited our dupack sending The rate-limiting logic in this patch decides to not send dupacks in response to out-of-window segments if (a) they are SYNs or pure ACKs and (b) the remote endpoint is sending them faster than the configured rate limit. We rate-limit our responses rather than blocking them entirely or resetting the connection, because legitimate connections can rely on dupacks in response to some out-of-window segments. For example, zero window probes are typically sent with a sequence number that is below the current window, and ZWPs thus expect to thus elicit a dupack in response. We allow dupacks in response to TCP segments with data, because these may be spurious retransmissions for which the remote endpoint wants to receive DSACKs. This is safe because segments with data can't realistically be part of ACK loops, which by their nature consist of each side sending pure/data-less ACKs to each other. The dupack interval is controlled by a new sysctl knob, tcp_invalid_ratelimit, given in milliseconds, in case an administrator needs to dial this upward in the face of a high-rate DoS attack. The name and units are chosen to be analogous to the existing analogous knob for ICMP, icmp_ratelimit. The default value for tcp_invalid_ratelimit is 500ms, which allows at most one such dupack per 500ms. This is chosen to be 2x faster than the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule 2.4). We allow the extra 2x factor because network delay variations can cause packets sent at 1 second intervals to be compressed and arrive much closer. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08openvswitch: Initialize unmasked key and uid lenPravin B Shelar
Flow alloc needs to initialize unmasked key pointer. Otherwise it can crash kernel trying to free random unmasked-key pointer. general protection fault: 0000 [#1] SMP 3.19.0-rc6-net-next+ #457 Hardware name: Supermicro X7DWU/X7DWU, BIOS 1.1 04/30/2008 RIP: 0010:[<ffffffff8111df0e>] [<ffffffff8111df0e>] kfree+0xac/0x196 Call Trace: [<ffffffffa060bd87>] flow_free+0x21/0x59 [openvswitch] [<ffffffffa060bde0>] ovs_flow_free+0x21/0x23 [openvswitch] [<ffffffffa0605b4a>] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch] [<ffffffffa0605995>] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch] [<ffffffff811fe6fb>] ? nla_parse+0x4f/0xec [<ffffffff8139a2fc>] genl_family_rcv_msg+0x26d/0x2c9 [<ffffffff8107620f>] ? __lock_acquire+0x90e/0x9aa [<ffffffff8139a3be>] genl_rcv_msg+0x66/0x89 [<ffffffff8139a358>] ? genl_family_rcv_msg+0x2c9/0x2c9 [<ffffffff81399591>] netlink_rcv_skb+0x3e/0x95 [<ffffffff81399898>] ? genl_rcv+0x18/0x37 [<ffffffff813998a7>] genl_rcv+0x27/0x37 [<ffffffff81399033>] netlink_unicast+0x103/0x191 [<ffffffff81399382>] netlink_sendmsg+0x2c1/0x310 [<ffffffff811007ad>] ? might_fault+0x50/0xa0 [<ffffffff8135c773>] do_sock_sendmsg+0x5f/0x7a [<ffffffff8135c799>] sock_sendmsg+0xb/0xd [<ffffffff8135cacf>] ___sys_sendmsg+0x1a3/0x218 [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86 [<ffffffff8115a9d0>] ? fsnotify+0x32c/0x348 [<ffffffff8115a720>] ? fsnotify+0x7c/0x348 [<ffffffff8113e5f5>] ? __fget+0xaa/0xbf [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86 [<ffffffff8135cccd>] __sys_sendmsg+0x3d/0x5e [<ffffffff8135cd02>] SyS_sendmsg+0x14/0x16 [<ffffffff81411852>] system_call_fastpath+0x12/0x17 Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.") CC: Joe Stringer <joestringer@nicira.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07bridge: add missing bridge port check for offloadsRoopa Prabhu
This patch fixes a missing bridge port check caught by smatch. setlink/dellink of attributes like vlans can come for a bridge device and there is no need to offload those today. So, this patch adds a bridge port check. (In these cases however, the BRIDGE_SELF flags will always be set and we may not hit a problem with the current code). smatch complaint: The patch 68e331c785b8: "bridge: offload bridge port attributes to switch asic if feature flag set" from Jan 29, 2015, leads to the following Smatch complaint: net/bridge/br_netlink.c:552 br_setlink() error: we previously assumed 'p' could be null (see line 518) net/bridge/br_netlink.c 517 518 if (p && protinfo) { ^ Check for NULL. Reported-By: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07net: use netif_rx_ni() from process contextEric Dumazet
Hotpluging a cpu might be rare, yet we have to use proper handlers when taking over packets found in backlog queues. dev_cpu_callback() runs from process context, thus we should call netif_rx_ni() to properly invoke softirq handler. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07rds: Make rds_message_copy_from_user() return 0 on success.Sowmini Varadhan
Commit 083735f4b01b ("rds: switch rds_message_copy_from_user() to iov_iter") breaks rds_message_copy_from_user() semantics on success, and causes it to return nbytes copied, when it should return 0. This commit fixes that bug. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07net: rds: Remove repeated function names from debug outputRasmus Villemoes
The macro rdsdebug is defined as pr_debug("%s(): " fmt, __func__ , ##args) Hence it doesn't make sense to include the name of the calling function explicitly in the format string passed to rdsdebug. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: David S. Miller <davem@davemloft.net>