summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2014-03-07Bluetooth: Fix skb allocation check for A2MPGustavo Padovan
vtable's method alloc_skb() needs to return a ERR_PTR in case of err and not a NULL. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-03-07Bluetooth: Fix expected key count debug logsJohan Hedberg
The debug logs for reporting a discrepancy between the expected amount of keys and the actually received amount of keys got these value mixed up. This patch fixes the issue. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-07netfilter: nft_hash: bug fixes and resizingPatrick McHardy
The hash set type is very broken and was never meant to be merged in this state. Missing RCU synchronization on element removal, leaking chain refcounts when used as a verdict map, races during lookups, a fixed table size are probably just some of the problems. Luckily it is currently never chosen by the kernel when the rbtree type is also available. Rewrite it to be usable. The new implementation supports automatic hash table resizing using RCU, based on Paul McKenney's and Josh Triplett's algorithm "Optimized Resizing For RCU-Protected Hash Tables" described in [1]. Resizing doesn't require a second list head in the elements, it works by chosing a hash function that remaps elements to a predictable set of buckets, only resizing by integral factors and - during expansion: linking new buckets to the old bucket that contains elements for any of the new buckets, thereby creating imprecise chains, then incrementally seperating the elements until the new buckets only contain elements that hash directly to them. - during shrinking: linking the hash chains of all old buckets that hash to the same new bucket to form a single chain. Expansion requires at most the number of elements in the longest hash chain grace periods, shrinking requires a single grace period. Due to the requirement of having hash chains/elements linked to multiple buckets during resizing, homemade single linked lists are used instead of the existing list helpers, that don't support this in a clean fashion. As a side effect, the amount of memory required per element is reduced by one pointer. Expansion is triggered when the load factors exceeds 75%, shrinking when the load factor goes below 30%. Both operations are allowed to fail and will be retried on the next insertion or removal if their respective conditions still hold. [1] http://dl.acm.org/citation.cfm?id=2002181.2002192 Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07netfilter: conntrack: remove central spinlock nf_conntrack_lockJesper Dangaard Brouer
nf_conntrack_lock is a monolithic lock and suffers from huge contention on current generation servers (8 or more core/threads). Perf locking congestion is clear on base kernel: - 72.56% ksoftirqd/6 [kernel.kallsyms] [k] _raw_spin_lock_bh - _raw_spin_lock_bh + 25.33% init_conntrack + 24.86% nf_ct_delete_from_lists + 24.62% __nf_conntrack_confirm + 24.38% destroy_conntrack + 0.70% tcp_packet + 2.21% ksoftirqd/6 [kernel.kallsyms] [k] fib_table_lookup + 1.15% ksoftirqd/6 [kernel.kallsyms] [k] __slab_free + 0.77% ksoftirqd/6 [kernel.kallsyms] [k] inet_getpeer + 0.70% ksoftirqd/6 [nf_conntrack] [k] nf_ct_delete + 0.55% ksoftirqd/6 [ip_tables] [k] ipt_do_table This patch change conntrack locking and provides a huge performance improvement. SYN-flood attack tested on a 24-core E5-2695v2(ES) with 10Gbit/s ixgbe (with tool trafgen): Base kernel: 810.405 new conntrack/sec After patch: 2.233.876 new conntrack/sec Notice other floods attack (SYN+ACK or ACK) can easily be deflected using: # iptables -A INPUT -m state --state INVALID -j DROP # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0 Use an array of hashed spinlocks to protect insertions/deletions of conntracks into the hash table. 1024 spinlocks seem to give good results, at minimal cost (4KB memory). Due to lockdep max depth, 1024 becomes 8 if CONFIG_LOCKDEP=y The hash resize is a bit tricky, because we need to take all locks in the array. A seqcount_t is used to synchronize the hash table users with the resizing process. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07netfilter: conntrack: seperate expect locking from nf_conntrack_lockJesper Dangaard Brouer
Netfilter expectations are protected with the same lock as conntrack entries (nf_conntrack_lock). This patch split out expectations locking to use it's own lock (nf_conntrack_expect_lock). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07netfilter: avoid race with exp->master ctJesper Dangaard Brouer
Preparation for disconnecting the nf_conntrack_lock from the expectations code. Once the nf_conntrack_lock is lifted, a race condition is exposed. The expectations master conntrack exp->master, can race with delete operations, as the refcnt increment happens too late in init_conntrack(). Race is against other CPUs invoking ->destroy() (destroy_conntrack()), or nf_ct_delete() (via timeout or early_drop()). Avoid this race in nf_ct_find_expectation() by using atomic_inc_not_zero(), and checking if nf_ct_is_dying() (path via nf_ct_delete()). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07netfilter: conntrack: spinlock per cpu to protect special lists.Jesper Dangaard Brouer
One spinlock per cpu to protect dying/unconfirmed/template special lists. (These lists are now per cpu, a bit like the untracked ct) Add a @cpu field to nf_conn, to make sure we hold the appropriate spinlock at removal time. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07netfilter: trivial code cleanup and doc changesJesper Dangaard Brouer
Changes while reading through the netfilter code. Added hint about how conntrack nf_conn refcnt is accessed. And renamed repl_hash to reply_hash for readability Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-nextPablo Neira Ayuso
Via Simon Horman: ==================== * Whitespace cleanup spotted by checkpatch.pl from Tingwei Liu. * Section conflict cleanup, basically removal of one wrong __read_mostly, from Andi Kleen. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07xfrm: rename struct xfrm_filterNicolas Dichtel
iproute2 already defines a structure with that name, let's use another one to avoid any conflict. CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-03-07ipvs: Reduce checkpatch noise in ip_vs_lblc.cTingwei Liu
Add whitespace after operator and put open brace { on the previous line Cc: Tingwei Liu <liutingwei@hisense.com> Cc: lvs-devel@vger.kernel.org Signed-off-by: Tingwei Liu <tingw.liu@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-03-07sections, ipvs: Remove useless __read_mostly for ipvs genl_opsAndi Kleen
const __read_mostly does not make any sense, because const data is already read-only. Remove the __read_mostly for the ipvs genl_ops. This avoids a LTO section conflict compile problem. Cc: Wensong Zhang <wensong@linux-vs.org> Cc: Simon Horman <horms@verge.net.au> Cc: Patrick McHardy <kaber@trash.net> Cc: lvs-devel@vger.kernel.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-03-06ipv6: don't set DST_NOCOUNT for remotely added routesSabrina Dubroca
DST_NOCOUNT should only be used if an authorized user adds routes locally. In case of routes which are added on behalf of router advertisments this flag must not get used as it allows an unlimited number of routes getting added remotely. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06net_sched: htb: do not acquire qdisc lock in dump operationsEric Dumazet
htb_dump() and htb_dump_class() do not strictly need to acquire qdisc lock to fetch qdisc and/or class parameters. We hold RTNL and no changes can occur. This reduces by 50% qdisc lock pressure while doing tc qdisc|class dump operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-066lowpan: move 6lowpan header to include/netAlexander Aring
This header is used by bluetooth and ieee802154 branch. This patch move this header to the include/net directory to avoid a use of a relative path in include. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-066lowpan: add missing include of net/ipv6.hAlexander Aring
The 6lowpan.h file contains some static inline function which use internal ipv6 api structs. Add a include of ipv6.h to be sure that it's known before. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06ipv6: Fix exthdrs offload registration.Anton Nayshtut
Without this fix, ipv6_exthdrs_offload_init doesn't register IPPROTO_DSTOPTS offload, but returns 0 (as the IPPROTO_ROUTING registration actually succeeds). This then causes the ipv6_gso_segment to drop IPv6 packets with IPPROTO_DSTOPTS header. The issue detected and the fix verified by running MS HCK Offload LSO test on top of QEMU Windows guests, as this test sends IPv6 packets with IPPROTO_DSTOPTS. Signed-off-by: Anton Nayshtut <anton@swortex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06net: unix socket code abuses csum_partialAnton Blanchard
The unix socket code is using the result of csum_partial to hash into a lookup table: unix_hash_fold(csum_partial(sunaddr, len, 0)); csum_partial is only guaranteed to produce something that can be folded into a checksum, as its prototype explains: * returns a 32-bit number suitable for feeding into itself * or csum_tcpudp_magic The 32bit value should not be used directly. Depending on the alignment, the ppc64 csum_partial will return different 32bit partial checksums that will fold into the same 16bit checksum. This difference causes the following testcase (courtesy of Gustavo) to sometimes fail: #include <sys/socket.h> #include <stdio.h> int main() { int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0); int i = 1; setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4); struct sockaddr addr; addr.sa_family = AF_LOCAL; bind(fd, &addr, 2); listen(fd, 128); struct sockaddr_storage ss; socklen_t sslen = (socklen_t)sizeof(ss); getsockname(fd, (struct sockaddr*)&ss, &sslen); fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0); if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){ perror(NULL); return 1; } printf("OK\n"); return 0; } As suggested by davem, fix this by using csum_fold to fold the partial 32bit checksum into a 16bit checksum before using it. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06inet: frag: make sure forced eviction removes all fragsFlorian Westphal
Quoting Alexander Aring: While fragmentation and unloading of 6lowpan module I got this kernel Oops after few seconds: BUG: unable to handle kernel paging request at f88bbc30 [..] Modules linked in: ipv6 [last unloaded: 6lowpan] Call Trace: [<c012af4c>] ? call_timer_fn+0x54/0xb3 [<c012aef8>] ? process_timeout+0xa/0xa [<c012b66b>] run_timer_softirq+0x140/0x15f Problem is that incomplete frags are still around after unload; when their frag expire timer fires, we get crash. When a netns is removed (also done when unloading module), inet_frag calls the evictor with 'force' argument to purge remaining frags. The evictor loop terminates when accounted memory ('work') drops to 0 or the lru-list becomes empty. However, the mem accounting is done via percpu counters and may not be accurate, i.e. loop may terminate prematurely. Alter evictor to only stop once the lru list is empty when force is requested. Reported-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Reported-by: Alexander Aring <alex.aring@gmail.com> Tested-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06tcp: Use NET_ADD_STATS instead of NET_ADD_STATS_BH in tcp_event_new_data_sent()David S. Miller
Can be invoked from non-BH context. Based upon a patch by Eric Dumazet. Fixes: f19c29e3e391 ("tcp: snmp stats for Fast Open, SYN rtx, and data pkts") Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06tipc: don't log disabled tasklet handler errorsErik Hugne
Failure to schedule a TIPC tasklet with tipc_k_signal because the tasklet handler is disabled is not an error. It means TIPC is currently in the process of shutting down. We remove the error logging in this case. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06tipc: fix memory leak during module removalErik Hugne
When the TIPC module is removed, the tasklet handler is disabled before all other subsystems. This will cause lingering publications in the name table because the node_down tasklets responsible to clean up publications from an unreachable node will never run. When the name table is shut down, these publications are detected and an error message is logged: tipc: nametbl_stop(): orphaned hash chain detected This is actually a memory leak, introduced with commit 993b858e37b3120ee76d9957a901cca22312ffaa ("tipc: correct the order of stopping services at rmmod") Instead of just logging an error and leaking memory, we free the orphaned entries during nametable shutdown. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06tipc: drop subscriber connection id invalidationErik Hugne
When a topology server subscriber is disconnected, the associated connection id is set to zero. A check vs zero is then done in the subscription timeout function to see if the subscriber have been shut down. This is unnecessary, because all subscription timers will be cancelled when a subscriber terminates. Setting the connection id to zero is actually harmful because id zero is the identity of the topology server listening socket, and can cause a race that leads to this socket being closed instead. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06tipc: avoid to unnecessary process switch under non-block modeYing Xue
When messages are received via tipc socket under non-block mode, schedule_timeout() is called in tipc_wait_for_rcvmsg(), that is, the process of receiving messages will be scheduled once although timeout value passed to schedule_timeout() is 0. The same issue exists in accept()/wait_for_accept(). To avoid this unnecessary process switch, we only call schedule_timeout() if the timeout value is non-zero. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06tipc: fix connection refcount leakYing Xue
When tipc_conn_sendmsg() calls tipc_conn_lookup() to query a connection instance, its reference count value is increased if it's found. But subsequently if it's found that the connection is closed, the work of sending message is not queued into its server send workqueue, and the connection reference count is not decreased. This will cause a reference count leak. To reproduce this problem, an application would need to open and closes topology server connections with high intensity. We fix this by immediately decrementing the connection reference count if a send fails due to the connection being closed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06tipc: allow connection shutdown callback to be invoked in advanceYing Xue
Currently connection shutdown callback function is called when connection instance is released in tipc_conn_kref_release(), and receiving packets and sending packets are running in different threads. Even if connection is closed by the thread of receiving packets, its shutdown callback may not be called immediately as the connection reference count is non-zero at that moment. So, although the connection is shut down by the thread of receiving packets, the thread of sending packets doesn't know it. Before its shutdown callback is invoked to tell the sending thread its connection has been closed, the sending thread may deliver messages by tipc_conn_sendmsg(), this is why the following error information appears: "Sending subscription event failed, no memory" To eliminate it, allow connection shutdown callback function to be called before connection id is removed in tipc_close_conn(), which makes the sending thread know the truth in time that its socket is closed so that it doesn't send message to it. We also remove the "Sending XXX failed..." error reporting for topology and config services. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06l2tp: fix userspace reception on plain L2TP socketsGuillaume Nault
As pppol2tp_recv() never queues up packets to plain L2TP sockets, pppol2tp_recvmsg() never returns data to userspace, thus making the recv*() system calls unusable. Instead of dropping packets when the L2TP socket isn't bound to a PPP channel, this patch adds them to its reception queue. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06l2tp: fix manual sequencing (de)activation in L2TPv2Guillaume Nault
Commit e0d4435f "l2tp: Update PPP-over-L2TP driver to work over L2TPv3" broke the PPPOL2TP_SO_SENDSEQ setsockopt. The L2TP header length was previously computed by pppol2tp_l2t_header_len() before each call to l2tp_xmit_skb(). Now that header length is retrieved from the hdr_len session field, this field must be updated every time the L2TP header format is modified, or l2tp_xmit_skb() won't push the right amount of data for the L2TP header. This patch uses l2tp_session_set_header_len() to adjust hdr_len every time sequencing is (de)activated from userspace (either by the PPPOL2TP_SO_SENDSEQ setsockopt or the L2TP_ATTR_SEND_SEQ netlink attribute). Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06Merge branch 'for-john' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
2014-03-06inet: remove now unused flag DST_NOPEERHannes Frederic Sowa
Commit e688a604807647 ("net: introduce DST_NOPEER dst flag") introduced DST_NOPEER because because of crashes in ipv6_select_ident called from udp6_ufo_fragment. Since commit 916e4cf46d0204 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") we don't call ipv6_select_ident any more from ip6_ufo_append_data, thus this flag lost its purpose and can be removed. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter typesHeiko Carstens
In order to allow the COMPAT_SYSCALL_DEFINE macro generate code that performs proper zero and sign extension convert all 64 bit parameters to their corresponding 32 bit compat counterparts. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-06net/compat: convert to COMPAT_SYSCALL_DEFINEHeiko Carstens
Convert all compat system call functions where all parameter types have a size of four or less than four bytes, or are pointer types to COMPAT_SYSCALL_DEFINE. The implicit casts within COMPAT_SYSCALL_DEFINE will perform proper zero and sign extension to 64 bit of all parameters if needed. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-06netfilter: ipset: add forceadd kernel support for hash set typesJosh Hunt
Adds a new property for hash set types, where if a set is created with the 'forceadd' option and the set becomes full the next addition to the set may succeed and evict a random entry from the set. To keep overhead low eviction is done very simply. It checks to see which bucket the new entry would be added. If the bucket's pos value is non-zero (meaning there's at least one entry in the bucket) it replaces the first entry in the bucket. If pos is zero, then it continues down the normal add process. This property is useful if you have a set for 'ban' lists where it may not matter if you release some entries from the set early. Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-03-06netfilter: ipset: move registration message to init from net_initIlia Mirkin
Commit 1785e8f473 ("netfiler: ipset: Add net namespace for ipset") moved the initialization print into net_init, which can get called a lot due to namespaces. Move it back into init, reduce to pr_info. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-03-06netfilter: ipset: add markmask for hash:ip,mark data typeVytas Dauksa
Introduce packet mark mask for hash:ip,mark data type. This allows to set mark bit filter for the ip set. Change-Id: Id8dd9ca7e64477c4f7b022a1d9c1a5b187f1c96e Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-03-06netfilter: ipset: add hash:ip,mark data type to ipsetVytas Dauksa
Introduce packet mark support with new ip,mark hash set. This includes userspace and kernelspace code, hash:ip,mark set tests and man page updates. The intended use of ip,mark set is similar to the ip:port type, but for protocols which don't use a predictable port number. Instead of port number it matches a firewall mark determined by a layer 7 filtering program like opendpi. As well as allowing or blocking traffic it will also be used for accounting packets and bytes sent for each protocol. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-03-06netfilter: ipset: Add hash: fix coccinelle warningsFengguang Wu
net/netfilter/ipset/ip_set_hash_netnet.c:115:8-9: WARNING: return of 0/1 in function 'hash_netnet4_data_list' with return type bool /c/kernel-tests/src/cocci/net/netfilter/ipset/ip_set_hash_netnet.c:338:8-9: WARNING: return of 0/1 in function 'hash_netnet6_data_list' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: coccinelle/misc/boolreturn.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-03-06netfilter: ipset: Follow manual page behavior for SET target on list:setSergey Popovich
ipset(8) for list:set says: The match will try to find a matching entry in the sets and the target will try to add an entry to the first set to which it can be added. However real behavior is bit differ from described. Consider example: # ipset create test-1-v4 hash:ip family inet # ipset create test-1-v6 hash:ip family inet6 # ipset create test-1 list:set # ipset add test-1 test-1-v4 # ipset add test-1 test-1-v6 # iptables -A INPUT -p tcp --destination-port 25 -j SET --add-set test-1 src # ip6tables -A INPUT -p tcp --destination-port 25 -j SET --add-set test-1 src And then when iptables/ip6tables rule matches packet IPSET target tries to add src from packet to the list:set test-1 where first entry is test-1-v4 and the second one is test-1-v6. For IPv4, as it first entry in test-1 src added to test-1-v4 correctly, but for IPv6 src not added! Placing test-1-v6 to the first element of list:set makes behavior correct for IPv6, but brokes for IPv4. This is due to result, returned from ip_set_add() and ip_set_del() from net/netfilter/ipset/ip_set_core.c when set in list:set equires more parameters than given or address families do not match (which is this case). It seems wrong returning 0 from ip_set_add() and ip_set_del() in this case, as 0 should be returned only when an element successfuly added/deleted to/from the set, contrary to ip_set_test() which returns 0 when no entry exists and >0 when entry found in set. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-03-05net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunkDaniel Borkmann
While working on ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable"), we noticed that there's a skb memory leakage in the error path. Running the same reproducer as in ec0223ec48a9 and by unconditionally jumping to the error label (to simulate an error condition) in sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about the unfreed chunk->auth_chunk skb clone: Unreferenced object 0xffff8800b8f3a000 (size 256): comm "softirq", pid 0, jiffies 4294769856 (age 110.757s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00 ..u^..X......... backtrace: [<ffffffff816660be>] kmemleak_alloc+0x4e/0xb0 [<ffffffff8119f328>] kmem_cache_alloc+0xc8/0x210 [<ffffffff81566929>] skb_clone+0x49/0xb0 [<ffffffffa0467459>] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp] [<ffffffffa046fdbc>] sctp_inq_push+0x4c/0x70 [sctp] [<ffffffffa047e8de>] sctp_rcv+0x82e/0x9a0 [sctp] [<ffffffff815abd38>] ip_local_deliver_finish+0xa8/0x210 [<ffffffff815a64af>] nf_reinject+0xbf/0x180 [<ffffffffa04b4762>] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue] [<ffffffffa04aa40b>] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink] [<ffffffff815a3269>] netlink_rcv_skb+0xa9/0xc0 [<ffffffffa04aa7cf>] nfnetlink_rcv+0x23f/0x408 [nfnetlink] [<ffffffff815a2bd8>] netlink_unicast+0x168/0x250 [<ffffffff815a2fa1>] netlink_sendmsg+0x2e1/0x3f0 [<ffffffff8155cc6b>] sock_sendmsg+0x8b/0xc0 [<ffffffff8155d449>] ___sys_sendmsg+0x369/0x380 What happens is that commit bbd0d59809f9 clones the skb containing the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case that an endpoint requires COOKIE-ECHO chunks to be authenticated: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- ------------------ AUTH; COOKIE-ECHO ----------------> <-------------------- COOKIE-ACK --------------------- When we enter sctp_sf_do_5_1D_ce() and before we actually get to the point where we process (and subsequently free) a non-NULL chunk->auth_chunk, we could hit the "goto nomem_init" path from an error condition and thus leave the cloned skb around w/o freeing it. The fix is to centrally free such clones in sctp_chunk_destroy() handler that is invoked from sctp_chunk_free() after all refs have dropped; and also move both kfree_skb(chunk->auth_chunk) there, so that chunk->auth_chunk is either NULL (since sctp_chunkify() allocs new chunks through kmem_cache_zalloc()) or non-NULL with a valid skb pointer. chunk->skb and chunk->auth_chunk are the only skbs in the sctp_chunk structure that need to be handeled. While at it, we should use consume_skb() for both. It is the same as dev_kfree_skb() but more appropriately named as we are not a device but a protocol. Also, this effectively replaces the kfree_skb() from both invocations into consume_skb(). Functions are the same only that kfree_skb() assumes that the frame was being dropped after a failure (e.g. for tools like drop monitor), usage of consume_skb() seems more appropriate in function sctp_chunk_destroy() though. Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <yasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05bridge: multicast: add sanity check for query source addressesLinus Lüssing
MLD queries are supposed to have an IPv6 link-local source address according to RFC2710, section 4 and RFC3810, section 5.1.14. This patch adds a sanity check to ignore such broken MLD queries. Without this check, such malformed MLD queries can result in a denial of service: The queries are ignored by any MLD listener therefore they will not respond with an MLD report. However, without this patch these malformed MLD queries would enable the snooping part in the bridge code, potentially shutting down the according ports towards these hosts for multicast traffic as the bridge did not learn about these listeners. Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/wireless/ath/ath9k/recv.c drivers/net/wireless/mwifiex/pcie.c net/ipv6/sit.c The SIT driver conflict consists of a bug fix being done by hand in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper was created (netdev_alloc_pcpu_stats()) which takes care of this. The two wireless conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05net: fix for a race condition in the inet frag codeNikolay Aleksandrov
I stumbled upon this very serious bug while hunting for another one, it's a very subtle race condition between inet_frag_evictor, inet_frag_intern and the IPv4/6 frag_queue and expire functions (basically the users of inet_frag_kill/inet_frag_put). What happens is that after a fragment has been added to the hash chain but before it's been added to the lru_list (inet_frag_lru_add) in inet_frag_intern, it may get deleted (either by an expired timer if the system load is high or the timer sufficiently low, or by the fraq_queue function for different reasons) before it's added to the lru_list, then after it gets added it's a matter of time for the evictor to get to a piece of memory which has been freed leading to a number of different bugs depending on what's left there. I've been able to trigger this on both IPv4 and IPv6 (which is normal as the frag code is the same), but it's been much more difficult to trigger on IPv4 due to the protocol differences about how fragments are treated. The setup I used to reproduce this is: 2 machines with 4 x 10G bonded in a RR bond, so the same flow can be seen on multiple cards at the same time. Then I used multiple instances of ping/ping6 to generate fragmented packets and flood the machines with them while running other processes to load the attacked machine. *It is very important to have the _same flow_ coming in on multiple CPUs concurrently. Usually the attacked machine would die in less than 30 minutes, if configured properly to have many evictor calls and timeouts it could happen in 10 minutes or so. An important point to make is that any caller (frag_queue or timer) of inet_frag_kill will remove both the timer refcount and the original/guarding refcount thus removing everything that's keeping the frag from being freed at the next inet_frag_put. All of this could happen before the frag was ever added to the LRU list, then it gets added and the evictor uses a freed fragment. An example for IPv6 would be if a fragment is being added and is at the stage of being inserted in the hash after the hash lock is released, but before inet_frag_lru_add executes (or is able to obtain the lru lock) another overlapping fragment for the same flow arrives at a different CPU which finds it in the hash, but since it's overlapping it drops it invoking inet_frag_kill and thus removing all guarding refcounts, and afterwards freeing it by invoking inet_frag_put which removes the last refcount added previously by inet_frag_find, then inet_frag_lru_add gets executed by inet_frag_intern and we have a freed fragment in the lru_list. The fix is simple, just move the lru_add under the hash chain locked region so when a removing function is called it'll have to wait for the fragment to be added to the lru_list, and then it'll remove it (it works because the hash chain removal is done before the lru_list one and there's no window between the two list adds when the frag can get dropped). With this fix applied I couldn't kill the same machine in 24 hours with the same setup. Fixes: 3ef0eb0db4bf ("net: frag, move LRU list maintenance outside of rwlock") CC: Florian Westphal <fw@strlen.de> CC: Jesper Dangaard Brouer <brouer@redhat.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05Bluetooth: Fix removing Long Term KeyClaudio Takahasi
This patch fixes authentication failure on LE link re-connection when BlueZ acts as slave (peripheral). LTK is removed from the internal list after its first use causing PIN or Key missing reply when re-connecting the link. The LE Long Term Key Request event indicates that the master is attempting to encrypt or re-encrypt the link. Pre-condition: BlueZ host paired and running as slave. How to reproduce(master): 1) Establish an ACL LE encrypted link 2) Disconnect the link 3) Try to re-establish the ACL LE encrypted link (fails) > HCI Event: LE Meta Event (0x3e) plen 19 LE Connection Complete (0x01) Status: Success (0x00) Handle: 64 Role: Slave (0x01) ... @ Device Connected: 00:02:72:DC:29:C9 (1) flags 0x0000 > HCI Event: LE Meta Event (0x3e) plen 13 LE Long Term Key Request (0x05) Handle: 64 Random number: 875be18439d9aa37 Encryption diversifier: 0x76ed < HCI Command: LE Long Term Key Request Reply (0x08|0x001a) plen 18 Handle: 64 Long term key: 2aa531db2fce9f00a0569c7d23d17409 > HCI Event: Command Complete (0x0e) plen 6 LE Long Term Key Request Reply (0x08|0x001a) ncmd 1 Status: Success (0x00) Handle: 64 > HCI Event: Encryption Change (0x08) plen 4 Status: Success (0x00) Handle: 64 Encryption: Enabled with AES-CCM (0x01) ... @ Device Disconnected: 00:02:72:DC:29:C9 (1) reason 3 < HCI Command: LE Set Advertise Enable (0x08|0x000a) plen 1 Advertising: Enabled (0x01) > HCI Event: Command Complete (0x0e) plen 4 LE Set Advertise Enable (0x08|0x000a) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 19 LE Connection Complete (0x01) Status: Success (0x00) Handle: 64 Role: Slave (0x01) ... @ Device Connected: 00:02:72:DC:29:C9 (1) flags 0x0000 > HCI Event: LE Meta Event (0x3e) plen 13 LE Long Term Key Request (0x05) Handle: 64 Random number: 875be18439d9aa37 Encryption diversifier: 0x76ed < HCI Command: LE Long Term Key Request Neg Reply (0x08|0x001b) plen 2 Handle: 64 > HCI Event: Command Complete (0x0e) plen 6 LE Long Term Key Request Neg Reply (0x08|0x001b) ncmd 1 Status: Success (0x00) Handle: 64 > HCI Event: Disconnect Complete (0x05) plen 4 Status: Success (0x00) Handle: 64 Reason: Authentication Failure (0x05) @ Device Disconnected: 00:02:72:DC:29:C9 (1) reason 0 Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org> Cc: stable@vger.kernel.org Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-03-05mac80211: clear sequence/fragment number in QoS-null framesJohannes Berg
Avoid leaking data by sending uninitialized memory and setting an invalid (non-zero) fragment number (the sequence number is ignored anyway) by setting the seq_ctrl field to zero. Cc: stable@vger.kernel.org Fixes: 3f52b7e328c5 ("mac80211: mesh power save basics") Fixes: ce662b44ce22 ("mac80211: send (QoS) Null if no buffered frames") Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-04ieee802154: fix whitespace issues in KconfigAlexander Aring
This patch fixes some whitespace issues in Kconfig files of IEEE 802.15.4 subsytem. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-04cfg80211: add MPLS and 802.21 classificationSimon Wunderlich
MPLS labels may contain traffic control information, which should be evaluated and used by the wireless subsystem if present. Also check for IEEE 802.21 which is always network control traffic. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-04Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless Conflicts: drivers/net/wireless/ath/ath9k/recv.c drivers/net/wireless/mwifiex/pcie.c
2014-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix memory leak in ieee80211_prep_connection(), sta_info leaked on error. From Eytan Lifshitz. 2) Unintentional switch case fallthrough in nft_reject_inet_eval(), from Patrick McHardy. 3) Must check if payload lenth is a power of 2 in nft_payload_select_ops(), from Nikolay Aleksandrov. 4) Fix mis-checksumming in xen-netfront driver, ip_hdr() is not in the correct place when we invoke skb_checksum_setup(). From Wei Liu. 5) TUN driver should not advertise HW vlan offload features in vlan_features. Fix from Fernando Luis Vazquez Cao. 6) IPV6_VTI needs to select NET_IPV_TUNNEL to avoid build errors, fix from Steffen Klassert. 7) Add missing locking in xfrm_migrade_state_find(), we must hold the per-namespace xfrm_state_lock while traversing the lists. Fix from Steffen Klassert. 8) Missing locking in ath9k driver, access to tid->sched must be done under ath_txq_lock(). Fix from Stanislaw Gruszka. 9) Fix two bugs in TCP fastopen. First respect the size argument given to tcp_sendmsg() in the fastopen path, and secondly prevent tcp_send_syn_data() from potentially using order-5 allocations. From Eric Dumazet. 10) Fix handling of default neigh garbage collection params, from Jiri Pirko. 11) Fix cwnd bloat and over-inflation of RTT when transmit segmentation is in use. From Eric Dumazet. 12) Missing initialization of Realtek r8169 driver's statistics seqlocks. Fix from Kyle McMartin. 13) Fix RTNL assertion failures in 802.3ad and AB ARP monitor of bonding driver, from Ding Tianhong. 14) Bonding slave release race can cause divide by zero, fix from Nikolay Aleksandrov. 15) Overzealous return from neigh_periodic_work() causes reachability time to not be computed. Fix from Duain Jiong. 16) Fix regression in ipv6_find_hdr(), it should not return -ENOENT when a specific target is specified and found. From Hans Schillstrom. 17) Fix VLAN tag stripping regression in BNA driver, from Ivan Vecera. 18) Tail loss probe can calculate bogus RTTs due to missing packet marking on retransmit. Fix from Yuchung Cheng. 19) We cannot do skb_dst_drop() in iptunnel_pull_header() because multicast loopback detection in later code paths need access to skb_rtable(). Fix from Xin Long. 20) The macvlan driver regresses in that it propagates lower device offload support disables into itself, causing severe slowdowns when running over a bridge. Provide the software offloads always on macvlan devices to deal with this and the regression is gone. From Vlad Yasevich. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits) macvlan: Add support for 'always_on' offload features net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable ip_tunnel:multicast process cause panic due to skb->_skb_refdst NULL pointer net: cpsw: fix cpdma rx descriptor leak on down interface be2net: isolate TX workarounds not applicable to Skyhawk-R be2net: Fix skb double free in be_xmit_wrokarounds() failure path be2net: clear promiscuous bits in adapter->flags while disabling promiscuous mode be2net: Fix to reset transparent vlan tagging qlcnic: dcb: a couple off by one bugs tcp: fix bogus RTT on special retransmission hsr: off by one sanity check in hsr_register_frame_in() can: remove CAN FD compatibility for CAN 2.0 sockets can: flexcan: factor out soft reset into seperate funtion can: flexcan: flexcan_remove(): add missing netif_napi_del() can: flexcan: fix transition from and to freeze mode in chip_{,un}freeze can: flexcan: factor out transceiver {en,dis}able into seperate functions can: flexcan: fix transition from and to low power mode in chip_{en,dis}able can: flexcan: flexcan_open(): fix error path if flexcan_chip_start() fails can: flexcan: fix shutdown: first disable chip, then all interrupts USB AX88179/178A: Support D-Link DUB-1312 ...
2014-03-04af_rxrpc: Keep rxrpc_call pointers in a hashtableTim Smith
Keep track of rxrpc_call structures in a hashtable so they can be found directly from the network parameters which define the call. This allows incoming packets to be routed directly to a call without walking through hierarchy of peer -> transport -> connection -> call and all the spinlocks that that entailed. Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
2014-03-03Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Please pull this batch of fixes intended for the 3.14 stream... For the mac80211 bits, Johannes says: "This time I have a fix to get out of an 'infinite error state' in case regulatory domain updates failed and two fixes for VHT associations: one to not disconnect immediately when the AP uses more bandwidth than the new regdomain would allow after a change due to association country information getting used, and one for an issue in the code where mac80211 doesn't correctly ignore a reserved field and then uses an HT instead of VHT association." For the iwlwifi bits, Emmanuel says: "Johannes fixes a long standing bug in the AMPDU status reporting. Max fixes the listen time which was way too long and causes trouble to several APs." Along with those, Bing Zhao marks the mwifiex_usb driver as _not_ supporting USB autosuspend after a number of problems with that have been reported. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>