summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2013-11-07Merge tag 'driver-core-3.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / sysfs patches from Greg KH: "Here's the big driver core / sysfs update for 3.13-rc1. There's lots of dev_groups updates for different subsystems, as they all get slowly migrated over to the safe versions of the attribute groups (removing userspace races with the creation of the sysfs files.) Also in here are some kobject updates, devres expansions, and the first round of Tejun's sysfs reworking to enable it to be used by other subsystems as a backend for an in-kernel filesystem. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (83 commits) sysfs: rename sysfs_assoc_lock and explain what it's about sysfs: use generic_file_llseek() for sysfs_file_operations sysfs: return correct error code on unimplemented mmap() mdio_bus: convert bus code to use dev_groups device: Make dev_WARN/dev_WARN_ONCE print device as well as driver name sysfs: separate out dup filename warning into a separate function sysfs: move sysfs_hash_and_remove() to fs/sysfs/dir.c sysfs: remove unused sysfs_get_dentry() prototype sysfs: honor bin_attr.attr.ignore_lockdep sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attr devres: restore zeroing behavior of devres_alloc() sysfs: fix sysfs_write_file for bin file input: gameport: convert bus code to use dev_groups input: serio: remove bus usage of dev_attrs input: serio: use DEVICE_ATTR_RO() i2o: convert bus code to use dev_groups memstick: convert bus code to use dev_groups tifm: convert bus code to use dev_groups virtio: convert bus code to use dev_groups ipack: convert bus code to use dev_groups ...
2013-11-04net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcbDaniel Borkmann
Introduced in f9e42b853523 ("net: sctp: sideeffect: throw BUG if primary_path is NULL"), we intended to find a buggy assoc that's part of the assoc hash table with a primary_path that is NULL. However, we better remove the BUG_ON for now and find a more suitable place to assert for these things as Mark reports that this also triggers the bug when duplication cookie processing happens, and the assoc is not part of the hash table (so all good in this case). Such a situation can for example easily be reproduced by: tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1 tc qdisc add dev eth0 parent 1:2 handle 20: netem loss 20% tc filter add dev eth0 protocol ip parent 1: prio 2 u32 match ip \ protocol 132 0xff match u8 0x0b 0xff at 32 flowid 1:2 This drops 20% of COOKIE-ACK packets. After some follow-up discussion with Vlad we came to the conclusion that for now we should still better remove this BUG_ON() assertion, and come up with two follow-ups later on, that is, i) find a more suitable place for this assertion, and possibly ii) have a special allocator/initializer for such kind of temporary assocs. Reported-by: Mark Thomas <Mark.Thomas@metaswitch.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-02net: flow_dissector: fail on evil iph->ihlJason Wang
We don't validate iph->ihl which may lead a dead loop if we meet a IPIP skb whose iph->ihl is zero. Fix this by failing immediately when iph->ihl is evil (less than 5). This issue were introduced by commit ec5efe7946280d1e84603389a1030ccec0a767ae (rps: support IPIP encapsulation). Cc: Eric Dumazet <edumazet@google.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-02Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== 1) Fix a possible race on ipcomp scratch buffers because of too early enabled siftirqs. From Michal Kubecek. 2) The current xfrm garbage collector threshold is too small for some workloads, resulting in bad performance on these workloads. Increase the threshold from 1024 to 32768. 3) Some codepaths might not have a dst_entry attached to the skb when calling xfrm_decode_session(). So add a check to prevent a null pointer dereference in this case. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-01xfrm: Fix null pointer dereference when decoding sessionsSteffen Klassert
On some codepaths the skb does not have a dst entry when xfrm_decode_session() is called. So check for a valid skb_dst() before dereferencing the device interface index. We use 0 as the device index if there is no valid skb_dst(), or at reverse decoding we use skb_iif as device interface index. Bug was introduced with git commit bafd4bd4dc ("xfrm: Decode sessions with output interface."). Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-29bridge: pass correct vlan id to multicast codeVlad Yasevich
Currently multicast code attempts to extrace the vlan id from the skb even when vlan filtering is disabled. This can lead to mdb entries being created with the wrong vlan id. Pass the already extracted vlan id to the multicast filtering code to make the correct id is used in creation as well as lookup. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29Merge branch 'fixes' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== One patch for net/3.12 fixing an issue where devices could be in an invalid state they are removed while still attached to OVS. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29net: x25: Fix dead URLs in KconfigMichael Drüing
Update the URLs in the Kconfig file to the new pages at sangoma.com and cisco.com Signed-off-by: Michael Drüing <michael@drueing.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== This pull request contains the following netfilter fix: * fix --queue-bypass in xt_NFQUEUE revision 3. While adding the revision 3 of this target, the bypass flags were not correctly handled anymore, thus, breaking packet bypassing if no application is listening from userspace, patch from Holger Eitzenberger, reported by Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29netfilter: xt_NFQUEUE: fix --queue-bypass regressionHolger Eitzenberger
V3 of the NFQUEUE target ignores the --queue-bypass flag, causing packets to be dropped when the userspace listener isn't running. Regression is in since 8746ddcf12bb26 ("netfilter: xt_NFQUEUE: introduce CPU fanout"). Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-29tcp: gso: fix truesize trackingEric Dumazet
commit 6ff50cd55545 ("tcp: gso: do not generate out of order packets") had an heuristic that can trigger a warning in skb_try_coalesce(), because skb->truesize of the gso segments were exactly set to mss. This breaks the requirement that skb->truesize >= skb->len + truesizeof(struct sk_buff); It can trivially be reproduced by : ifconfig lo mtu 1500 ethtool -K lo tso off netperf As the skbs are looped into the TCP networking stack, skb_try_coalesce() warns us of these skb under-estimating their truesize. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28xfrm: Increase the garbage collector thresholdSteffen Klassert
With the removal of the routing cache, we lost the option to tweak the garbage collector threshold along with the maximum routing cache size. So git commit 703fb94ec ("xfrm: Fix the gc threshold value for ipv4") moved back to a static threshold. It turned out that the current threshold before we start garbage collecting is much to small for some workloads, so increase it from 1024 to 32768. This means that we start the garbage collector if we have more than 32768 dst entries in the system and refuse new allocations if we are above 65536. Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-28pkt_sched: fq: clear time_next_packet for reused flowsEric Dumazet
When a socket is freed/reallocated, we need to clear time_next_packet or else we can inherit a prior value and delay first packets of the new flow. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-27tcp: do not rearm RTO when future data are sackedYuchung Cheng
Patch ed08495c3 "tcp: use RTT from SACK for RTO" always re-arms RTO upon obtaining a RTT sample from newly sacked data. But technically RTO should only be re-armed when the data sent before the last (re)transmission of write queue head are (s)acked. Otherwise the RTO may continue to extend during loss recovery on data sent in the future. Note that RTTs from ACK or timestamps do not have this problem, as the RTT source must be from data sent before. The new RTO re-arm policy is 1) Always re-arm RTO if SND.UNA is advanced 2) Re-arm RTO if sack RTT is available, provided the sacked data was sent before the last time write_queue_head was sent. Signed-off-by: Larry Brakmo <brakmo@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-27tcp: only take RTT from timestamps if new data is ackedYuchung Cheng
Patch ed08495c3 "tcp: use RTT from SACK for RTO" has a bug that it does not check if the ACK acknowledge new data before taking the RTT sample from TCP timestamps. This patch adds the check back as required by the RFC. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-27tcp: fix SYNACK RTT estimation in Fast OpenYuchung Cheng
tp->lsndtime may not always be the SYNACK timestamp if a passive Fast Open socket sends data before handshake completes. And if the remote acknowledges both the data and the SYNACK, the RTT sample is already taken in tcp_ack(), so no need to call tcp_update_ack_rtt() in tcp_synack_rtt_meas() aagain. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25ipv6: ip6_dst_check needs to check for expired dst_entriesHannes Frederic Sowa
On receiving a packet too big icmp error we check if our current cached dst_entry in the socket is still valid. This validation check did not care about the expiration of the (cached) route. The error path I traced down: The socket receives a packet too big mtu notification. It still has a valid dst_entry and thus issues the ip6_rt_pmtu_update on this dst_entry, setting RTF_EXPIRE and updates the dst.expiration value (which could fail because of not up-to-date expiration values, see previous patch). In some seldom cases we race with a) the ip6_fib gc or b) another routing lookup which would result in a recreation of the cached rt6_info from its parent non-cached rt6_info. While copying the rt6_info we reinitialize the metrics store by copying it over from the parent thus invalidating the just installed pmtu update (both dsts use the same key to the inetpeer storage). The dst_entry with the just invalidated metrics data would just get its RTF_EXPIRES flag cleared and would continue to stay valid for the socket. We should have not issued the pmtu update on the already expired dst_entry in the first placed. By checking the expiration on the dst entry and doing a relookup in case it is out of date we close the race because we would install a new rt6_info into the fib before we issue the pmtu update, thus closing this race. Not reliably updating the dst.expire value was fixed by the patch "ipv6: reset dst.expires value when clearing expire flag". Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Reported-by: Valentijn Sessink <valentyn@blub.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Tested-by: Valentijn Sessink <valentyn@blub.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25netpoll: fix rx_hook() interface by passing the skbAntonio Quartulli
Right now skb->data is passed to rx_hook() even if the skb has not been linearised and without giving rx_hook() a way to linearise it. Change the rx_hook() interface and make it accept the skb and the offset to the UDP payload as arguments. rx_hook() is also renamed to rx_skb_hook() to ensure that out of the tree users notice the API change. In this way any rx_skb_hook() implementation can perform all the needed operations to properly (and safely) access the skb data. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23net: sctp: fix ASCONF to allow non SCTP_ADDR_SRC addresses in ipv6Daniel Borkmann
Commit 8a07eb0a50 ("sctp: Add ASCONF operation on the single-homed host") implemented possible use of IPv4 addresses with non SCTP_ADDR_SRC state as source address when sending ASCONF (ADD) packets, but IPv6 part for that was not implemented in 8a07eb0a50. Therefore, as this is not restricted to IPv4-only, fix this up to allow the same for IPv6 addresses in SCTP. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Michio Honda <micchie@sfc.wide.ad.jp> Acked-by: Michio Honda <micchie@sfc.wide.ad.jp> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains three netfilter fixes for your net tree, they are: * A couple of fixes to resolve info leak to userspace due to uninitialized memory area in ulogd, from Mathias Krause. * Fix instruction ordering issues that may lead to the access of uninitialized data in x_tables. The problem involves the table update (producer) and the main packet matching (consumer) routines. Detected in SMP ARMv7, from Will Deacon. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22Revert "bridge: only expire the mdb entry when query is received"Linus Lüssing
While this commit was a good attempt to fix issues occuring when no multicast querier is present, this commit still has two more issues: 1) There are cases where mdb entries do not expire even if there is a querier present. The bridge will unnecessarily continue flooding multicast packets on the according ports. 2) Never removing an mdb entry could be exploited for a Denial of Service by an attacker on the local link, slowly, but steadily eating up all memory. Actually, this commit became obsolete with "bridge: disable snooping if there is no querier" (b00589af3b) which included fixes for a few more cases. Therefore reverting the following commits (the commit stated in the commit message plus three of its follow up fixes): ==================== Revert "bridge: update mdb expiration timer upon reports." This reverts commit f144febd93d5ee534fdf23505ab091b2b9088edc. Revert "bridge: do not call setup_timer() multiple times" This reverts commit 1faabf2aab1fdaa1ace4e8c829d1b9cf7bfec2f1. Revert "bridge: fix some kernel warning in multicast timer" This reverts commit c7e8e8a8f7a70b343ca1e0f90a31e35ab2d16de1. Revert "bridge: only expire the mdb entry when query is received" This reverts commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b. ==================== CC: Cong Wang <amwang@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Reviewed-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22netfilter: x_tables: fix ordering of jumpstack allocation and table updateWill Deacon
During kernel stability testing on an SMP ARMv7 system, Yalin Wang reported the following panic from the netfilter code: 1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454 [<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c) [<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104) [<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8) [<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c) [<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158) [<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc) [<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c) [<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50) [<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298) [<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8) [<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30) Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103) ---[ end trace da227214a82491bd ]--- Kernel panic - not syncing: Fatal exception in interrupt This comes about because CPU1 is executing xt_replace_table in response to a setsockopt syscall, resulting in: ret = xt_jumpstack_alloc(newinfo); --> newinfo->jumpstack = kzalloc(size, GFP_KERNEL); [...] table->private = newinfo; newinfo->initial_entries = private->initial_entries; Meanwhile, CPU0 is handling the network receive path and ends up in ipt_do_table, resulting in: private = table->private; [...] jumpstack = (struct ipt_entry **)private->jumpstack[cpu]; On weakly ordered memory architectures, the writes to table->private and newinfo->jumpstack from CPU1 can be observed out of order by CPU0. Furthermore, on architectures which don't respect ordering of address dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered. This patch adds an smp_wmb() before the assignment to table->private (which is essentially publishing newinfo) to ensure that all writes to newinfo will be observed before plugging it into the table structure. A dependent-read barrier is also added on the consumer sides, to ensure the same ordering requirements are also respected there. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reported-by: Wang, Yalin <Yalin.Wang@sonymobile.com> Tested-by: Wang, Yalin <Yalin.Wang@sonymobile.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-21tcp: initialize passive-side sk_pacing_rate after 3WHSNeal Cardwell
For passive TCP connections, upon receiving the ACK that completes the 3WHS, make sure we set our pacing rate after we get our first RTT sample. On passive TCP connections, when we receive the ACK completing the 3WHS we do not take an RTT sample in tcp_ack(), but rather in tcp_synack_rtt_meas(). So upon receiving the ACK that completes the 3WHS, tcp_ack() leaves sk_pacing_rate at its initial value. Originally the initial sk_pacing_rate value was 0, so passive-side connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs made in the first RTT. With a default initial cwnd of 10 packets, this happened to be correct for RTTs 5ms or bigger, so it was hard to see problems in WAN or emulated WAN testing. Since 7eec4174ff ("pkt_sched: fq: fix non TCP flows pacing"), the initial sk_pacing_rate is 0xffffffff. So after that change, passive TCP connections were keeping this value (and using large numbers of segments per skbuff) until receiving an ACK for data. Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21ipv6: probe routes asynchronous in rt6_probeHannes Frederic Sowa
Routes need to be probed asynchronous otherwise the call stack gets exhausted when the kernel attemps to deliver another skb inline, like e.g. xt_TEE does, and we probe at the same time. We update neigh->updated still at once, otherwise we would send to many probes. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helperJulian Anastasov
Now when rt6_nexthop() can return nexthop address we can use it for proper nexthop comparison of directly connected destinations. For more information refer to commit bbb5823cf742a7 ("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper"). Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21ipv6: fill rt6i_gateway with nexthop addressJulian Anastasov
Make sure rt6i_gateway contains nexthop information in all routes returned from lookup or when routes are directly attached to skb for generated ICMP packets. The effect of this patch should be a faster version of rt6_nexthop() and the consideration of local addresses as nexthop. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ip_output: do skb ufo init for peeked non ufo skb as wellJiri Pirko
Now, if user application does: sendto len<mtu flag MSG_MORE sendto len>mtu flag 0 The skb is not treated as fragmented one because it is not initialized that way. So move the initialization to fix this. introduced by: commit e89e9cf539a28df7d0eb1d0a545368e9920b34ac "[IPv4/IPv6]: UFO Scatter-gather approach" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ip6_output: do skb ufo init for peeked non ufo skb as wellJiri Pirko
Now, if user application does: sendto len<mtu flag MSG_MORE sendto len>mtu flag 0 The skb is not treated as fragmented one because it is not initialized that way. So move the initialization to fix this. introduced by: commit e89e9cf539a28df7d0eb1d0a545368e9920b34ac "[IPv4/IPv6]: UFO Scatter-gather approach" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19udp6: respect IPV6_DONTFRAG sockopt in case there are pending framesJiri Pirko
if up->pending != 0 dontfrag is left with default value -1. That causes that application that do: sendto len>mtu flag MSG_MORE sendto len>mtu flag 0 will receive EMSGSIZE errno as the result of the second sendto. This patch fixes it by respecting IPV6_DONTFRAG socket option. introduced by: commit 4b340ae20d0e2366792abe70f46629e576adaf5e "IPv6: Complete IPV6_DONTFRAG support" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix raceDaniel Borkmann
In the case of credentials passing in unix stream sockets (dgram sockets seem not affected), we get a rather sparse race after commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default"). We have a stream server on receiver side that requests credential passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED on each spawned/accepted socket on server side to 1 first (as it's not inherited), it can happen that in the time between accept() and setsockopt() we get interrupted, the sender is being scheduled and continues with passing data to our receiver. At that time SO_PASSCRED is neither set on sender nor receiver side, hence in cmsg's SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534 (== overflow{u,g}id) instead of what we actually would like to see. On the sender side, here nc -U, the tests in maybe_add_creds() invoked through unix_stream_sendmsg() would fail, as at that exact time, as mentioned, the sender has neither SO_PASSCRED on his side nor sees it on the server side, and we have a valid 'other' socket in place. Thus, sender believes it would just look like a normal connection, not needing/requesting SO_PASSCRED at that time. As reverting 16e5726 would not be an option due to the significant performance regression reported when having creds always passed, one way/trade-off to prevent that would be to set SO_PASSCRED on the listener socket and allow inheriting these flags to the spawned socket on server side in accept(). It seems also logical to do so if we'd tell the listener socket to pass those flags onwards, and would fix the race. Before, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}}, msg_flags=0}, 0) = 5 After, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}}, msg_flags=0}, 0) = 5 Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19Merge 3.12-rc6 into driver-core-nextGreg Kroah-Hartman
We want these fixes here too. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18bridge: Fix updating FDB entries when the PVID is appliedToshiaki Makita
We currently set the value that variable vid is pointing, which will be used in FDB later, to 0 at br_allowed_ingress() when we receive untagged or priority-tagged frames, even though the PVID is valid. This leads to FDB updates in such a wrong way that they are learned with VID 0. Update the value to that of PVID if the PVID is applied. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18bridge: Fix the way the PVID is referencedToshiaki Makita
We are using the VLAN_TAG_PRESENT bit to detect whether the PVID is set or not at br_get_pvid(), while we don't care about the bit in adding/deleting the PVID, which makes it impossible to forward any incomming untagged frame with vlan_filtering enabled. Since vid 0 cannot be used for the PVID, we can use vid 0 to indicate that the PVID is not set, which is slightly more efficient than using the VLAN_TAG_PRESENT. Fix the problem by getting rid of using the VLAN_TAG_PRESENT. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18bridge: Apply the PVID to priority-tagged framesToshiaki Makita
IEEE 802.1Q says that when we receive priority-tagged (VID 0) frames use the PVID for the port as its VID. (See IEEE 802.1Q-2011 6.9.1 and Table 9-2) Apply the PVID to not only untagged frames but also priority-tagged frames. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18bridge: Don't use VID 0 and 4095 in vlan filteringToshiaki Makita
IEEE 802.1Q says that: - VID 0 shall not be configured as a PVID, or configured in any Filtering Database entry. - VID 4095 shall not be configured as a PVID, or transmitted in a tag header. This VID value may be used to indicate a wildcard match for the VID in management operations or Filtering Database entries. (See IEEE 802.1Q-2011 6.9.1 and Table 9-2) Don't accept adding these VIDs in the vlan_filtering implementation. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18xfrm: prevent ipcomp scratch buffer race conditionMichal Kubecek
In ipcomp_compress(), sortirq is enabled too early, allowing the per-cpu scratch buffer to be rewritten by ipcomp_decompress() (called on the same CPU in softirq context) between populating the buffer and copying the compressed data to the skb. v2: as pointed out by Steffen Klassert, if we also move the local_bh_disable() before reading the per-cpu pointers, we can get rid of get_cpu()/put_cpu(). v3: removed ipcomp_decompress part (as explained by Herbert Xu, it cannot be called from process context), get rid of cpu variable (thanks to Eric Dumazet) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-17bridge: Correctly clamp MAX forward_delay when enabling STPVlad Yasevich
Commit be4f154d5ef0ca147ab6bcd38857a774133f5450 bridge: Clamp forward_delay when enabling STP had a typo when attempting to clamp maximum forward delay. It is possible to set bridge_forward_delay to be higher then permitted maximum when STP is off. When turning STP on, the higher then allowed delay has to be clamed down to max value. CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()Eric Dumazet
sk_can_gso() should only be used as a hint in tcp_sendmsg() to build GSO packets in the first place. (As a performance hint) Once we have GSO packets in write queue, we can not decide they are no longer GSO only because flow now uses a route which doesn't handle TSO/GSO. Core networking stack handles the case very well for us, all we need is keeping track of packet counts in MSS terms, regardless of segmentation done later (in GSO or hardware) Right now, if tcp_fragment() splits a GSO packet in two parts, @left and @right, and route changed through a non GSO device, both @left and @right have pcount set to 1, which is wrong, and leads to incorrect packet_count tracking. This problem was added in commit d5ac99a648 ("[TCP]: skb pcount with MTU discovery") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17tcp: must unclone packets before mangling themEric Dumazet
TCP stack should make sure it owns skbs before mangling them. We had various crashes using bnx2x, and it turned out gso_size was cleared right before bnx2x driver was populating TC descriptor of the _previous_ packet send. TCP stack can sometime retransmit packets that are still in Qdisc. Of course we could make bnx2x driver more robust (using ACCESS_ONCE(shinfo->gso_size) for example), but the bug is TCP stack. We have identified two points where skb_unclone() was needed. This patch adds a WARN_ON_ONCE() to warn us if we missed another fix of this kind. Kudos to Neal for finding the root cause of this bug. Its visible using small MSS. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Please pull this batch of fixes intended for the 3.12 stream! For the mac80211 bits, Johannes says: "Jouni fixes a remain-on-channel vs. scan bug, and Felix fixes client TX probing on VLANs." And also: "This time I have two fixes from Emmanuel for RF-kill issues, and fixed two issues reported by Evan Huus and Thomas Lindroth respectively." On top of those... Avinash Patil adds a couple of mwifiex fixes to properly inform cfg80211 about some different types of disconnects, avoiding WARNINGs. Mark Cave-Ayland corrects a pointer arithmetic problem in rtlwifi, avoiding incorrect automatic gain calculations. Solomon Peachy sends a cw1200 fix for locking around calls to cw1200_irq_handler, addressing "lost interrupt" problems. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17tcp: fix incorrect ca_state in tail loss probeYuchung Cheng
On receiving an ACK that covers the loss probe sequence, TLP immediately sets the congestion state to Open, even though some packets are not recovered and retransmisssion are on the way. The later ACks may trigger a WARN_ON check in step D of tcp_fastretrans_alert(), e.g., https://bugzilla.redhat.com/show_bug.cgi?id=989251 The fix is to follow the similar procedure in recovery by calling tcp_try_keep_open(). The sender switches to Open state if no packets are retransmissted. Otherwise it goes to Disorder and let subsequent ACKs move the state to Recovery or Open. Reported-By: Michael Sterrett <michael@sterretts.net> Tested-By: Dormando <dormando@rydia.net> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17sctp: Perform software checksum if packet has to be fragmented.Vlad Yasevich
IP/IPv6 fragmentation knows how to compute only TCP/UDP checksum. This causes problems if SCTP packets has to be fragmented and ipsummed has been set to PARTIAL due to checksum offload support. This condition can happen when retransmitting after MTU discover, or when INIT or other control chunks are larger then MTU. Check for the rare fragmentation condition in SCTP and use software checksum calculation in this case. CC: Fan Du <fan.du@windriver.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17sctp: Use software crc32 checksum when xfrm transform will happen.Fan Du
igb/ixgbe have hardware sctp checksum support, when this feature is enabled and also IPsec is armed to protect sctp traffic, ugly things happened as xfrm_output checks CHECKSUM_PARTIAL to do checksum operation(sum every thing up and pack the 16bits result in the checksum field). The result is fail establishment of sctp communication. Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-16openvswitch: fix vport-netdev unregisterAlexei Starovoitov
The combination of two commits: commit 8e4e1713e4 ("openvswitch: Simplify datapath locking.") commit 2537b4dd0a ("openvswitch:: link upper device for port devices") introduced a bug where upper_dev wasn't unlinked upon netdev_unregister notification The following steps: modprobe openvswitch ovs-dpctl add-dp test ip tuntap add dev tap1 mode tap ovs-dpctl add-if test tap1 ip tuntap del dev tap1 mode tap are causing multiple warnings: [ 62.747557] gre: GRE over IPv4 demultiplexor driver [ 62.749579] openvswitch: Open vSwitch switching datapath [ 62.755087] device test entered promiscuous mode [ 62.765911] device tap1 entered promiscuous mode [ 62.766033] IPv6: ADDRCONF(NETDEV_UP): tap1: link is not ready [ 62.769017] ------------[ cut here ]------------ [ 62.769022] WARNING: CPU: 1 PID: 3267 at net/core/dev.c:5501 rollback_registered_many+0x20f/0x240() [ 62.769023] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video [ 62.769051] CPU: 1 PID: 3267 Comm: ip Not tainted 3.12.0-rc3+ #60 [ 62.769052] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 62.769053] 0000000000000009 ffff8807f25cbd28 ffffffff8175e575 0000000000000006 [ 62.769055] 0000000000000000 ffff8807f25cbd68 ffffffff8105314c ffff8807f25cbd58 [ 62.769057] ffff8807f2634000 ffff8807f25cbdc8 ffff8807f25cbd88 ffff8807f25cbdc8 [ 62.769059] Call Trace: [ 62.769062] [<ffffffff8175e575>] dump_stack+0x55/0x76 [ 62.769065] [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0 [ 62.769067] [<ffffffff8105319a>] warn_slowpath_null+0x1a/0x20 [ 62.769069] [<ffffffff8162a04f>] rollback_registered_many+0x20f/0x240 [ 62.769071] [<ffffffff8162a101>] rollback_registered+0x31/0x40 [ 62.769073] [<ffffffff8162a488>] unregister_netdevice_queue+0x58/0x90 [ 62.769075] [<ffffffff8154f900>] __tun_detach+0x140/0x340 [ 62.769077] [<ffffffff8154fb36>] tun_chr_close+0x36/0x60 [ 62.769080] [<ffffffff811bddaf>] __fput+0xff/0x260 [ 62.769082] [<ffffffff811bdf5e>] ____fput+0xe/0x10 [ 62.769084] [<ffffffff8107b515>] task_work_run+0xb5/0xe0 [ 62.769087] [<ffffffff810029b9>] do_notify_resume+0x59/0x80 [ 62.769089] [<ffffffff813a41fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 62.769091] [<ffffffff81770f5a>] int_signal+0x12/0x17 [ 62.769093] ---[ end trace 838756c62e156ffb ]--- [ 62.769481] ------------[ cut here ]------------ [ 62.769485] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0() [ 62.769486] sysfs: can not remove 'master', no directory [ 62.769486] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video [ 62.769514] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G W 3.12.0-rc3+ #60 [ 62.769515] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 62.769518] Workqueue: events ovs_dp_notify_wq [openvswitch] [ 62.769519] 0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006 [ 62.769521] ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b28 [ 62.769523] 0000000000000000 ffffffff81a87a1f ffff8807f2634000 ffff880037038500 [ 62.769525] Call Trace: [ 62.769528] [<ffffffff8175e575>] dump_stack+0x55/0x76 [ 62.769529] [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0 [ 62.769531] [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50 [ 62.769533] [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0 [ 62.769535] [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30 [ 62.769538] [<ffffffff81631ef7>] __netdev_adjacent_dev_remove+0xf7/0x150 [ 62.769540] [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50 [ 62.769542] [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50 [ 62.769544] [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140 [ 62.769548] [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch] [ 62.769550] [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch] [ 62.769552] [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch] [ 62.769555] [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch] [ 62.769557] [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0 [ 62.769559] [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0 [ 62.769562] [<ffffffff8107659b>] worker_thread+0x11b/0x370 [ 62.769564] [<ffffffff81076480>] ? rescuer_thread+0x350/0x350 [ 62.769566] [<ffffffff8107f44a>] kthread+0xea/0xf0 [ 62.769568] [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150 [ 62.769570] [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0 [ 62.769572] [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150 [ 62.769573] ---[ end trace 838756c62e156ffc ]--- [ 62.769574] ------------[ cut here ]------------ [ 62.769576] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0() [ 62.769577] sysfs: can not remove 'upper_test', no directory [ 62.769577] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video [ 62.769603] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G W 3.12.0-rc3+ #60 [ 62.769604] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 62.769606] Workqueue: events ovs_dp_notify_wq [openvswitch] [ 62.769607] 0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006 [ 62.769609] ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b58 [ 62.769611] 0000000000000000 ffff880807ad3bd9 ffff8807f2634000 ffff880037038500 [ 62.769613] Call Trace: [ 62.769615] [<ffffffff8175e575>] dump_stack+0x55/0x76 [ 62.769617] [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0 [ 62.769619] [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50 [ 62.769621] [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0 [ 62.769622] [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30 [ 62.769624] [<ffffffff81631f22>] __netdev_adjacent_dev_remove+0x122/0x150 [ 62.769627] [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50 [ 62.769629] [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50 [ 62.769631] [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140 [ 62.769633] [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch] [ 62.769636] [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch] [ 62.769638] [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch] [ 62.769640] [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch] [ 62.769642] [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0 [ 62.769644] [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0 [ 62.769646] [<ffffffff8107659b>] worker_thread+0x11b/0x370 [ 62.769648] [<ffffffff81076480>] ? rescuer_thread+0x350/0x350 [ 62.769650] [<ffffffff8107f44a>] kthread+0xea/0xf0 [ 62.769652] [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150 [ 62.769654] [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0 [ 62.769656] [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150 [ 62.769657] ---[ end trace 838756c62e156ffd ]--- [ 62.769724] device tap1 left promiscuous mode This patch also affects moving devices between net namespaces. OVS used to ignore netns move notifications which caused problems. Like: ovs-dpctl add-if test tap1 ip link set tap1 netns 3512 and then removing tap1 inside the namespace will cause hang on missing dev_put. With this patch OVS will detach dev upon receiving netns move event. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-10-15Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-10-14Merge branch 'for-john' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
2013-10-14mac80211: fix crash if bitrate calculation goes wrongJohannes Berg
If a frame's timestamp is calculated, and the bitrate calculation goes wrong and returns zero, the system will attempt to divide by zero and crash. Catch this case and print the rate information that the driver reported when this happens. Cc: stable@vger.kernel.org Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-14wireless: radiotap: fix parsing buffer overrunJohannes Berg
When parsing an invalid radiotap header, the parser can overrun the buffer that is passed in because it doesn't correctly check 1) the minimum radiotap header size 2) the space for extended bitmaps The first issue doesn't affect any in-kernel user as they all check the minimum size before calling the radiotap function. The second issue could potentially affect the kernel if an skb is passed in that consists only of the radiotap header with a lot of extended bitmaps that extend past the SKB. In that case a read-only buffer overrun by at most 4 bytes is possible. Fix this by adding the appropriate checks to the parser. Cc: stable@vger.kernel.org Reported-by: Evan Huus <eapache@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-11ipv6: Initialize ip6_tnl.hlen in gre tunnel even if no route is foundOussama Ghorbel
The ip6_tnl.hlen (gre and ipv6 headers length) is independent from the outgoing interface, so it would be better to initialize it even when no route is found, otherwise its value will be zero. While I'm not sure if this could happen in real life, but doing that will avoid to call the skb_push function with a zero in ip6gre_header function. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-11netem: free skb's in tree on resetstephen hemminger
Netem can leak memory because packets get stored in red-black tree and it is not cleared on reset. Reported by: Сергеев Сергей <adron@yapic.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>