summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2013-05-23bridge: Set vlan_features to allow offloads on vlans.Vlad Yasevich
When vlan device is configured on top of the brige, it does not support any offload capabilities because the bridge device does not initiliaze vlan_fatures. Set vlan_fatures to be equivalent to hw_fatures. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-23tcp: xps: fix reordering issuesEric Dumazet
commit 3853b5841c01a ("xps: Improvements in TX queue selection") introduced ooo_okay flag, but the condition to set it is slightly wrong. In our traces, we have seen ACK packets being received out of order, and RST packets sent in response. We should test if we have any packets still in host queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-24mac80211: close AP_VLAN interfaces before unregistering allJohannes Berg
Since Eric's commit efe117ab8 ("Speedup ieee80211_remove_interfaces") there's a bug in mac80211 when it unregisters with AP_VLAN interfaces up. If the AP_VLAN interface was registered after the AP it belongs to (which is the typical case) and then we get into this code path, unregister_netdevice_many() will crash because it isn't prepared to deal with interfaces being closed in the middle of it. Exactly this happens though, because we iterate the list, find the AP master this AP_VLAN belongs to and dev_close() the dependent VLANs. After this, unregister_netdevice_many() won't pick up the fact that the AP_VLAN is already down and will do it again, causing a crash. Cc: stable@vger.kernel.org [2.6.33+] Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-23mac80211: assign AP_VLAN hw queues correctlyJohannes Berg
A lot of code in mac80211 assumes that the hw queues are set up correctly for all interfaces (except for monitor) but this isn't true for AP_VLAN interfaces. Fix this by copying the AP master configuration when an AP VLAN is brought up, after this the AP interface can't change its configuration any more and needs to be brought down to change it, which also forces AP_VLAN interfaces down, so just copying in open() is sufficient. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-23cfg80211: fix reporting 64-bit station info tx bytesFelix Fietkau
Copy & paste mistake - STATION_INFO_TX_BYTES64 is the name of the flag, not NL80211_STA_INFO_TX_BYTES64. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-23mac80211: fix queue handling crashJohannes Berg
The code I added in "mac80211: don't start new netdev queues if driver stopped" crashes for monitor and AP VLAN interfaces because while they have a netdev, they don't have queues set up by the driver. To fix the crash, exclude these from queue accounting here and just start their netdev queues unconditionally. For monitor, this is the best we can do, as we can redirect frames there to any other interface and don't know which one that will since it can be different for each frame. For AP VLAN interfaces, we can do better later and actually properly track the queue status. Not doing this is really a separate bug though. Reported-by: Ilan Peer <ilan.peer@intel.com> Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-23cfg80211: check wdev->netdev in connection workJohannes Berg
If a P2P-Device is present and another virtual interface triggers the connection work, the system crash because it tries to check if the P2P-Device's netdev (which doesn't exist) is up. Skip any wdevs that have no netdev to fix this. Cc: stable@vger.kernel.org Reported-by: YanBo <dreamfly281@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-23netfilter: ipt_ULOG: fix non-null terminated string in the nf_log pathChen Gang
If nf_log uses ipt_ULOG as logging output, we can deliver non-null terminated strings to user-space since the maximum length of the prefix that is passed by nf_log is NF_LOG_PREFIXLEN but pm->prefix is 32 bytes long (ULOG_PREFIX_LEN). This is actually happening already from nf_conntrack_tcp if ipt_ULOG is used, since it is passing strings longer than 32 bytes. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23ipvs: use cond_resched_rcu() helper when walking connectionsSimon Horman
This avoids the situation where walking of a large number of connections may prevent scheduling for a long time while also avoiding excessive calls to rcu_read_unlock() and rcu_read_lock(). Note that in the case of !CONFIG_PREEMPT_RCU this will add a call to cond_resched(). Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23netfilter: {ipt,ebt}_ULOG: rise warning on deprecationPablo Neira Ayuso
This target has been superseded by NFLOG. Spot a warning so we prepare removal in a couple of years. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-05-23netfilter: don't panic on error while walking through the init pathPablo Neira Ayuso
Don't panic if we hit an error while adding the nf_log or pernet netfilter support, just bail out. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-05-23netfilter: add nf_ipv6_ops hook to fix xt_addrtype with IPv6Florian Westphal
Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812: [ ip6tables -m addrtype ] When I tried to use in the nat/PREROUTING it messes up the routing cache even if the rule didn't matched at all. [..] If I remove the --limit-iface-in from the non-working scenario, so just use the -m addrtype --dst-type LOCAL it works! This happens when LOCAL type matching is requested with --limit-iface-in, and the default ipv6 route is via the interface the packet we test arrived on. Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation creates an unwanted cached entry, and the packet won't make it to the real/expected destination. Silently ignoring --limit-iface-in makes the routing work but it breaks rule matching (--dst-type LOCAL with limit-iface-in is supposed to only match if the dst address is configured on the incoming interface; without --limit-iface-in it will match if the address is reachable via lo). The test should call ipv6_chk_addr() instead. However, this would add a link-time dependency on ipv6. There are two possible solutions: 1) Revert the commit that moved ipt_addrtype to xt_addrtype, and put ipv6 specific code into ip6t_addrtype. 2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions. While the former might seem preferable, Pablo pointed out that there are more xt modules with link-time dependeny issues regarding ipv6, so lets go for 2). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23bridge: netfilter: using strlcpy() instead of strncpy()Chen Gang
'name' has already set all zero when it is defined, so not need let strncpy() to pad it again. 'name' is a string, better always let is NUL terminated, so use strlcpy() instead of strncpy(). Signed-off-by: Chen Gang <gang.chen@asianux.com> Acked-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23netfilter: xt_socket: use IP early demuxEric Dumazet
With IP early demux added in linux-3.6, we perform TCP lookup in IP layer before iptables hooks. We can avoid doing a second lookup in xt_socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23netfilter: xt_CT: optimize XT_CT_NOTRACKEric Dumazet
The percpu untracked ct are not currently used for XT_CT_NOTRACK. xt_ct_tg_check()/xt_ct_target() provides a single ct. Thats not optimal as the ct->ct_general.use cache line will bounce among cpus. Use the intended [1] thing : xt_ct_target() should select the percpu object. [1] Refs : commit 5bfddbd46a95c97 ("netfilter: nf_conntrack: IPS_UNTRACKED bit") commit b3c5163fe0193a7 ("netfilter: nf_conntrack: per_cpu untracking") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23xfrm: properly handle invalid states as an errorTimo Teräs
The error exit path needs err explicitly set. Otherwise it returns success and the only caller, xfrm_output_resume(), would oops in skb_dst(skb)->ops derefence as skb_dst(skb) is NULL. Bug introduced in commit bb65a9cb (xfrm: removes a superfluous check and add a statistic). Signed-off-by: Timo Teräs <timo.teras@iki.fi> Cc: Li RongQing <roy.qing.li@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-23ipv6: use ipv6_addr_scope() helperCong Wang
ipv6_addr_type(&addr)&IPV6_ADDR_SCOPE_MASK could be replaced by ipv6_addr_scope(), which is slightly faster. Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-23ipv6: use ipv6_addr_any() helperCong Wang
ipv6_addr_any() is a faster way to determine if an addr is ipv6 any addr, no need to compute the addr type. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-23tcp: bug fix in proportional rate reduction.Nandita Dukkipati
This patch is a fix for a bug triggering newly_acked_sacked < 0 in tcp_ack(.). The bug is triggered by sacked_out decreasing relative to prior_sacked, but packets_out remaining the same as pior_packets. This is because the snapshot of prior_packets is taken after tcp_sacktag_write_queue() while prior_sacked is captured before tcp_sacktag_write_queue(). The problem is: tcp_sacktag_write_queue (tcp_match_skb_to_sack() -> tcp_fragment) adjusts the pcount for packets_out and sacked_out (MSS change or other reason). As a result, this delta in pcount is reflected in (prior_sacked - sacked_out) but not in (prior_packets - packets_out). This patch does the following: 1) initializes prior_packets at the start of tcp_ack() so as to capture the delta in packets_out created by tcp_fragment. 2) introduces a new "previous_packets_out" variable that snapshots packets_out right before tcp_clean_rtx_queue, so pkts_acked can be correctly computed as before. 3) Computes pkts_acked using previous_packets_out, and computes newly_acked_sacked using prior_packets. Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-23sch_tbf: segment too big GSO packetsEric Dumazet
If a GSO packet has a length above tbf burst limit, the packet is currently silently dropped. Current way to handle this is to set the device in non GSO/TSO mode, or setting high bursts, and its sub optimal. We can actually segment too big GSO packets, and send individual segments as tbf parameters allow, allowing for better interoperability. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22net: Loosen constraints for recalculating checksum in skb_segment()Simon Horman
This is a generic solution to resolve a specific problem that I have observed. If the encapsulation of an skb changes then ability to offload checksums may also change. In particular it may be necessary to perform checksumming in software. An example of such a case is where a non-GRE packet is received but is to be encapsulated and transmitted as GRE. Another example relates to my proposed support for for packets that are non-MPLS when received but MPLS when transmitted. The cost of this change is that the value of the csum variable may be checked when it previously was not. In the case where the csum variable is true this is pure overhead. In the case where the csum variable is false it leads to software checksumming, which I believe also leads to correct checksums in transmitted packets for the cases described above. Further analysis: This patch relies on the return value of can_checksum_protocol() being correct and in turn the return value of skb_network_protocol(), used to provide the protocol parameter of can_checksum_protocol(), being correct. It also relies on the features passed to skb_segment() and in turn to can_checksum_protocol() being correct. I believe that this problem has not been observed for VLANs because it appears that almost all drivers, the exception being xgbe, set vlan_features such that that the checksum offload support for VLAN packets is greater than or equal to that of non-VLAN packets. I wonder if the code in xgbe may be an oversight and the hardware does support checksumming of VLAN packets. If so it may be worth updating the vlan_features of the driver as this patch will force such checksums to be performed in software rather than hardware. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22bridge: send query as soon as leave is receivedCong Wang
Continue sending queries when leave is received if the user marks it as a querier. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Adam Baker <linux@baker-net.org.uk> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22bridge: only expire the mdb entry when query is receivedCong Wang
Currently we arm the expire timer when the mdb entry is added, however, this causes problem when there is no querier sent out after that. So we should only arm the timer when a corresponding query is received, as suggested by Herbert. And he also mentioned "if there is no querier then group subscriptions shouldn't expire. There has to be at least one querier in the network for this thing to work. Otherwise it just degenerates into a non-snooping switch, which is OK." Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Adam Baker <linux@baker-net.org.uk> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22bridge: use the bridge IP addr as source addr for querierCong Wang
Quote from Adam: "If it is believed that the use of 0.0.0.0 as the IP address is what is causing strange behaviour on other devices then is there a good reason that a bridge rather than a router shouldn't be the active querier? If not then using the bridge IP address and having the querier enabled by default may be a reasonable solution (provided that our querier obeys the election rules and shuts up if it sees a query from a lower IP address that isn't 0.0.0.0). Just because a device is the elected querier for IGMP doesn't appear to mean it is required to perform any other routing functions." And introduce a new troggle for it, as suggested by Herbert. Suggested-by: Adam Baker <linux@baker-net.org.uk> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Adam Baker <linux@baker-net.org.uk> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22SUNRPC: Prevent an rpc_task wakeup raceTrond Myklebust
The lockless RPC_IS_QUEUED() test in __rpc_execute means that we need to be careful about ordering the calls to rpc_test_and_set_running(task) and rpc_clear_queued(task). If we get the order wrong, then we may end up testing the RPC_TASK_RUNNING flag after __rpc_execute() has looped and changed the state of the rpc_task. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-05-21batman-adv: Avoid double freeing of bat_countersMartin Hundebøll
On errors in batadv_mesh_init(), bat_counters will be freed in both batadv_mesh_free() and batadv_softif_init_late(). This patch fixes this by returning earlier from batadv_softif_init_late() in case of errors in batadv_mesh_init() and by setting bat_counters to NULL after freeing. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-05-21sunrpc: the cache_detail in cache_is_valid is unused any morechaoting fan
The cache_detail(*detail) in function cache_is_valid is not used any more. Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-21NFC: Remove commented out LLCP related Makefile linePaul Bolle
The Kconfig symbol NFC_LLCP was removed in commit 30cc458765 ("NFC: Move LLCP code to the NFC top level diirectory"). But the reference to its macro in this Makefile was only commented out. Remove it now. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-05-20tcp: md5: remove spinlock usage in fast pathEric Dumazet
TCP md5 code uses per cpu variables but protects access to them with a shared spinlock, which is a contention point. [ tcp_md5sig_pool_lock is locked twice per incoming packet ] Makes things much simpler, by allocating crypto structures once, first time a socket needs md5 keys, and not deallocating them as they are really small. Next step would be to allow crypto allocations being done in a NUMA aware way. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20rps: selective flow shedding during softnet overflowWillem de Bruijn
A cpu executing the network receive path sheds packets when its input queue grows to netdev_max_backlog. A single high rate flow (such as a spoofed source DoS) can exceed a single cpu processing rate and will degrade throughput of other flows hashed onto the same cpu. This patch adds a more fine grained hashtable. If the netdev backlog is above a threshold, IRQ cpus track the ratio of total traffic of each flow (using 4096 buckets, configurable). The ratio is measured by counting the number of packets per flow over the last 256 packets from the source cpu. Any flow that occupies a large fraction of this (set at 50%) will see packet drop while above the threshold. Tested: Setup is a muli-threaded UDP echo server with network rx IRQ on cpu0, kernel receive (RPS) on cpu0 and application threads on cpus 2--7 each handling 20k req/s. Throughput halves when hit with a 400 kpps antagonist storm. With this patch applied, antagonist overload is dropped and the server processes its complete load. The patch is effective when kernel receive processing is the bottleneck. The above RPS scenario is a extreme, but the same is reached with RFS and sufficient kernel processing (iptables, packet socket tap, ..). Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-05-20ip_gre: fix a possible crash in ipgre_err()Eric Dumazet
Another fix needed in ipgre_err(), as parse_gre_header() might change skb->head. Bug added in commit c54419321455 (GRE: Refactor GRE tunneling code.) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-19tcp: remove bad timeout logic in fast recoveryYuchung Cheng
tcp_timeout_skb() was intended to trigger fast recovery on timeout, unfortunately in reality it often causes spurious retransmission storms during fast recovery. The particular sign is a fast retransmit over the highest sacked sequence (SND.FACK). Currently the RTO timer re-arming (as in RFC6298) offers a nice cushion to avoid spurious timeout: when SND.UNA advances the sender re-arms RTO and extends the timeout by icsk_rto. The sender does not offset the time elapsed since the packet at SND.UNA was sent. But if the next (DUP)ACK arrives later than ~RTTVAR and triggers tcp_fastretrans_alert(), then tcp_timeout_skb() will mark any packet sent before the icsk_rto interval lost, including one that's above the highest sacked sequence. Most likely a large part of scorebard will be marked. If most packets are not lost then the subsequent DUPACKs with new SACK blocks will cause the sender to continue to retransmit packets beyond SND.FACK spuriously. Even if only one packet is lost the sender may falsely retransmit almost the entire window. The situation becomes common in the world of bufferbloat: the RTT continues to grow as the queue builds up but RTTVAR remains small and close to the minimum 200ms. If a data packet is lost and the DUPACK triggered by the next data packet is slightly delayed, then a spurious retransmission storm forms. As the original comment on tcp_timeout_skb() suggests: the usefulness of this feature is questionable. It also wastes cycles walking the sack scoreboard and is actually harmful because of false recovery. It's time to remove this. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20Hoist memcpy_fromiovec/memcpy_toiovec into lib/Rusty Russell
ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined! That function is only present with CONFIG_NET. Turns out that crypto/algif_skcipher.c also uses that outside net, but it actually needs sockets anyway. In addition, commit 6d4f0139d642c45411a47879325891ce2a7c164a added CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist that function and revert that commit too. socket.h already includes uio.h, so no callers need updating; trying only broke things fo x86_64 randconfig (thanks Fengguang!). Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-05-19net: irda: using kzalloc() instead of kmalloc() to avoid strncpy() issue.Chen Gang
'discovery->data.info' length is 22, NICKNAME_MAX_LEN is 21, so the strncpy() will always left the last byte of 'discovery->data.info' uninitialized. When 'text' length is longer than 21 (NICKNAME_MAX_LEN), if still left the last byte of 'discovery->data.info' uninitialized, the next strlen() will cause issue. Also 'discovery->data' is 'struct irda_device_info' which defined in "include/uapi/...", it may copy to user mode, so need whole initialized. All together, need use kzalloc() instead of kmalloc() to initialize all members firstly. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-19ipv6: add support of peer addressNicolas Dichtel
This patch adds the support of peer address for IPv6. For example, it is possible to specify the remote end of a 6inY tunnel. This was already possible in IPv4: ip addr add ip1 peer ip2 dev dev1 The peer address is specified with IFA_ADDRESS and the local address with IFA_LOCAL (like explained in include/uapi/linux/if_addr.h). Note that the API is not changed, because before this patch, it was not possible to specify two different addresses in IFA_LOCAL and IFA_REMOTE. There is a small change for the dump: if the peer is different from ::, IFA_ADDRESS will contain the peer address instead of the local address. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-19netlabel: improve domain mapping validationPaul Moore
The net/netlabel/netlabel_domainhash.c:netlbl_domhsh_add() function does not properly validate new domain hash entries resulting in potential problems when an administrator attempts to add an invalid entry. One such problem, as reported by Vlad Halilov, is a kernel BUG (found in netlabel_domainhash.c:netlbl_domhsh_audit_add()) when adding an IPv6 outbound mapping with a CIPSO configuration. This patch corrects this problem by adding the necessary validation code to netlbl_domhsh_add() via the newly created netlbl_domhsh_validate() function. Ideally this patch should also be pushed to the currently active -stable trees. Reported-by: Vlad Halilov <vlad.halilov@gmail.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-18ipv6: fix possible crashes in ip6_cork_release()Eric Dumazet
commit 0178b695fd6b4 ("ipv6: Copy cork options in ip6_append_data") added some code duplication and bad error recovery, leading to potential crash in ip6_cork_release() as kfree() could be called with garbage. use kzalloc() to make sure this wont happen. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Neal Cardwell <ncardwell@google.com>
2013-05-17dev: remove duplicate 'skb->dev = dev' in dev_forward_skb()Nicolas Dichtel
This was added by commit 59b9997baba5 (Revert "net: maintain namespace isolation between vlan and real device"). In fact, before the initial commit - the one that is reverted -, this statement was not present. 'skb->dev = dev' is already done in eth_type_trans(), which is call just after. Spotted-by: Alain Ritoux <alain.ritoux@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-17libceph: must hold mutex for reset_changed_osds()Alex Elder
An osd client has a red-black tree describing its osds, and occasionally we would get crashes due to one of these trees tree becoming corrupt somehow. The problem turned out to be that reset_changed_osds() was being called without protection of the osd client request mutex. That function would call __reset_osd() for any osd that had changed, and __reset_osd() would call __remove_osd() for any osd with no outstanding requests, and finally __remove_osd() would remove the corresponding entry from the red-black tree. Thus, the tree was getting modified without having any lock protection, and was vulnerable to problems due to concurrent updates. This appears to be the only osd tree updating path that has this problem. It can be fairly easily fixed by moving the call up a few lines, to just before the request mutex gets dropped in kick_requests(). This resolves: http://tracker.ceph.com/issues/5043 Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-17mac80211: fix direct probe authStanislaw Gruszka
We send direct probe to broadcast address, as some APs do not respond to unicast PROBE frames when unassociated. Broadcast frames are not acked, so we can not use that for trigger MLME state machine, but we need to use old timeout mechanism. This fixes authentication timed out like below: [ 1024.671974] wlan6: authenticate with 54:e6:fc:98:63:fe [ 1024.694125] wlan6: direct probe to 54:e6:fc:98:63:fe (try 1/3) [ 1024.695450] wlan6: direct probe to 54:e6:fc:98:63:fe (try 2/3) [ 1024.700586] wlan6: send auth to 54:e6:fc:98:63:fe (try 3/3) [ 1024.701441] wlan6: authentication with 54:e6:fc:98:63:fe timed out With fix, we have: [ 4524.198978] wlan6: authenticate with 54:e6:fc:98:63:fe [ 4524.220692] wlan6: direct probe to 54:e6:fc:98:63:fe (try 1/3) [ 4524.421784] wlan6: send auth to 54:e6:fc:98:63:fe (try 2/3) [ 4524.423272] wlan6: authenticated [ 4524.423811] wlan6: associate with 54:e6:fc:98:63:fe (try 1/3) [ 4524.427492] wlan6: RX AssocResp from 54:e6:fc:98:63:fe (capab=0x431 status=0 aid=1) Cc: stable@vger.kernel.org # 3.9 Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-17batman-adv: Fix rcu_barrier() miss due to double call_rcu() in TT codeLinus Lüssing
rcu_barrier() only waits for the currently scheduled rcu functions to finish - it won't wait for any function scheduled via another call_rcu() within an rcu scheduled function. Unfortunately our batadv_tt_orig_list_entry_free_ref() does just that, via a batadv_orig_node_free_ref() call, leading to our rcu_barrier() call potentially missing such a batadv_orig_node_free_ref(). This patch fixes this issue by calling the batadv_orig_node_free_rcu() directly from the rcu callback, removing the unnecessary, additional call_rcu() layer here. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Acked-by: Antonio Quartulli <ordex@autistici.org>
2013-05-16tcp: speedup tcp_fixup_rcvbuf()Eric Dumazet
tcp_fixup_rcvbuf() contains a loop to estimate initial socket rcv space needed for a given mss. With large MTU (like 64K on lo), we can loop ~500 times and consume a lot of cpu cycles. perf top of 200 concurrent netperf -t TCP_CRR 5.62% netperf [kernel.kallsyms] [k] tcp_init_buffer_space 1.71% netperf [kernel.kallsyms] [k] _raw_spin_lock 1.55% netperf [kernel.kallsyms] [k] kmem_cache_free 1.51% netperf [kernel.kallsyms] [k] tcp_transmit_skb 1.50% netperf [kernel.kallsyms] [k] tcp_ack Lets use a 100% factor, and remove the loop. 100% is needed anyway for tcp_adv_win_scale=1 default value, and is also the maximum factor. Refs: commit b49960a05e32 ("tcp: change tcp_adv_win_scale and tcp_rmem[2]") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-16tcp: gso: do not generate out of order packetsEric Dumazet
GSO TCP handler has following issues : 1) ooo_okay from original GSO packet is duplicated to all segments 2) segments (but the last one) are orphaned, so transmit path can not get transmit queue number from the socket. This happens if GSO segmentation is done before stacked device for example. Result is we can send packets from a given TCP flow to different TX queues (if using multiqueue NICS). This generates OOO problems and spurious SACK & retransmits. Fix this by keeping socket pointer set for all segments. This means that every segment must also have a destructor, and the original gso skb truesize must be split on all segments, to keep precise sk->sk_wmem_alloc accounting. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-16Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains three Netfilter fixes and update for the MAINTAINER file for your net tree, they are: * Fix crash if nf_log_packet is called from conntrack, in that case both interfaces are NULL, from Hans Schillstrom. This bug introduced with the logging netns support in the previous merge window. * Fix compilation of nf_log and nf_queue without CONFIG_PROC_FS, from myself. This bug was introduced in the previous merge window with the new netns support for the netfilter logging infrastructure. * Fix possible crash in xt_TCPOPTSTRIP due to missing sanity checkings to validate that the TCP header is well-formed, from myself. I can find this bug in 2.6.25, probably it's been there since the beginning. I'll pass this to -stable. * Update MAINTAINER file to point to new nf trees at git.kernel.org, remove Harald and use M: instead of P: (now obsolete tag) to keep Jozsef in the list of people. Please, consider pulling this. Thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-16mac80211: enable power save only if DTIM period is availableAlexander Bondar
Generally, the DTIM period is available after a beacon has been received, and if no beacon has been received enabling powersave is problematic anyway for synchronisation. Since some drivers may require the DTIM period for powersave, don't enable powersave until it becomes available in case the scan/association managed to not receive a beacon. Signed-off-by: Alexander Bondar <alexander.bondar@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-16mac80211: enable Auth Protocol Identifier on mesh config.Colleen Twitty
Previously the mesh_auth_id was disabled. Instead set the correct mesh authentication bit based on the mesh setup. Signed-off-by: Colleen Twitty <colleen@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-16cfg80211: Userspace may inform kernel of mesh auth method.Colleen Twitty
Authentication takes place in userspace, but the beacon is generated in the kernel. Allow userspace to inform the kernel of the authentication method so the appropriate mesh config IE can be set prior to beacon generation when joining the MBSS. Signed-off-by: Colleen Twitty <colleen@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-16cfg80211: use C99 initialisers to simplify code a bitJohannes Berg
Use C99 initialisers for the auth, deauth and disassoc requests to simplify the code. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-16mac80211: write memcpy differently for smatchJohannes Berg
There's no real difference between *array and array, but the former confuses smatch so write it differently. The generated code is exactly the same. Signed-off-by: Johannes Berg <johannes.berg@intel.com>