summaryrefslogtreecommitdiffstats
path: root/include/net
AgeCommit message (Collapse)Author
2013-03-17tcp: Remove TCPCTChristoph Paasch
TCPCT uses option-number 253, reserved for experimental use and should not be used in production environments. Further, TCPCT does not fully implement RFC 6013. As a nice side-effect, removing TCPCT increases TCP's performance for very short flows: Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests for files of 1KB size. before this patch: average (among 7 runs) of 20845.5 Requests/Second after: average (among 7 runs) of 21403.6 Requests/Second Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17caif: remove caif_shmErwan Yvin
caif_shm is an old implementation caif_shm will be replaced by caif_virtio [ As explained by Linus Walleij: "U5500 used this, but was cancelled and the silicon did not reach anyone outside ST-Ericsson. Then for the next platforms, we have gone for the leaner & cleaner approach of using virtio, rpmesg and rproc." ] Signed-off-by: Erwan Yvin <erwan.yvin@stericsson.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Sjur Brendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12tcp: Tail loss probe (TLP)Nandita Dukkipati
This patch series implement the Tail loss probe (TLP) algorithm described in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The first patch implements the basic algorithm. TLP's goal is to reduce tail latency of short transactions. It achieves this by converting retransmission timeouts (RTOs) occuring due to tail losses (losses at end of transactions) into fast recovery. TLP transmits one packet in two round-trips when a connection is in Open state and isn't receiving any ACKs. The transmitted packet, aka loss probe, can be either new or a retransmission. When there is tail loss, the ACK from a loss probe triggers FACK/early-retransmit based fast recovery, thus avoiding a costly RTO. In the absence of loss, there is no change in the connection state. PTO stands for probe timeout. It is a timer event indicating that an ACK is overdue and triggers a loss probe packet. The PTO value is set to max(2*SRTT, 10ms) and is adjusted to account for delayed ACK timer when there is only one oustanding packet. TLP Algorithm On transmission of new data in Open state: -> packets_out > 1: schedule PTO in max(2*SRTT, 10ms). -> packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms) -> PTO = min(PTO, RTO) Conditions for scheduling PTO: -> Connection is in Open state. -> Connection is either cwnd limited or no new data to send. -> Number of probes per tail loss episode is limited to one. -> Connection is SACK enabled. When PTO fires: new_segment_exists: -> transmit new segment. -> packets_out++. cwnd remains same. no_new_packet: -> retransmit the last segment. Its ACK triggers FACK or early retransmit based recovery. ACK path: -> rearm RTO at start of ACK processing. -> reschedule PTO if need be. In addition, the patch includes a small variation to the Early Retransmit (ER) algorithm, such that ER and TLP together can in principle recover any N-degree of tail loss through fast recovery. TLP is controlled by the same sysctl as ER, tcp_early_retrans sysctl. tcp_early_retrans==0; disables TLP and ER. ==1; enables RFC5827 ER. ==2; delayed ER. ==3; TLP and delayed ER. [DEFAULT] ==4; TLP only. The TLP patch series have been extensively tested on Google Web servers. It is most effective for short Web trasactions, where it reduced RTOs by 15% and improved HTTP response time (average by 6%, 99th percentile by 10%). The transmitted probes account for <0.5% of the overall transmissions. Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10ipv6: introduce ip6tunnel_xmit() helperCong Wang
Similar to iptunnel_xmit(), group these operations into a helper function. This by the way fixes the missing u64_stats_update_begin() and u64_stats_update_end() for 32 bit arch. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Pravin B Shelar <pshelar@nicira.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10tunnel: use iptunnel_xmit() againCong Wang
With recent patches from Pravin, most tunnels can't use iptunnel_xmit() any more, due to ip_select_ident() and skb->ip_summed. But we can just move these operations out of iptunnel_xmit(), so that tunnels can use it again. This by the way fixes a bug in vxlan (missing nf_reset()) for net-next. Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08ipv6: introdcue __ipv6_addr_needs_scope_id and ipv6_iface_scope_id helper ↵Hannes Frederic Sowa
functions __ipv6_addr_needs_scope_id checks if an ipv6 address needs to supply a 'sin6_scope_id != 0'. 'sin6_scope_id != 0' was enforced in case of link-local addresses. To support interface-local multicast these checks had to be enhanced and are now consolidated into these new helper functions. v2: a) migrated to struct ipv6_addr_props v3: a) reverted changes for ipv6_addr_props b) test for address type instead of comparing scope v4: a) unchanged Suggested-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07ipv6 flowlabel: add __rcu annotationsEric Dumazet
Commit 18367681a10b (ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.) omitted proper __rcu annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07tcp: uninline tcp_prequeue()Eric Dumazet
tcp_prequeue() became too big to be inlined. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "A moderately sized pile of fixes, some specifically for merge window introduced regressions although others are for longer standing items and have been queued up for -stable. I'm kind of tired of all the RDS protocol bugs over the years, to be honest, it's way out of proportion to the number of people who actually use it. 1) Fix missing range initialization in netfilter IPSET, from Jozsef Kadlecsik. 2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes Berg. 3) Fix DMA syncing in SFC driver, from Ben Hutchings. 4) Fix regression in BOND device MAC address setting, from Jiri Pirko. 5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko. 6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips, fix from Dmitry Kravkov. 7) Missing cfgspace_lock initialization in BCMA driver. 8) Validate parameter size for SCTP assoc stats getsockopt(), from Guenter Roeck. 9) Fix SCTP association hangs, from Lee A Roberts. 10) Fix jumbo frame handling in r8169, from Francois Romieu. 11) Fix phy_device memory leak, from Petr Malat. 12) Omit trailing FCS from frames received in BGMAC driver, from Hauke Mehrtens. 13) Missing socket refcount release in L2TP, from Guillaume Nault. 14) sctp_endpoint_init should respect passed in gfp_t, rather than use GFP_KERNEL unconditionally. From Dan Carpenter. 15) Add AISX AX88179 USB driver, from Freddy Xin. 16) Remove MAINTAINERS entries for drivers deleted during the merge window, from Cesar Eduardo Barros. 17) RDS protocol can try to allocate huge amounts of memory, check that the user's request length makes sense, from Cong Wang. 18) SCTP should use the provided KMALLOC_MAX_SIZE instead of it's own, bogus, definition. From Cong Wang. 19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll, from Frank Li. Also, fix a build error introduced in the merge window. 20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti. 21) Don't double count RTT measurements when we leave the TCP receive fast path, from Neal Cardwell." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits) tcp: fix double-counted receiver RTT when leaving receiver fast path CAIF: fix sparse warning for caif_usb rds: simplify a warning message net: fec: fix build error in no MXC platform net: ipv6: Don't purge default router if accept_ra=2 net: fec: put tx to napi poll function to fix dead lock sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE rds: limit the size allocated by rds_message_alloc() MAINTAINERS: remove eexpress MAINTAINERS: remove drivers/net/wan/cycx* MAINTAINERS: remove 3c505 caif_dev: fix sparse warnings for caif_flow_cb ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver sctp: use the passed in gfp flags instead GFP_KERNEL ipv[4|6]: correct dropwatch false positive in local_deliver_finish l2tp: Restore socket refcount when sendmsg succeeds net/phy: micrel: Disable asymmetric pause for KSZ9021 bgmac: omit the fcs phy: Fix phy_device_free memory leak bnx2x: Fix KR2 work-around condition ...
2013-03-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more VFS bits from Al Viro: "Unfortunately, it looks like xattr series will have to wait until the next cycle ;-/ This pile contains 9p cleanups and fixes (races in v9fs_fid_add() etc), fixup for nommu breakage in shmem.c, several cleanups and a bit more file_inode() work" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: constify path_get/path_put and fs_struct.c stuff fix nommu breakage in shmem.c cache the value of file_inode() in struct file 9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry 9p: make sure ->lookup() adds fid to the right dentry 9p: untangle ->lookup() a bit 9p: double iput() in ->lookup() if d_materialise_unique() fails 9p: v9fs_fid_add() can't fail now v9fs: get rid of v9fs_dentry 9p: turn fid->dlist into hlist 9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine more file_inode() open-coded instances selinux: opened file can't have NULL or negative ->f_path.dentry (In the meantime, the hlist traversal macros have changed, so this required a semantic conflict fixup for the newly hlistified fid->dlist)
2013-02-28tcp: avoid wakeups for pure ACKEric Dumazet
TCP prequeue mechanism purpose is to let incoming packets being processed by the thread currently blocked in tcp_recvmsg(), instead of behalf of the softirq handler, to better adapt flow control on receiver host capacity to schedule the consumer. But in typical request/answer workloads, we send request, then block to receive the answer. And before the actual answer, TCP stack receives the ACK packets acknowledging the request. Processing pure ACK on behalf of the thread blocked in tcp_recvmsg() is a waste of resources, as thread has to immediately sleep again because it got no payload. This patch avoids the extra context switches and scheduler overhead. Before patch : a:~# echo 0 >/proc/sys/net/ipv4/tcp_low_latency a:~# perf stat ./super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k 231676 Performance counter stats for './super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k': 116251.501765 task-clock # 11.369 CPUs utilized 5,025,463 context-switches # 0.043 M/sec 1,074,511 CPU-migrations # 0.009 M/sec 216,923 page-faults # 0.002 M/sec 311,636,972,396 cycles # 2.681 GHz 260,507,138,069 stalled-cycles-frontend # 83.59% frontend cycles idle 155,590,092,840 stalled-cycles-backend # 49.93% backend cycles idle 100,101,255,411 instructions # 0.32 insns per cycle # 2.60 stalled cycles per insn 16,535,930,999 branches # 142.243 M/sec 646,483,591 branch-misses # 3.91% of all branches 10.225482774 seconds time elapsed After patch : a:~# echo 0 >/proc/sys/net/ipv4/tcp_low_latency a:~# perf stat ./super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k 233297 Performance counter stats for './super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k': 91084.870855 task-clock # 8.887 CPUs utilized 2,485,916 context-switches # 0.027 M/sec 815,520 CPU-migrations # 0.009 M/sec 216,932 page-faults # 0.002 M/sec 245,195,022,629 cycles # 2.692 GHz 202,635,777,041 stalled-cycles-frontend # 82.64% frontend cycles idle 124,280,372,407 stalled-cycles-backend # 50.69% backend cycles idle 83,457,289,618 instructions # 0.34 insns per cycle # 2.43 stalled cycles per insn 13,431,472,361 branches # 147.461 M/sec 504,470,665 branch-misses # 3.76% of all branches 10.249594448 seconds time elapsed Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-279p: turn fid->dlist into hlistAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) ping_err() ICMP error handler looks at wrong ICMP header, from Li Wei. 2) TCP socket hash function on ipv6 is too weak, from Eric Dumazet. 3) netif_set_xps_queue() forgets to drop mutex on errors, fix from Alexander Duyck. 4) sum_frag_mem_limit() can deadlock due to lack of BH disabling, fix from Eric Dumazet. 5) TCP SYN data is miscalculated in tcp_send_syn_data(), because the amount of TCP option space was not taken into account properly in this code path. Fix from yuchung Cheng. 6) MLX4 driver allocates device queues with the wrong size, from Kleber Sacilotto. 7) sock_diag can access past the end of the sock_diag_handlers[] array, from Mathias Krause. 8) vlan_set_encap_proto() makes incorrect assumptions about where skb->data points, rework the logic so that it works regardless of where skb->data happens to be. From Jesse Gross. 9) Fix gianfar build failure with NET_POLL enabled, from Paul Gortmaker. 10) Fix Ipv4 ID setting and checksum calculations in GRE driver, from Pravin B Shelar. 11) bgmac driver does: int i; for (i = 0; ...; ...) { ... for (i = 0; ...; ...) { effectively corrupting the outer loop index, use a seperate variable for the inner loops. From Rafał Miłecki. 12) Fix suspend bugs in smsc95xx driver, from Ming Lei. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits) usbnet: smsc95xx: rename FEATURE_AUTOSUSPEND usbnet: smsc95xx: fix broken runtime suspend usbnet: smsc95xx: fix suspend failure bgmac: fix indexing of 2nd level loops b43: Fix lockdep splat on module unload Revert "ip_gre: propogate target device GSO capability to the tunnel device" IP_GRE: Fix GRE_CSUM case. VXLAN: Use tunnel_ip_select_ident() for tunnel IP-Identification. IP_GRE: Fix IP-Identification. net/pasemi: Fix missing coding style vmxnet3: fix ethtool ring buffer size setting vmxnet3: make local function static bnx2x: remove dead code and make local funcs static gianfar: fix compile fail for NET_POLL=y due to struct packing vlan: adjust vlan_set_encap_proto() for its callers sock_diag: Simplify sock_diag_handlers[] handling in __sock_diag_rcv_msg sock_diag: Fix out-of-bounds access to sock_diag_handlers[] vxlan: remove depends on CONFIG_EXPERIMENTAL mlx4_en: fix allocation of CPU affinity reverse-map mlx4_en: fix allocation of device tx_cq ...
2013-02-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace and namespace infrastructure changes from Eric W Biederman: "This set of changes starts with a few small enhnacements to the user namespace. reboot support, allowing more arbitrary mappings, and support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user namespace root. I do my best to document that if you care about limiting your unprivileged users that when you have the user namespace support enabled you will need to enable memory control groups. There is a minor bug fix to prevent overflowing the stack if someone creates way too many user namespaces. The bulk of the changes are a continuation of the kuid/kgid push down work through the filesystems. These changes make using uids and gids typesafe which ensures that these filesystems are safe to use when multiple user namespaces are in use. The filesystems converted for 3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The changes for these filesystems were a little more involved so I split the changes into smaller hopefully obviously correct changes. XFS is the only filesystem that remains. I was hoping I could get that in this release so that user namespace support would be enabled with an allyesconfig or an allmodconfig but it looks like the xfs changes need another couple of days before it they are ready." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits) cifs: Enable building with user namespaces enabled. cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t cifs: Convert struct cifs_sb_info to use kuids and kgids cifs: Modify struct smb_vol to use kuids and kgids cifs: Convert struct cifsFileInfo to use a kuid cifs: Convert struct cifs_fattr to use kuid and kgids cifs: Convert struct tcon_link to use a kuid. cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t cifs: Convert from a kuid before printing current_fsuid cifs: Use kuids and kgids SID to uid/gid mapping cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc cifs: Use BUILD_BUG_ON to validate uids and gids are the same size cifs: Override unmappable incoming uids and gids nfsd: Enable building with user namespaces enabled. nfsd: Properly compare and initialize kuids and kgids nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids nfsd: Modify nfsd4_cb_sec to use kuids and kgids nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion nfsd: Convert nfsxdr to use kuids and kgids nfsd: Convert nfs3xdr to use kuids and kgids ...
2013-02-25IP_GRE: Fix IP-Identification.Pravin B Shelar
GRE-GSO generates ip fragments with id 0,2,3,4... for every GSO packet, which is not correct. Following patch fixes it by setting ip-header id unique id of fragments are allowed. As Eric Dumazet suggested it is optimized by using inner ip-header whenever inner packet is ipv4. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-22net: fix possible deadlock in sum_frag_mem_limitEric Dumazet
Dave Jones reported a lockdep splat occurring in IP defrag code. commit 6d7b857d541ecd1d (net: use lib/percpu_counter API for fragmentation mem accounting) added a possible deadlock. Because percpu_counter_sum_positive() needs to acquire a lock that can be used from softirq, we need to disable BH in sum_frag_mem_limit() Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-22ipv4: fix error handling in icmp_protocol.Li Wei
Now we handle icmp errors in each transport protocol's err_handler, for icmp protocols, that is ping_err. Since this handler only care of those icmp errors triggered by echo request, errors triggered by echo reply(which sent by kernel) are sliently ignored. So wrap ping_err() with icmp_err() to deal with those icmp errors. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Assorted tiny fixes queued in trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits) DocBook: update EXPORT_SYMBOL entry to point at export.h Documentation: update top level 00-INDEX file with new additions ARM: at91/ide: remove unsused at91-ide Kconfig entry percpu_counter.h: comment code for better readability x86, efi: fix comment typo in head_32.S IB: cxgb3: delay freeing mem untill entirely done with it net: mvneta: remove unneeded version.h include time: x86: report_lost_ticks doesn't exist any more pcmcia: avoid static analysis complaint about use-after-free fs/jfs: Fix typo in comment : 'how may' -> 'how many' of: add missing documentation for of_platform_populate() btrfs: remove unnecessary cur_trans set before goto loop in join_transaction sound: soc: Fix typo in sound/codecs treewide: Fix typo in various drivers btrfs: fix comment typos Update ibmvscsi module name in Kconfig. powerpc: fix typo (utilties -> utilities) of: fix spelling mistake in comment h8300: Fix home page URL in h8300/README xtensa: Fix home page URL in Kconfig ...
2013-02-21ipv6: use a stronger hash for tcpEric Dumazet
It looks like its possible to open thousands of TCP IPv6 sessions on a server, all landing in a single slot of TCP hash table. Incoming packets have to lookup sockets in a very long list. We should hash all bits from foreign IPv6 addresses, using a salt and hash mix, not a simple XOR. inet6_ehashfn() can also separately use the ports, instead of xoring them. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-20ipv6: fix race condition regarding dst->expires and dst->from.YOSHIFUJI Hideaki / 吉藤英明
Eric Dumazet wrote: | Some strange crashes happen in rt6_check_expired(), with access | to random addresses. | | At first glance, it looks like the RTF_EXPIRES and | stuff added in commit 1716a96101c49186b | (ipv6: fix problem with expired dst cache) | are racy : same dst could be manipulated at the same time | on different cpus. | | At some point, our stack believes rt->dst.from contains a dst pointer, | while its really a jiffie value (as rt->dst.expires shares the same area | of memory) | | rt6_update_expires() should be fixed, or am I missing something ? | | CC Neil because of https://bugzilla.redhat.com/show_bug.cgi?id=892060 Because we do not have any locks for dst_entry, we cannot change essential structure in the entry; e.g., we cannot change reference to other entity. To fix this issue, split 'from' and 'expires' field in dst_entry out of union. Once it is 'from' is assigned in the constructor, keep the reference until the very last stage of the life time of the object. Of course, it is unsafe to change 'from', so make rt6_set_from simple just for fresh entries. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Neil Horman <nhorman@tuxdriver.com> CC: Gao Feng <gaofeng@cn.fujitsu.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reported-by: Steinar H. Gunderson <sesse@google.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18Merge branch 'master' of git://1984.lsi.us.es/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contain updates for your net-next tree, they are: * Fix (for just added) connlabel dependencies, from Florian Westphal. * Add aliasing support for conntrack, thus users can either use -m state or -m conntrack from iptables while using the same kernel module, from Jozsef Kadlecsik. * Some code refactoring for the CT target to merge common code in revision 0 and 1, from myself. * Add aliasing support for CT, based on patch from Jozsef Kadlecsik. * Add one mutex per nfnetlink subsystem, from myself. * Improved logging for packets that are dropped by helpers, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net into netDavid S. Miller
Pull in 'net' to take in the bug fixes that didn't make it into 3.8-final. Also, deal with the semantic conflict of the change made to net/ipv6/xfrm6_policy.c A missing rt6->n neighbour release was added to 'net', but in 'net-next' we no longer cache the neighbour entries in the ipv6 routes so that change is not appropriate there. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-19netfilter: nf_ct_helper: better logging for dropped packetsPablo Neira Ayuso
Connection tracking helpers have to drop packets under exceptional situations. Currently, the user gets the following logging message in case that happens: nf_ct_%s: dropping packet ... However, depending on the helper, there are different reasons why a packet can be dropped. This patch modifies the existing code to provide more specific error message in the scope of each helper to help users to debug the reason why the packet has been dropped, ie: nf_ct_%s: dropping packet: reason ... Thanks to Joe Perches for many formatting suggestions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-02-18Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/net/wireless/iwlwifi/dvm/tx.c drivers/net/wireless/ti/wlcore/sdio.c drivers/net/wireless/ti/wlcore/spi.c
2013-02-18net: fix a compile error when SOCK_REFCNT_DEBUG is enabledYing Xue
When SOCK_REFCNT_DEBUG is enabled, below build error is met: kernel/sysctl_binary.o: In function `sk_refcnt_debug_release': include/net/sock.h:1025: multiple definition of `sk_refcnt_debug_release' kernel/sysctl.o:include/net/sock.h:1025: first defined here kernel/audit.o: In function `sk_refcnt_debug_release': include/net/sock.h:1025: multiple definition of `sk_refcnt_debug_release' kernel/sysctl.o:include/net/sock.h:1025: first defined here make[1]: *** [kernel/built-in.o] Error 1 make: *** [kernel] Error 2 So we decide to make sk_refcnt_debug_release static to eliminate the error. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-15cfg80211: Pass station (extended) capability info to kernelJouni Malinen
The information of the peer's capabilities and extended capabilities are required for the driver to perform TDLS Peer UAPSD operations and off channel operations. This information of the peer is passed from user space using NL80211_CMD_SET_STATION command. This commit enhances the function nl80211_set_station to pass the capability information of the peer to the driver. Similarly, there may be need for capability information for other modes, so allow this to be provided with both add_station and change_station. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15cfg80211: advertise extended capabilities to userspaceJohannes Berg
In many cases, userspace may need to know which of the 802.11 extended capabilities ("Extended Capabilities element") are implemented in the driver or device, to include them e.g. in beacons, assoc request/response or other frames. Add a new nl80211 attribute to hold the extended capabilities bitmap for this. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15mac80211: stop modifying HT SMPS capabilityJohannes Berg
Instead of modifying the HT SMPS capability field for stations, track the SMPS mode explicitly in a new field in the station struct and use it in the drivers that care about it. This simplifies the code using it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15cfg80211: allow drivers to selectively disable 80/160 MHzJohannes Berg
Some drivers might support 80 or 160 MHz only on some channels for whatever reason, so allow them to disable these channel widths. Also maintain the new flags when regulatory bandwidth limitations would disable these wide channels. Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15mac80211: add ieee80211_vif_change_bandwidthJohannes Berg
For HT and VHT the current bandwidth can change, add the function ieee80211_vif_change_bandwidth() to take care of this. It returns a failure if the new bandwidth isn't compatible with the existing channel context, the caller has to handle that. When it happens, also inform the driver that the bandwidth changed for this virtual interface (no drivers would actually care today though.) Changing to/from HT/VHT isn't allowed though. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15mac80211: handle VHT operating mode notificationJohannes Berg
Handle the operating mode notification action frame. When the supported streams or the bandwidth change let the driver and rate control algorithm know. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15mac80211: track number of spatial streamsJohannes Berg
With VHT, a station can change the number of spatial streams it can receive on the fly, not unlike spatial multiplexing in HT. Prepare for that by tracking the maximum number of spatial streams it can receive when the connection is established. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15mac80211: stop toggling IEEE80211_HT_CAP_SUP_WIDTH_20_40Johannes Berg
For VHT, many more bandwidth changes are possible. As a first step, stop toggling the IEEE80211_HT_CAP_SUP_WIDTH_20_40 flag in the HT capabilities and instead introduce a bandwidth field indicating the currently usable bandwidth to transmit to the station. Of course, make all drivers use it. To achieve this, make ieee80211_ht_cap_ie_to_sta_ht_cap() get the station as an argument, rather than the new capabilities, so it can set up the new bandwidth field. If the station is a VHT station and VHT bandwidth is in use, also set the bandwidth accordingly. Doing this allows us to get rid of the supports_40mhz flag as the HT capabilities now reflect the true capability instead of the current setting. While at it, also fix ieee80211_ht_cap_ie_to_sta_ht_cap() to not ignore HT cap overrides when MCS TX isn't supported (not that it really happens...) Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15mac80211: add radar detection command/eventSimon Wunderlich
Add command to trigger radar detection in the driver/FW. Once radar detection is started it should continuously monitor for radars as long as the channel active. If radar is detected usermode notified with 'radar detected' event. Scanning and remain on channel functionality must be disabled while doing radar detection/scanning, and vice versa. Based on original patch by Victor Goldenshtein <victorg@ti.com> Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15nl80211/cfg80211: add radar detection command/eventSimon Wunderlich
Add new NL80211_CMD_RADAR_DETECT, which starts the Channel Availability Check (CAC). This command will also notify the usermode about events (CAC finished, CAC aborted, radar detected, NOP finished). Once radar detection has started it should continuously monitor for radars as long as the channel is active. This patch enables DFS for AP mode in nl80211/cfg80211. Based on original patch by Victor Goldenshtein <victorg@ti.com> Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> [remove WIPHY_FLAG_HAS_RADAR_DETECT again -- my mistake] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-14Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Remove a duplicated call to skb_orphan() in pf_key, from Cong Wang. 2) Prepare xfrm and pf_key for algorithms without pf_key support, from Jussi Kivilinna. 3) Fix an unbalanced lock in xfrm_output_one(), from Li RongQing. 4) Add an IPsec state resolution packet queue to handle packets that are send before the states are resolved. 5) xfrm4_policy_fini() is unused since 2.6.11, time to remove it. From Michal Kubecek. 6) The xfrm gc threshold was configurable just in the initial namespace, make it configurable in all namespaces. From Michal Kubecek. 7) We currently can not insert policies with mark and mask such that some flows would be matched from both policies. Allow this if the priorities of these policies are different, the one with the higher priority is used in this case. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-13cfg80211: configuration for WoWLAN over TCPJohannes Berg
Intel Wireless devices are able to make a TCP connection after suspending, sending some data and waking up when the connection receives wakeup data (or breaks). Add the WoWLAN configuration and feature advertising API for it. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-13nl80211: add packet offset information for wowlan patternAmitkumar Karwar
If user knows the location of a wowlan pattern to be matched in Rx packet, he can provide an offset with the pattern. This will help drivers to ignore initial bytes and match the pattern efficiently. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> [refactor pattern sending] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-12act_police: move struct tcf_police to act_police.cJiri Pirko
It's not used anywhere else, so move it. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-12sch_api: introduce qdisc_watchdog_schedule_ns()Jiri Pirko
tbf will need to schedule watchdog in ns. No need to convert it twice. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-12sch: make htb_rate_cfg and functions around that genericJiri Pirko
As it is going to be used in tbf as well, push these to generic code. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-12net: sctp: remove unused multiple cookie keysDaniel Borkmann
Vlad says: The whole multiple cookie keys code is completely unused and has been all this time. Noone uses anything other then the secret_key[0] since there is no changeover support anywhere. Thus, for now clean up its left-over fragments. Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-129p: Modify struct 9p_fid to use a kuid_t not a uid_tEric W. Biederman
Change struct 9p_fid and it's associated functions to use kuid_t's instead of uid_t. Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-129p: Modify the stat structures to use kuid_t and kgid_tEric W. Biederman
9p has thre strucrtures that can encode inode stat information. Modify all of those structures to contain kuid_t and kgid_t values. Modify he wire encoders and decoders of those structures to use 'u' and 'g' instead of 'd' in the format string where uids and gids are present. This results in all kuid and kgid conversion to and from on the wire values being performed by the same code in protocol.c where the client is known at the time of the conversion. Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-02-129p: Transmit kuid and kgid valuesEric W. Biederman
Modify the p9_client_rpc format specifiers of every function that directly transmits a uid or a gid from 'd' to 'u' or 'g' as appropriate. Modify those same functions to take kuid_t and kgid_t parameters instead of uid_t and gid_t parameters. Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-02-11mac80211: Fix tx queue handling during scansSeth Forshee
Scans currently work by stopping the netdev tx queues but leaving the mac80211 queues active. This stops the flow of incoming packets while still allowing mac80211 to transmit nullfunc and probe request frames to facilitate scanning. However, the driver may try to wake the mac80211 queues while in this state, which will also wake the netdev queues. To prevent this, add a new queue stop reason, IEEE80211_QUEUE_STOP_REASON_OFFCHANNEL, to be used when stopping the tx queues for off-channel operation. This prevents the netdev queues from waking when a driver wakes the mac80211 queues. This also stops all frames from being transmitted, even those meant to be sent off-channel. Add a new tx control flag, IEEE80211_TX_CTL_OFFCHAN_TX_OK, which allows frames to be transmitted when the queues are stopped only for the off-channel stop reason. Update all locations transmitting off-channel frames to use this flag. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11mac80211: remove IEEE80211_HW_SCAN_WHILE_IDLEJohannes Berg
There are only a few drivers that use HW scan, and all of those don't need a non-idle transition before starting the scan -- some don't even care about idle at all. Remove the flag and code associated with it. The only driver that really actually needed this is wl1251 and it can just do it itself in the hw_scan callback -- implement that. Acked-by: Luciano Coelho <coelho@ti.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11mac80211: remove dynamic PS driver interfaceJohannes Berg
The functions were added for some sort of Bluetooth coexistence, but aren't used, so remove them again. Reviewed-by: Luciano Coelho <coelho@ti.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11mac80211: introduce beacon-only timing dataJohannes Berg
In order to be able to predict the next DTIM TBTT in the driver, add the ability to use timing data from beacons only with the new hardware flag IEEE80211_HW_TIMING_BEACON_ONLY and the BSS info value sync_dtim_count which is only valid if the timing data came from a beacon. The data can only come from a beacon, and if no beacon was received before association it is updated later together with the DTIM count notification. Signed-off-by: Johannes Berg <johannes.berg@intel.com>