summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2014-01-21rxrpc: out of bound read in debug codeDan Carpenter
Smatch complains because we are using an untrusted index into the rxrpc_acks[] array. It's just a read and it's only in the debug code, but it's simple enough to add a check and fix it. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-218021q: update descriptionYegor Yefremov
Replace deprecated 'vconfig' tool with 'ip' from 'iproute2'. Add some beautifications like replacing 'ethernet' with 'Ethernet' and removing unneeded spaces. Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21ipv6: protect protocols not handling ipv4 from v4 connection/bind attemptsHannes Frederic Sowa
Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow connecting and binding to them. sendmsg logic does already check msg->name for this but must trust already connected sockets which could be set up for connection to ipv4 address family. Per-socket flag ipv6only is of no use here, as it is under users control by setsockopt. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21ipv6: enable anycast addresses as source addresses in ICMPv6 error messagesFX Le Bail
- Uses ipv6_anycast_destination() in icmp6_send(). Suggested-by: Bill Fink <billfink@mindspring.com> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21tcp: delete redundant calls of tcp_mtup_init()Peter Pan(潘卫平)
As tcp_rcv_state_process() has already calls tcp_mtup_init() for non-fastopen sock, we can delete the redundant calls of tcp_mtup_init() in tcp_{v4,v6}_syn_recv_sock(). Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21packet: fix a couple of cppcheck warningsDaniel Borkmann
Doesn't bring much, but also doesn't hurt us to fix 'em: 1) In tpacket_rcv() flush dcache page we can restirct the scope for start and end and remove one layer of indent. 2) In tpacket_destruct_skb() we can restirct the scope for ph. 3) In alloc_one_pg_vec_page() we can remove the NULL assignment and change spacing a bit. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21ipv4: remove the useless argument from ip_tunnel_hash()Duan Jiong
Since commit c544193214("GRE: Refactor GRE tunneling code") introduced function ip_tunnel_hash(), the argument itn is no longer in use, so remove it. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21net: remove unnecessary initializations in net_dev_initSabrina Dubroca
softnet_data is already set to 0, no need to use memset or initialize specific fields to 0 or NULL afterwards. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21net_sched: act: export tcf_hash_search() instead of tcf_hash_lookup()WANG Cong
So that we will not expose struct tcf_common to modules. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21net_sched: act: fetch hinfo from a->ops->hinfoWANG Cong
Every action ops has a pointer to hash info, so we don't need to hard-code it in each module. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21Merge tag 'gpio-v3.14-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO tree bulk changes from Linus Walleij: "A big set this merge window, as we have much going on in this subsystem. The changes to other subsystems (notably a slew of ARM machines as I am doing away with their custom APIs) have all been ACKed to the extent possible. Major changes this time: - Some core improvements and cleanups to the new GPIO descriptor API. This seems to be working now so we can start the exodus to this API, moving gradually away from the global GPIO numberspace. - Incremental improvements to the ACPI GPIO core, and move the few GPIO ACPI clients we have to the GPIO descriptor API right *now* before we go any further. We actually managed to contain this *before* we started to litter the kernel with yet another hackish global numberspace for the ACPI GPIOs, which is a big win. - The RFkill GPIO driver and all platforms using it have been migrated to use the GPIO descriptors rather than fixed number assignments. Tegra machine has been migrated as part of this. - New drivers for MOXA ART, Xtensa GPIO32 and SMSC SCH311x. Those should be really good examples of how I expect a nice GPIO driver to look these days. - Do away with custom GPIO implementations on a major part of the ARM machines: ks8695, lpc32xx, mv78xx0. Make a first step towards the same in the horribly convoluted Samsung S3C include forest. We expect to continue to clean this up as we move forward. - Flag GPIO lines used for IRQ on adnp, bcm-kona, em, intel-mid and lynxpoint. This makes the GPIOlib core aware that a certain GPIO line is used for IRQs and can then enforce some semantics such as disallowing a GPIO line marked as in use for IRQ to be switched to output mode. - Drop all use of irq_set_chip_and_handler_name(). The name provided in these cases were just unhelpful tags like "mux" or "demux". - Extend the MCP23s08 driver to handle interrupts. - Minor incremental improvements for rcar, lynxpoint, em 74x164 and msm drivers. - Some non-urgent bug fixes here and there, duplicate #includes and that usual kind of cleanups" Fix up broken Kconfig file manually to make this all compile. * tag 'gpio-v3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (71 commits) gpio: mcp23s08: fix casting caused build warning gpio: mcp23s08: depend on OF_GPIO gpio: mcp23s08: Add irq functionality for i2c chips ARM: S5P[v210|c100|64x0]: Fix build error gpio: pxa: clamp gpio get value to [0,1] ARM: s3c24xx: explicit dependency on <plat/gpio-cfg.h> ARM: S3C[24|64]xx: move includes back under <mach/> scope Documentation / ACPI: update to GPIO descriptor API gpio / ACPI: get rid of acpi_gpio.h gpio / ACPI: register to ACPI events automatically mmc: sdhci-acpi: convert to use GPIO descriptor API ARM: s3c24xx: fix build error gpio: f7188x: set can_sleep attribute gpio: samsung: Update documentation gpio: samsung: Remove hardware.h inclusion gpio: xtensa: depend on HAVE_XTENSA_GPIO32 gpio: clps711x: Enable driver compilation with COMPILE_TEST gpio: clps711x: Use of_match_ptr() net: rfkill: gpio: convert to descriptor-based GPIO interface leds: s3c24xx: Fix build failure ...
2014-01-20Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: - Add the initial implementation of SCHED_DEADLINE support: a real-time scheduling policy where tasks that meet their deadlines and periodically execute their instances in less than their runtime quota see real-time scheduling and won't miss any of their deadlines. Tasks that go over their quota get delayed (Available to privileged users for now) - Clean up and fix preempt_enable_no_resched() abuse all around the tree - Do sched_clock() performance optimizations on x86 and elsewhere - Fix and improve auto-NUMA balancing - Fix and clean up the idle loop - Apply various cleanups and fixes * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) sched: Fix __sched_setscheduler() nice test sched: Move SCHED_RESET_ON_FORK into attr::sched_flags sched: Fix up attr::sched_priority warning sched: Fix up scheduler syscall LTP fails sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls sched/core: Fix htmldocs warnings sched/deadline: No need to check p if dl_se is valid sched/deadline: Remove unused variables sched/deadline: Fix sparse static warnings m68k: Fix build warning in mac_via.h sched, thermal: Clean up preempt_enable_no_resched() abuse sched, net: Fixup busy_loop_us_clock() sched, net: Clean up preempt_enable_no_resched() abuse sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding sched/preempt, locking: Rework local_bh_{dis,en}able() sched/clock, x86: Avoid a runtime condition in native_sched_clock() sched/clock: Fix up clear_sched_clock_stable() sched/clock, x86: Use a static_key for sched_clock_stable sched/clock: Remove local_irq_disable() from the clocks sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs ...
2014-01-19net: fix "queues" uevent between network namespacesWeilong Chen
When I create a new namespace with 'ip netns add net0', or add/remove new links in a namespace with 'ip link add/delete type veth', rx/tx queues events can be got in all namespaces. That is because rx/tx queue ktypes do not have namespace support, and their kobj parents are setted to NULL. This patch is to fix it. Reported-by: Libo Chen <chenlibo@huawei.com> Signed-off-by: Libo Chen <chenlibo@huawei.com> Signed-off-by: Weilong Chen <chenweilong@huawei.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19net_sched: act: remove capab from struct tc_action_opsWANG Cong
It is not actually implemented. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19net: document accel_priv parameter for __dev_queue_xmit()Jason Wang
To silent "make htmldocs" warning. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19ipv6: optimize link local address searchHannes Frederic Sowa
ipv6_link_dev_addr sorts newly added addresses by scope in ifp->addr_list. Smaller scope addresses are added to the tail of the list. Use this fact to iterate in reverse over addr_list and break out as soon as a higher scoped one showes up, so we can spare some cycles on machines with lot's of addresses. The ordering of the addresses is not relevant and we are more likely to get the eui64 generated address with this change anyway. Suggested-by: Brian Haley <brian.haley@hp.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19ipv6: make IPV6_RECVPKTINFO work for ipv4 datagramsHannes Frederic Sowa
We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data for IPv4 datagrams on IPv6 sockets. This patch splits the ip6_datagram_recv_ctl into two functions, one which handles both protocol families, AF_INET and AF_INET6, while the ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data. ip6_datagram_recv_*_ctl never reported back any errors, so we can make them return void. Also provide a helper for protocols which don't offer dual personality to further use ip6_datagram_recv_ctl, which is exported to modules. I needed to shuffle the code for ping around a bit to make it easier to implement dual personality for ping ipv6 sockets in future. Reported-by: Gert Doering <gert@space.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19sch_netem: replace magic numbers with enumerateYang Yingliang
Replace some magic numbers which describe states of 4-state model loss generator with enumerate. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19ipv6: add flowlabel_consistency sysctlFlorent Fourcot
With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of flow label unicity. This patch introduces a new sysctl to protect the old behaviour, enable by default. Changelog of V3: * rename ip6_flowlabel_consistency to flowlabel_consistency * use net_info_ratelimited() * checkpatch cleanups Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19ipv6: add a flag to get the flow label used remotlyFlorent Fourcot
This information is already available via IPV6_FLOWINFO of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label information. But it is probably logical and easier for users to add this here, and to control both sent/received flow label values with the IPV6_FLOWLABEL_MGR option. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GETFlorent Fourcot
With this option, the socket will reply with the flow label value read on received packets. The goal is to have a connection with the same flow label in both direction of the communication. Changelog of V4: * Do not erase the flow label on the listening socket. Use pktopts to store the received value Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18ipv4: be friend with drop monitorEric Dumazet
Replace some dev_kfree_skb() with kfree_skb() calls when we drop one skb, this might help bug tracking. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18net: add build-time checks for msg->msg_name sizeSteffen Hurrle
This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg handler msg_name and msg_namelen logic"). DECLARE_SOCKADDR validates that the structure we use for writing the name information to is not larger than the buffer which is reserved for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR consistently in sendmsg code paths. Signed-off-by: Steffen Hurrle <steffen@hurrle.net> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18net: introduce SO_BPF_EXTENSIONSMichal Sekletar
For user space packet capturing libraries such as libpcap, there's currently only one way to check which BPF extensions are supported by the kernel, that is, commit aa1113d9f85d ("net: filter: return -EINVAL if BPF_S_ANC* operation is not supported"). For querying all extensions at once this might be rather inconvenient. Therefore, this patch introduces a new option which can be used as an argument for getsockopt(), and allows one to obtain information about which BPF extensions are supported by the current kernel. As David Miller suggests, we do not need to define any bits right now and status quo can just return 0 in order to state that this versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later additions to BPF extensions need to add their bits to the bpf_tell_extensions() function, as documented in the comment. Signed-off-by: Michal Sekletar <msekleta@redhat.com> Cc: David Miller <davem@davemloft.net> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c net/ipv4/tcp_metrics.c Overlapping changes between the "don't create two tcp metrics objects with the same key" race fix in net and the addition of the destination address in the lookup key in net-next. Minor overlapping changes in bnx2x driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17Bluetooth: remove direct compilation of 6lowpan_iphc.cStephen Warren
It's now built as a separate utility module, and enabling BT selects that module in Kconfig. This fixes: net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): multiple definition of `__ksymtab_lowpan_process_data' net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): first defined here net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): multiple definition of `__ksymtab_lowpan_header_compress' net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): first defined here net/ieee802154/built-in.o: In function `lowpan_header_compress': net/ieee802154/6lowpan_iphc.c:606: multiple definition of `lowpan_header_compress' net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:606: first defined here net/ieee802154/built-in.o: In function `lowpan_process_data': net/ieee802154/6lowpan_iphc.c:344: multiple definition of `lowpan_process_data' net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:344: first defined here make[1]: *** [net/built-in.o] Error 1 (this change probably simply wasn't "git add"d to a53d34c3465b) Fixes: a53d34c3465b ("net: move 6lowpan compression code to separate module") Fixes: 18722c247023 ("Bluetooth: Enable 6LoWPAN support for BT LE devices") Signed-off-by: Stephen Warren <swarren@nvidia.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17bonding: add netlink attributes to slave link devsfeldma@cumulusnetworks.com
If link is IFF_SLAVE, extend link dev netlink attributes to include slave attributes with new IFLA_SLAVE nest. Add netlink notification (RTM_NEWLINK) when slave status changes from backup to active, or visa-versa. Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE attributes. Currently only used by bonding driver, but could be used by other aggregating devices with slaves. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17ipv4: fix a dst leak in tunnelsEric Dumazet
This patch : 1) Remove a dst leak if DST_NOCACHE was set on dst Fix this by holding a reference only if dst really cached. 2) Remove a lockdep warning in __tunnel_dst_set() This was reported by Cong Wang. 3) Remove usage of a spinlock where xchg() is enough 4) Remove some spurious inline keywords. Let compiler decide for us. Fixes: 7d442fab0a67 ("ipv4: Cache dst in tunnels") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cwang@twopensource.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17ipv6: send Change Status Report after DAD is completedFlavio Leitner
The RFC 3810 defines two type of messages for multicast listeners. The "Current State Report" message, as the name implies, refreshes the *current* state to the querier. Since the querier sends Query messages periodically, there is no need to retransmit the report. On the other hand, any change should be reported immediately using "State Change Report" messages. Since it's an event triggered by a change and that it can be affected by packet loss, the rfc states it should be retransmitted [RobVar] times to make sure routers will receive timely. Currently, we are sending "Current State Reports" after DAD is completed. Before that, we send messages using unspecified address (::) which should be silently discarded by routers. This patch changes to send "State Change Report" messages after DAD is completed fixing the behavior to be RFC compliant and also to pass TAHI IPv6 testsuite. Signed-off-by: Flavio Leitner <fbl@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17ipv6: simplify detection of first operational link-local address on interfaceHannes Frederic Sowa
In commit 1ec047eb4751e3 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses") I build the detection of the first operational link-local address much to complex. Additionally this code now has a race condition. Replace it with a much simpler variant, which just scans the address list when duplicate address detection completes, to check if this is the first valid link local address and send RS and MLD reports then. Fixes: 1ec047eb4751e3 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses") Reported-by: Jiri Pirko <jiri@resnulli.us> Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17tcp: metrics: Avoid duplicate entries with the same destination-IPChristoph Paasch
Because the tcp-metrics is an RCU-list, it may be that two soft-interrupts are inside __tcp_get_metrics() for the same destination-IP at the same time. If this destination-IP is not yet part of the tcp-metrics, both soft-interrupts will end up in tcpm_new and create a new entry for this IP. So, we will have two tcp-metrics with the same destination-IP in the list. This patch checks twice __tcp_get_metrics(). First without holding the lock, then while holding the lock. The second one is there to confirm that the entry has not been added by another soft-irq while waiting for the spin-lock. Fixes: 51c5d0c4b169b (tcp: Maintain dynamic metrics in local cache.) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAITFlorent Fourcot
This patch is following the commit b903d324bee262 (ipv6: tcp: fix TCLASS value in ACK messages sent from TIME_WAIT). For the same reason than tclass, we have to store the flow label in the inet_timewait_sock to provide consistency of flow label on the last ACK. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17net: rds: fix per-cpu helper usageGerald Schaefer
commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu handling for rds. chpfirst is the result of __this_cpu_read(), so it is an absolute pointer and not __percpu. Therefore, __this_cpu_write() should not operate on chpfirst, but rather on cache->percpu->first, just like __this_cpu_read() did before. Cc: <stable@vger.kernel.org> # 3.8+ Signed-off-byd Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
2014-01-16net-sysfs: add support for device-specific rx queue sysfs attributesMichael Dalton
Extend existing support for netdevice receive queue sysfs attributes to permit a device-specific attribute group. Initial use case for this support will be to allow the virtio-net device to export per-receive queue mergeable receive buffer size. Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16net: allow > 0 order atomic page alloc in skb_page_frag_refillMichael Dalton
skb_page_frag_refill currently permits only order-0 page allocs unless GFP_WAIT is used. Change skb_page_frag_refill to attempt higher-order page allocations whether or not GFP_WAIT is used. If memory cannot be allocated, the allocator will fall back to successively smaller page allocs (down to order-0 page allocs). This change brings skb_page_frag_refill in line with the existing page allocation strategy employed by netdev_alloc_frag, which attempts higher-order page allocations whether or not GFP_WAIT is set, falling back to successively lower-order page allocations on failure. Part of migration of virtio-net to per-receive queue page frag allocators. Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16net_sched: fix error return code in fw_change_attrs()Wei Yongjun
The error code was not set if change indev fail, so the error condition wasn't reflected in the return value. Fix to return a negative error code from this error handling case instead of 0. Fixes: 2519a602c273 ('net_sched: optimize tcf_match_indev()') Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16tipc: standardize recvmsg routineYing Xue
Standardize the behaviour of waiting for events in TIPC recvmsg() so that all variables of socket or port structures are protected within socket lock, allowing the process of calling recvmsg() to be woken up at appropriate time. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16tipc: standardize sendmsg routine of connected socketYing Xue
Standardize the behaviour of waiting for events in TIPC send_packet() so that all variables of socket or port structures are protected within socket lock, allowing the process of calling sendmsg() to be woken up at appropriate time. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16tipc: standardize sendmsg routine of connectionless socketYing Xue
Comparing the behaviour of how to wait for events in TIPC sendmsg() with other stacks, the TIPC implementation might be perceived as different, and sometimes even incorrect. For instance, sk_sleep() and tport->congested variables associated with socket are exposed without socket lock protection while wait_event_interruptible_timeout() accesses them. So standardizing it with similar implementation in other stacks can help us correct these errors which the process of calling sendmsg() cannot be woken up event if an expected event arrive at socket or improperly woken up although the wake condition doesn't match. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16tipc: standardize accept routineYing Xue
Comparing the behaviour of how to wait for events in TIPC accept() with other stacks, the TIPC implementation might be perceived as different, and sometimes even incorrect. As sk_sleep() and sk->sk_receive_queue variables associated with socket are not protected by socket lock, the process of calling accept() may be woken up improperly or sometimes cannot be woken up at all. After standardizing it with inet_csk_wait_for_connect routine, we can get benefits including: avoiding 'thundering herd' phenomenon, adding a timeout mechanism for accept(), coping with a pending signal, and having sk_sleep() and sk->sk_receive_queue being always protected within socket lock scope and so on. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16tipc: standardize connect routineYing Xue
Comparing the behaviour of how to wait for events in TIPC connect() with other stacks, the TIPC implementation might be perceived as different, and sometimes even incorrect. For instance, as both sock->state and sk_sleep() are directly fed to wait_event_interruptible_timeout() as its arguments, and socket lock has to be released before we call wait_event_interruptible_timeout(), the two variables associated with socket are exposed out of socket lock protection, thereby probably getting stale values so that the process of calling connect() cannot be woken up exactly even if correct event arrives or it is woken up improperly even if the wake condition is not satisfied in practice. Therefore, standardizing its behaviour with sk_stream_wait_connect routine can avoid these risks. Additionally the implementation of connect routine is simplified as a whole, allowing it to return correct values in all different cases. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16sctp: remove the unnecessary assignmentwangweidong
When go the right path, the status is 0, no need to assign it again. So just remove the assignment. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16net_sched: act: pick a different type for act_xtWANG Cong
In tcf_register_action() we check either ->type or ->kind to see if there is an existing action registered, but ipt action registers two actions with same type but different kinds. They should have different types too. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Included change: - properly format already existing kerneldoc Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16net_sched: act: use tcf_hash_release() in net/sched/act_police.cWANG Cong
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Included change: - properly compute the batman-adv header overhead. Such result is later used to initialize the hard_header_len member of the soft-interface netdev object Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16net: add NETDEV_PRECHANGEMTU to notify before mtu change happensVeaceslav Falico
Currently, if a device changes its mtu, first the change happens (invloving all the side effects), and after that the NETDEV_CHANGEMTU is sent so that other devices can catch up with the new mtu. However, if they return NOTIFY_BAD, then the change is reverted and error returned. This is a really long and costy operation (sometimes). To fix this, add NETDEV_PRECHANGEMTU notification which is called prior to any change actually happening, and if any callee returns NOTIFY_BAD - the change is aborted. This way we're skipping all the playing with apply/revert the mtu. CC: "David S. Miller" <davem@davemloft.net> CC: Jiri Pirko <jiri@resnulli.us> CC: Eric Dumazet <edumazet@google.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16net: Check skb->rxhash in gro_receiveTom Herbert
When initializing a gro_list for a packet, first check the rxhash of the incoming skb against that of the skb's in the list. This should be a very strong inidicator of whether the flow is going to be matched, and potentially allows a lot of other checks to be short circuited. Use skb_hash_raw so that we don't force the hash to be calculated. Tested by running netperf 200 TCP_STREAMs between two machines with GRO, HW rxhash, and 1G. Saw no performance degration, slight reduction of time in dev_gro_receive. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16packet: use percpu mmap tx frame pending refcountDaniel Borkmann
In PF_PACKET's packet mmap(), we can avoid using one atomic_inc() and one atomic_dec() call in skb destructor and use a percpu reference count instead in order to determine if packets are still pending to be sent out. Micro-benchmark with [1] that has been slightly modified (that is, protcol = 0 in socket(2) and bind(2)), example on a rather crappy testing machine; I expect it to scale and have even better results on bigger machines: ./packet_mm_tx -s7000 -m7200 -z700000 em1, avg over 2500 runs: With patch: 4,022,015 cyc Without patch: 4,812,994 cyc time ./packet_mm_tx -s64 -c10000000 em1 > /dev/null, stable: With patch: real 1m32.241s user 0m0.287s sys 1m29.316s Without patch: real 1m38.386s user 0m0.265s sys 1m35.572s In function tpacket_snd(), it is okay to use packet_read_pending() since in fast-path we short-circuit the condition already with ph != NULL, since we have next frames to process. In case we have MSG_DONTWAIT, we also do not execute this path as need_wait is false here anyway, and in case of _no_ MSG_DONTWAIT flag, it is okay to call a packet_read_pending(), because when we ever reach that path, we're done processing outgoing frames anyway and only look if there are skbs still outstanding to be orphaned. We can stay lockless in this percpu counter since it's acceptable when we reach this path for the sum to be imprecise first, but we'll level out at 0 after all pending frames have reached the skb destructor eventually through tx reclaim. When people pin a tx process to particular CPUs, we expect overflows to happen in the reference counter as on one CPU we expect heavy increase; and distributed through ksoftirqd on all CPUs a decrease, for example. As David Laight points out, since the C language doesn't define the result of signed int overflow (i.e. rather than wrap, it is allowed to saturate as a possible outcome), we have to use unsigned int as reference count. The sum over all CPUs when tx is complete will result in 0 again. The BUG_ON() in tpacket_destruct_skb() we can remove as well. It can _only_ be set from inside tpacket_snd() path and we made sure to increase tx_ring.pending in any case before we called po->xmit(skb). So testing for tx_ring.pending == 0 is not too useful. Instead, it would rather have been useful to test if lower layers didn't orphan the skb so that we're missing ring slots being put back to TP_STATUS_AVAILABLE. But such a bug will be caught in user space already as we end up realizing that we do not have any TP_STATUS_AVAILABLE slots left anymore. Therefore, we're all set. Btw, in case of RX_RING path, we do not make use of the pending member, therefore we also don't need to use up any percpu memory here. Also note that __alloc_percpu() already returns a zero-filled percpu area, so initialization is done already. [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>