summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2014-06-12tcp: fixing TLP's FIN recoveryPer Hurtig
Fix to a problem observed when losing a FIN segment that does not contain data. In such situations, TLP is unable to recover from *any* tail loss and instead adds at least PTO ms to the retransmission process, i.e., RTO = RTO + PTO. Signed-off-by: Per Hurtig <per.hurtig@kau.se> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12Merge branch 'fec'David S. Miller
Fugang Duan says: ==================== net: fec: Enable Software TSO to improve the tx performance Add SG and software TSO support for FEC. This feature allows to improve outbound throughput performance. Tested on imx6dl sabresd board, running iperf tcp tests shows: * 82% improvement comparing with NO SG & TSO patch $ ethtool -K eth0 sg on $ ethtool -K eth0 tso on [ 3] local 10.192.242.108 port 35388 connected with 10.192.242.167 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0- 3.0 sec 181 MBytes 506 Mbits/sec * cpu loading is 30% $ ethtool -K eth0 sg off $ ethtool -K eth0 tso off [ 3] local 10.192.242.108 port 52618 connected with 10.192.242.167 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0- 3.0 sec 99.5 MBytes 278 Mbits/sec FEC HW support IP header and TCP/UDP hw checksum, support multi buffer descriptor transfer one frame, but don't support HW TSO. And imx6q/dl SOC FEC Gbps speed has HW bus Bandwidth limitation (400Mbps ~ 700Mbps), imx6sx SOC FEC Gbps speed has no HW bandwidth limitation. The patch set just enable TSO feature, which is done following the mv643xx_eth driver. Test result analyze: imx6dl sabresd board: there have 82% improvement, since imx6dl FEC HW has bandwidth limitation, the performance with SW TSO is a milestone. Addition test: imx6sx sdb board: upstream still don't support imx6sx due to some patches being upstream... they use same FEC IP. Use the SW TSO patches test imx6sx sdb board in internal kernel tree: No SW TSO patch: tx bandwidth 840Mbps, cpu loading is 100%. SW TSO patch: tx bandwidth 942Mbps, cpu loading is 65%. It means the patch set have great improvement for imx6sx FEC performance. V2: * From Frank Li's suggestion: Change the API "fec_enet_txdesc_entry_free" name to "fec_enet_get_free_txdesc_num". * Summary David Laight and Eric Dumazet's thoughts: RX BD entry number change to 256. * From ezequiel's suggestion: Follow the latest TSO fixes from his solution to rework the queue stop/wake-up. Avoid unmapping the TSO header buffers. * From Eric Dumazet's suggestion: Avoid more bytes copy, just copying the unaligned part of the payload into first descriptor. The suggestion will bring more complex for the driver, and imx6dl FEC DMA need 16 bytes alignment, but cpu loading is not problem that cpu loading is 30%, the current performance is so better. Later chip like imx6sx Gigbit FEC DMA support byte alignment, so there don't exist memory copy. So, the V2 version drop the suggestion. Anyway, thanks for Eric's response and suggestion. V3: * From David Laight's feedback: Decide to drop RX BD entry number change for the SW TSO patch set. I will generate one separate patch to increase RX BDs entry for interrupt coalescing feature which will be supported in my later patch set. V4: * From David Laight's feedback: Remove the conditional in .fec_enet_get_bd_index(). V5: * Patch #4 update: From David Laight's feedback: "expect fec_enet_get_free_txdesc_num() to return one less than it does currently." Change the function: Return space available, 0..size-1. it always leave one free entry. Which is same as linux circ_buf. Thanks for Eric and ezequiel's help and idea. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12net: fec: Add software TSO supportNimrod Andy
Add software TSO support for FEC. This feature allows to improve outbound throughput performance. Tested on imx6dl sabresd board, running iperf tcp tests shows: - 16.2% improvement comparing with FEC SG patch - 82% improvement comparing with NO SG & TSO patch $ ethtool -K eth0 tso on $ iperf -c 10.192.242.167 -t 3 & [ 3] local 10.192.242.108 port 35388 connected with 10.192.242.167 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0- 3.0 sec 181 MBytes 506 Mbits/sec During the testing, CPU loading is 30%. Since imx6dl FEC Bandwidth is limited to SOC system bus bandwidth, the performance with SW TSO is a milestone. CC: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Laight <David.Laight@ACULAB.COM> CC: Li Frank <B20596@freescale.com> Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12net: fec: Add Scatter/gather supportNimrod Andy
Add Scatter/gather support for FEC. This feature allows to improve outbound throughput performance. Tested on imx6dl sabresd board: Running iperf tests shows a 55.4% improvement. $ ethtool -K eth0 sg off $ iperf -c 10.192.242.167 -t 3 & [ 3] local 10.192.242.108 port 52618 connected with 10.192.242.167 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0- 3.0 sec 99.5 MBytes 278 Mbits/sec $ ethtool -K eth0 sg on $ iperf -c 10.192.242.167 -t 3 & [ 3] local 10.192.242.108 port 52617 connected with 10.192.242.167 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0- 3.0 sec 154 MBytes 432 Mbits/sec CC: Li Frank <B20596@freescale.com> Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12net: fec: Increase buffer descriptor entry numberNimrod Andy
In order to support SG, software TSO, let's increase BD entry number. CC: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12net: fec: Factorize feature settingNimrod Andy
In order to enhance the code readable, let's factorize the feature list. Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12net: fec: Enable IP header hardware checksumNimrod Andy
IP header checksum is calcalated by network layer in default. To support software TSO, it is better to use HW calculate the IP header checksum. FEC hw checksum feature request the checksum field in frame is zero, otherwise the calculative CRC is not correct. For segmentated TCP packet, HW calculate the IP header checksum again, it doesn't bring any impact. For SW TSO, HW calculated checksum bring better performance. Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12net: fec: Factorize the .xmit transmit functionNimrod Andy
Make the code more readable and easy to support other features like SG, TSO, moving the common transmit function to one api. And the patch also factorize the getting BD index to it own function. CC: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12bridge: fix compile error when compiling without IPv6 supportLinus Lüssing
Some fields in "struct net_bridge" aren't available when compiling the kernel without IPv6 support. Therefore adding a check/macro to skip the complaining code sections in that case. Introduced by 2cd4143192e8c60f66cb32c3a30c76d0470a372d ("bridge: memorize and export selected IGMP/MLD querier port") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12bridge: fix smatch warning / potential null pointer dereferenceLinus Lüssing
"New smatch warnings: net/bridge/br_multicast.c:1368 br_ip6_multicast_query() error: we previously assumed 'group' could be null (see line 1349)" In the rare (sort of broken) case of a query having a Maximum Response Delay of zero, we could create a potential null pointer dereference. Fixing this by skipping the multicast specific MLD Query parsing again if no multicast group address is available. Introduced by dc4eb53a996a78bfb8ea07b47423ff5a3aadc362 ("bridge: adhere to querier election mechanism specified by RFCs") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12via-rhine: fix full-duplex with autoneg disableFrançois Cachereul
With some specific configuration (VT6105M on Soekris 5510 and depending on the device at the other end), fragmented packets were not transmitted when forcing 100 full-duplex with autoneg disable. This fix now write full-duplex chips register when forcing full or half-duplex not only when autoneg is enable. Signed-off-by: François Cachereul <f.cachereul@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12Merge branch 'bnx2x'David S. Miller
Yuval Mintz says: ==================== bnx2x: Bug fixes patch series This patch series contains various bug fixes - 2 link related fixes, one sriov-related issue and an additional fix for a theoretical bug on new boards. Please consider applying these patches to `net'. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12bnx2x: Enlarge the dorq threshold for VFsAriel Elior
A malicious VF might try to starve the other VFs & PF by creating contineous doorbell floods. In order to negate this, HW has a threshold of doorbells per client, which will stop the client doorbells from arriving if crossed. The threshold currently configured for VFs is too low - under extreme traffic scenarios, it's possible for a VF to reach the threshold and thus for its fastpath to stop working. Signed-off-by: Ariel Elior <ariel.elior@qlogic.com> Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12bnx2x: Check for UNDI in uncommon branchYuval Mintz
If L2FW utilized by the UNDI driver has the same version number as that of the regular FW, a driver loading after UNDI and receiving an uncommon answer from management will mistakenly assume the loaded FW matches its own requirement and try to exist the flow via FLR. Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com> Signed-off-by: Ariel Elior <ariel.elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12bnx2x: Fix 1G-baseT linkYaniv Rosner
Set the phy access mode even in case of link-flap avoidance. Signed-off-by: Yaniv Rosner <yaniv.rosner@qlogic.com> Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com> Signed-off-by: Ariel Elior <ariel.elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12bnx2x: Fix link for KR with swapped polarity laneYaniv Rosner
This avoids clearing the RX polarity setting in KR mode when polarity lane is swapped, as otherwise this will result in failed link. Signed-off-by: Yaniv Rosner <yaniv.rosner@qlogic.com> Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com> Signed-off-by: Ariel Elior <ariel.elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12sctp: Fix sk_ack_backlog wrap-around problemXufeng Zhang
Consider the scenario: For a TCP-style socket, while processing the COOKIE_ECHO chunk in sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check, a new association would be created in sctp_unpack_cookie(), but afterwards, some processing maybe failed, and sctp_association_free() will be called to free the previously allocated association, in sctp_association_free(), sk_ack_backlog value is decremented for this socket, since the initial value for sk_ack_backlog is 0, after the decrement, it will be 65535, a wrap-around problem happens, and if we want to establish new associations afterward in the same socket, ABORT would be triggered since sctp deem the accept queue as full. Fix this issue by only decrementing sk_ack_backlog for associations in the endpoint's list. Fix-suggested-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/core/rtnetlink.c net/core/skbuff.c Both conflicts were very simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net/core: Add VF link state control policyDoug Ledford
Commit 1d8faf48c7 (net/core: Add VF link state control) added VF link state control to the netlink VF nested structure, but failed to add a proper entry for the new structure into the VF policy table. Add the missing entry so the table and the actual data copied into the netlink nested struct are in sync. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net/fsl: xgmac_mdio is dependent on OF_MDIOAndy Fleming
Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net/fsl: Make xgmac_mdio read error message usefulShruti Kanetkar
Print the device address, the register number and the PHY ID for which the MDIO read operation failed Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net_sched: drr: warn when qdisc is not work conservingFlorian Westphal
The DRR scheduler requires that items on the active list are work conserving, i.e. do not hold on to skbs for throttling purposes, etc. Attaching e.g. tbf renders DRR useless because all other classes on the active list are delayed as well. So, warn users that this configuration won't work as expected; we already do this in couple of other qdiscs, see e.g. commit b00355db3f88d96810a60011a30cfb2c3469409d ('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler') The 'const' change is needed to avoid compiler warning ("discards 'const' qualifier from pointer target type"). tested with: drr_hier() { parent=$1 classes=$2 for i in $(seq 1 $classes); do classid=$parent$(printf %x $i) tc class add dev eth0 parent $parent classid $classid drr tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit done } tc qdisc add dev eth0 root handle 1: drr drr_hier 1: 32 tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11Merge branch 'inet_csums'David S. Miller
Tom Herbert says: ==================== net: Checksum offload changes - Part IV I am working on overhauling RX checksum offload. Goals of this effort are: - Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY - Preserve CHECKSUM_COMPLETE through encapsulation layers - Don't do skb_checksum more than once per packet - Unify GRO and non-GRO csum verification as much as possible - Unify the checksum functions (checksum_init) - Simply code What is in this fourth patch set: - Preserve CHECKSUM_COMPLETE instead of changing it to CHECKSUM_UNNECESSARY. This allows correct reuse in validating multiple csums in a packet. - When SW needs to compute the packet checksum, save it as CHECKSUM_COMPLETE. Also mark that checksum was compute by SW. - Add skb_gro_postpull_rcsum to udp and vxlan to make GRO work with CHECKSUM_COMPLETE. v2: Removed patch setting skb_encapsulation when validating checksum in tcp_gro_receive Please review carefully and test if possible, mucking with basic checksum functions is always a little precarious :-) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: Add skb_gro_postpull_rcsum to udp and vxlanTom Herbert
Need to gro_postpull_rcsum for GRO to work with checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: Save software checksum completeTom Herbert
In skb_checksum complete, if we need to compute the checksum for the packet (via skb_checksum) save the result as CHECKSUM_COMPLETE. Subsequent checksum verification can use this. Also, added csum_complete_sw flag to distinguish between software and hardware generated checksum complete, we should always be able to trust the software computation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: Preserve CHECKSUM_COMPLETE at validationTom Herbert
Currently when the first checksum in a packet is validated using CHECKSUM_COMPLETE, ip_summed is overwritten to be CHECKSUM_UNNECESSARY so that any subsequent checksums in the packet are not correctly validated. This patch adds csum_valid flag in sk_buff and uses that to indicate validated checksum instead of setting CHECKSUM_UNNECESSARY. The bit is set accordingly in the skb_checksum_validate_* functions. The flag is checked in skb_checksum_complete, so that validation is communicated between checksum_init and checksum_complete sequence in TCP and UDP. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11Merge branch 'qlcnic-next'David S. Miller
Shahed Shaikh says: ==================== This series contains an enhancement in the area of firmware minidump collection and optimization of ring count validation function. Please apply this series to net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11qlcnic: Update version to 5.3.60Shahed Shaikh
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11qlcnic: Optimize ring count validationsShahed Shaikh
- Check interrupt mode at the start of qlcnic_set_channels(). - Do not validate ring count if they are not going to change. Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11qlcnic: Pre-allocate DMA buffer used for minidump collectionShahed Shaikh
Pre-allocate the physically contiguous DMA buffer used for minidump collection at driver load time, rather than at run time, to minimize allocation failures. Driver will allocate the buffer at load time if PEX DMA support capability is indicated by the adapter. Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11ip_vti: fix sparse warnings for VTI_ISVTIDmitry Popov
This patch fixes the following sparse warnings: net/ipv4/ip_tunnel.c:245:53: warning: restricted __be16 degrades to integer net/ipv4/ip_vti.c:321:19: warning: incorrect type in assignment (different base types) net/ipv4/ip_vti.c:321:19: expected restricted __be16 [addressable] [assigned] [usertype] i_flags net/ipv4/ip_vti.c:321:19: got int net/ipv4/ip_vti.c:447:24: warning: incorrect type in assignment (different base types) net/ipv4/ip_vti.c:447:24: expected restricted __be16 [usertype] i_flags net/ipv4/ip_vti.c:447:24: got int Since VTI_ISVTI is always used with ip_tunnel_parm->i_flags (which is __be16), we can __force cast VTI_ISVTI to __be16 in header file. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11drivers: net: davinci_cpdma: double free on errorDan Carpenter
We recently change the kzalloc() to devm_kzalloc() so freeing "ctlr" here could lead to a double free. Fixes: e194312854ed ('drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11amd-xgbe: unwind on error in xgbe_mdio_register()Dan Carpenter
There is a typo here so we return directly instead of unwinding. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11mrf24j40: add device managed APIsVarka Bhadram
adds the device managed APIs so that no need worry about freeing the resources. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11ceph: remove bogus externstephen hemminger
Sparse complained about this bogus extern on definition of a function. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: filter: document internal instruction encodingAlexei Starovoitov
This patch adds a description of eBPFs instruction encoding in order to bring the documentation in line with the implementation. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: filter: mention eBPF terminology as wellAlexei Starovoitov
Since the term eBPF is used anyway on mailing list discussions, lets also document that in the main BPF documentation file and replace a couple of occurrences with eBPF terminology to be more clear. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11ipv4: fix a race in ip4_datagram_release_cb()Eric Dumazet
Alexey gave a AddressSanitizer[1] report that finally gave a good hint at where was the origin of various problems already reported by Dormando in the past [2] Problem comes from the fact that UDP can have a lockless TX path, and concurrent threads can manipulate sk_dst_cache, while another thread, is holding socket lock and calls __sk_dst_set() in ip4_datagram_release_cb() (this was added in linux-3.8) It seems that all we need to do is to use sk_dst_check() and sk_dst_set() so that all the writers hold same spinlock (sk->sk_dst_lock) to prevent corruptions. TCP stack do not need this protection, as all sk_dst_cache writers hold the socket lock. [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel AddressSanitizer: heap-use-after-free in ipv4_dst_check Read of size 2 by thread T15453: [<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116 [<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531 [<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0 [<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413 [<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 Freed by thread T15455: [<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251 [<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280 [<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 Allocated by thread T15453: [<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171 [<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406 [< inlined >] __ip_route_output_key+0x3e8/0xf70 __mkroute_output ./net/ipv4/route.c:1939 [<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161 [<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249 [<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 [2] <4>[196727.311203] general protection fault: 0000 [#1] SMP <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 <4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000 <4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>] [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80 <4>[196727.311399] RSP: 0018:ffff885effd23a70 EFLAGS: 00010282 <4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040 <4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200 <4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800 <4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 <4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce <4>[196727.311510] FS: 0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000 <4>[196727.311554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0 <4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>[196727.311713] Stack: <4>[196727.311733] ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42 <4>[196727.311784] ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0 <4>[196727.311834] ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0 <4>[196727.311885] Call Trace: <4>[196727.311907] <IRQ> <4>[196727.311912] [<ffffffff815b7f42>] dst_destroy+0x32/0xe0 <4>[196727.311959] [<ffffffff815b86c6>] dst_release+0x56/0x80 <4>[196727.311986] [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0 <4>[196727.312013] [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820 <4>[196727.312041] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360 <4>[196727.312070] [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150 <4>[196727.312097] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360 <4>[196727.312125] [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230 <4>[196727.312154] [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90 <4>[196727.312183] [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360 <4>[196727.312212] [<ffffffff815fe00b>] ip_rcv+0x22b/0x340 <4>[196727.312242] [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan] <4>[196727.312275] [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640 <4>[196727.312308] [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150 <4>[196727.312338] [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70 <4>[196727.312368] [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0 <4>[196727.312397] [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140 <4>[196727.312433] [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe] <4>[196727.312463] [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340 <4>[196727.312491] [<ffffffff815b1691>] net_rx_action+0x111/0x210 <4>[196727.312521] [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70 <4>[196727.312552] [<ffffffff810519d0>] __do_softirq+0xd0/0x270 <4>[196727.312583] [<ffffffff816cef3c>] call_softirq+0x1c/0x30 <4>[196727.312613] [<ffffffff81004205>] do_softirq+0x55/0x90 <4>[196727.312640] [<ffffffff81051c85>] irq_exit+0x55/0x60 <4>[196727.312668] [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0 <4>[196727.312696] [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a <4>[196727.312722] <EOI> <1>[196727.313071] RIP [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80 <4>[196727.313100] RSP <ffff885effd23a70> <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]--- <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt Reported-by: Alexey Preobrazhensky <preobr@google.com> Reported-by: dormando <dormando@rydia.ne> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets") Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: filter: add test_bpf module under MAINTAINERS' networking sectionDaniel Borkmann
Add lib/test_bpf.c entry to maintainers file under networking. All changes were posted via netdev for review, so make sure other people Cc it as well when they call get_maintainer.pl. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: add __pskb_copy_fclone and pskb_copy_for_cloneOctavian Purdila
There are several instances where a pskb_copy or __pskb_copy is immediately followed by an skb_clone. Add a couple of new functions to allow the copy skb to be allocated from the fclone cache and thus speed up subsequent skb_clone calls. Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <antonio@meshcoding.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Arvid Brodin <arvid.brodin@alten.se> Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org> Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Cc: Andrew Hendry <andrew.hendry@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11sfc: PIO:Restrict to 64bit arch and use 64-bit writes.Jon Cooper
Fixes:ee45fd92c739 ("sfc: Use TX PIO for sufficiently small packets") The linux net driver uses memcpy_toio() in order to copy into the PIO buffers. Even on a 64bit machine this causes 32bit accesses to a write- combined memory region. There are hardware limitations that mean that only 64bit naturally aligned accesses are safe in all cases. Due to being write-combined memory region two 32bit accesses may be coalesced to form a 64bit non 64bit aligned access. Solution was to open-code the memory copy routines using pointers and to only enable PIO for x86_64 machines. Not tested on platforms other than x86_64 because this patch disables the PIO feature on other platforms. Compile-tested on x86 to ensure that works. The WARN_ON_ONCE() code in the previous version of this patch has been moved into the internal sfc debug driver as the assertion was unnecessary in the upstream kernel code. This bug fix applies to v3.13 and v3.14 stable branches. Signed-off-by: Shradha Shah <sshah@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11Merge branch 'bridge-next'David S. Miller
Toshiaki Makita says: ==================== bridge: 802.1ad vlan protocol support Currently bridge vlan filtering doesn't work fine with 802.1ad protocol. Only if a bridge is configured without pvid, the bridge receives only 802.1ad tagged frames and no STP is used, it will work. Otherwise: - If pvid is configured, it can put only 802.1Q tags but cannot put 802.1ad tags. - If 802.1Q and 802.1ad tagged frames arrive in mixture, it applies filtering regardless of their protocols. - While an 802.1ad bridge should use another mac address for STP BPDU and should forward customer's BPDU frames, it can't. Thus, we can't properly handle frames once 802.1ad is used. Handling 802.1ad is useful if we want to allow stacked vlans to be used, e.g., guest VMs wants to use vlan tags and the host also wants to segregate guest's traffic from other guests' by vlan tags. Here is the image describing how to configure a bridge to filter VMs traffic. +-------+p/u +-----+ +---------+ +----+ | |------|vnet0|--|User A VM| |eth0|--|802.1ad| +-----+ +---------+ +----+ |bridge |p/u +-----+ +---------+ | |------|vnet1|--|User B VM| +-------+ +-----+ +---------+ p/u: pvid/untagged This patch set enables us to set vlan protocols per bridge. This tries to implement a bridge like S-VLAN component in IEEE 802.1Q-2011 spec. Note that there is another possible implementation that sets vlan protocols per port. Some HW switches seem to take that approach. However, I think per-bridge approach is better, because; - I think the typical usage of an 802.1ad bridge is segregating 802.1Q tagged traffic (like what is described above), and this doesn't need the ability to be set protocols per port. Also, If a bridge has many ports and it supports per-port setting, we might have to make much more extra configurations to change protocols of all ports. - I assume that the main perpose to set protocol per port is to assign S-VID according to C-VID, or to realize two logical bridges (one is an 802.1Q filtering bridge and the other is an 802.1ad filtering bridge) in one bridge. The former usually needs additional features such as vlan id mapping, and is likely to make bridge's code complicated. If a user wants, such enhanced features can be accomplished by a combination of multiple bridges, so it is not absolutely necessary to implement these features in a bridge itself. The latter is simply unnecessary because we can easily make two bridges of which one is an 802.1Q bridge and the other is an 802.1ad bridge. Here is an example of the enhanced feature that we can realize by using multiple bridges and veth interfaces. This way is documented in IEEE 802.1Q-2011 clause 15.4 (C-tagged service interface). +----+ +-------+p/u +------+ +----+ +--+ |eth0|--|802.1ad|----veth----|802.1Q|--|vnet|--|VM| +----+ |bridge |----veth----|bridge| +----+ +--+ +-------+p/u +------+ p/u: pvid/untagged In this configuration, we can map C-VIDs to any S-VID. For example; C-VID 10 and 20 to S-VID 100 C-VID 30 to S-VID 110 This is achieved through the 802.1Q bridge that forwards C-tagged frames to proper ports of the 802.1ad bridge. Changes: v1 -> v2: - Make the way to forward bridge group addresses more generic by introducing new mask, group_fwd_mask_required. RFC -> v1: - Add S-TAG tx offload. - Remove a fix around stacked vlan which has already been fixed. - Take into account Bridge Group Addresses. - Separate handling of protocol-mismatch from br_vlan_get_tag(). - Change the way to set vlan_proto from netlink to sysfs because no other existing configuration per bridge can be set by netlink. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11bridge: Support 802.1ad vlan filteringToshiaki Makita
This enables us to change the vlan protocol for vlan filtering. We come to be able to filter frames on the basis of 802.1ad vlan tags through a bridge. This also changes br->group_addr if it has not been set by user. This is needed for an 802.1ad bridge. (See IEEE 802.1Q-2011 8.13.5.) Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad bridge can forward the Nearest Customer Bridge group addresses except for br->group_addr, which should be passed to higher layer. To change the vlan protocol, write a protocol in sysfs: # echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11bridge: Prepare for forwarding another bridge group addressesToshiaki Makita
If a bridge is an 802.1ad bridge, it must forward another bridge group addresses (the Nearest Customer Bridge group addresses). (For details, see IEEE 802.1Q-2011 8.6.3.) As user might not want group_fwd_mask to be modified by enabling 802.1ad, introduce a new mask, group_fwd_mask_required, which indicates addresses the bridge wants to forward. This will be set by enabling 802.1ad. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11bridge: Prepare for 802.1ad vlan filtering supportToshiaki Makita
This enables a bridge to have vlan protocol informantion and allows vlan tag manipulation (retrieve, insert and remove tags) according to the vlan protocol. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11bridge: Add 802.1ad tx vlan accelerationToshiaki Makita
Bridge device doesn't need to embed S-tag into skb->data. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: xen-netback: include linux/vmalloc.h againArnd Bergmann
commit e9ce7cb6b107 ("xen-netback: Factor queue-specific data into queue struct") added a use of vzalloc/vfree to interface.c, but removed the #include <linux/vmalloc.h> statement at the same time, which causes this build error: drivers/net/xen-netback/interface.c: In function 'xenvif_free': drivers/net/xen-netback/interface.c:754:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration] vfree(vif->queues); ^ cc1: some warnings being treated as errors Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Andrew J. Bennieston <andrew.bennieston@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: phy: realtek: register/unregister multiple drivers properlyJongsung Kim
Using phy_drivers_register/_unregister functions is proper way to handle multiple PHY drivers registration. For Realtek PHY drivers module, it fixes incomplete current error-handlings up and adds missed unregistration for the RTL8201CP driver. Signed-off-by: Jongsung Kim <neidhard.kim@lge.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: sh_eth: Fix timing of RACT setting in sh_eth_rx()Yoshihiro Shimoda
This patch fixes an issue that we cannot use nfs rootfs correctly on r8a7790 when the command below runs on a host PC. $ sudo ping -f -l 8 $BOARD_IP_ADDR Since the driver sets the RACT to 1 in the first while loop of sh_eth_rx(), the controller accepts a next frame into the next RX descriptor during the while loop. But, in the first while loop doesn't allocate a next skb. So, this patch removes the RACT setting in the first while loop of sh_eth_rx(). Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: sh_eth: Fix receive packet "exceeded" condition in sh_eth_rx()Yoshihiro Shimoda
This patch fixes the packet "exceeded" condition in sh_eth_rx() when RACT in an RX descriptor is not set and the "quota" is 0. Otherwise, kernel panic happens because the "&n->poll_list" is deleted twice in sh_eth_poll() which calls napi_complete() and net_rx_action(). Signed-off-by: Kouei Abe <kouei.abe.cp@renesas.com> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>