Age | Commit message | Author
2015-02-09 | Merge remote-tracking branch 'asoc/topic/dmaengine' into asoc-next | Mark Brown
2015-02-09 | Merge remote-tracking branch 'asoc/topic/dapm' into asoc-next | Mark Brown
2015-02-09 | Merge remote-tracking branch 'asoc/topic/core' into asoc-next | Mark Brown
2015-02-09 | Merge remote-tracking branches 'asoc/fix/ac97', 'asoc/fix/atmel', 'asoc/fix/intel', 'asoc/fix/rt286', 'asoc/fix/rt5640', 'asoc/fix/samsung', 'asoc/fix/sgtl5000', 'asoc/fix/sta32x', 'asoc/fix/tlv320aic3x' and 'asoc/fix/wm8731' into asoc-linus | Mark Brown
2015-02-09 | Merge tag 'asoc-v3.19-rc2' into asoc-linus | Mark Brown
ASoC: Updates for v3.20 Nothing too exciting here yet, a small optimization for DAPM from Lars-Peter and a few small bits and pieces for drivers but nothing that really stands out. # gpg: Signature made Tue 30 Dec 2014 00:15:48 HKT using RSA key ID 5D5487D0 # gpg: Oops: keyid_from_fingerprint: no pubkey # gpg: key AF88CD16: no public key for trusted key - skipped # gpg: key AF88CD16 marked as ultimately trusted # gpg: key 5621E907: no public key for trusted key - skipped # gpg: key 5621E907 marked as ultimately trusted # gpg: Good signature from "Mark Brown <broonie@sirena.org.uk>" # gpg: aka "Mark Brown <broonie@debian.org>" # gpg: aka "Mark Brown <broonie@kernel.org>" # gpg: aka "Mark Brown <broonie@tardis.ed.ac.uk>" # gpg: aka "Mark Brown <broonie@linaro.org>" # gpg: aka "Mark Brown <Mark.Brown@linaro.org>"
2015-02-09 | ASoC: rt5670: Set use_single_rw flag for regmap | Bard Liao
RT5670 doesn't support auto incrementing writes so driver should set the use_single_rw flag for regmap. Signed-off-by: Bard Liao <bardliao@realtek.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
2015-02-09 | ASoC: rt286: Add rt288 codec support | Bard Liao
This patch adds support for rt288 codec. Signed-off-by: Bard Liao <bardliao@realtek.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-02-09 | ASoC: max98357a: Fix build in !CONFIG_OF case | Mark Brown
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-02-09 | ASoC: Intel: fix platform_no_drv_owner.cocci warnings | kbuild test robot
sound/soc/intel/cht_bsw_rt5645.c:315:3-8: No need to set .owner here. The core will do it. Remove .owner field if calls are used which set it automatically Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-02-09 | ARM: dts: Switch Odroid X2/U2 to simple-audio-card | Sylwester Nawrocki
Now that the CDCLK I2S output clock can be handled through the clock API, the Odroid X2/U3 can be switched to the simple-audio-card DT binding. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-02-09 | ARM: dts: Exynos4 and Odroid X2/U3 sound device nodes update | Sylwester Nawrocki
Clock related properties are added to the Exynos4 I2S device nodes so they can be referred to as clock providers. Missing i2s_opclk1 clock is added to the I2S0 node and clock properties are added to the MAX98090 codec node to allow it to control/read frequency of the MCLK clock directly. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-02-09 | nios2: Remove unused prepare_to_copy() | Tobias Klauser
prepare_to_copy() was removed from all architectures supported at that time in commit 55ccf3fe3f9a ("fork: move the real prepare_to_copy() users to arch_dup_task_struct()"). Remove it from nios2 as well. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Ley Foon Tan <lftan@altera.com>
2015-02-08 | net:rfs: adjust table size checking | Eric Dumazet
Make sure root user does not try something stupid. Also make sure mask field in struct rps_sock_flow_table does not share a cache line with the potentially often dirtied flow table. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 567e4b79731c ("net: rfs: add hash collision detection") Signed-off-by: David S. Miller <davem@davemloft.net>
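The cache-line separation mentioned in this commit can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel's definition: `struct flow_table`, `CACHE_LINE` (assumed to be 64 bytes) and `flow_table_bytes()` are hypothetical stand-ins, and C11 `_Alignas` takes the place of the kernel's own alignment annotation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical stand-in for struct rps_sock_flow_table. */
struct flow_table {
    uint32_t mask;                        /* read-mostly field */
    _Alignas(CACHE_LINE) uint32_t ents[]; /* frequently written entries */
};

/* Bytes needed for a table with nr_ents entries. */
static size_t flow_table_bytes(size_t nr_ents)
{
    size_t header = offsetof(struct flow_table, ents);

    /* Guard against size_t overflow for absurd table sizes. */
    assert(nr_ents <= (SIZE_MAX - header) / sizeof(uint32_t));
    return header + nr_ents * sizeof(uint32_t);
}
```

Because `ents` starts on its own cache line, writes to the entries do not invalidate the line that holds `mask`.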
2015-02-08 | cxgb4: Fix trace observed while dumping clip_tbl | Hariprasad Shenai
Handle clip_tbl debugfs entry, when clip_tbl isn't allocated. In commit b5a02f503caa0837 ("cxgb4: Update ipv6 address handling api") wrong argument was passed for single_open for clip_tbl debugfs entry, which led to below trace. Fixing it. ====== call Trace: [<ffffffffa073c606>] clip_tbl_open+0x16/0x30 [cxgb4] [<ffffffff8119e2fa>] do_dentry_open+0x21a/0x370 [<ffffffff8119e499>] vfs_open+0x49/0x50 [<ffffffff811b0d0e>] do_last+0x21e/0x800 [<ffffffff811b1382>] path_openat+0x92/0x470 [<ffffffff8110569f>] ? rb_reserve_next_event+0xaf/0x380 [<ffffffff8110569f>] ? rb_reserve_next_event+0xaf/0x380 [<ffffffff811b189a>] do_filp_open+0x4a/0xa0 [<ffffffff811bdc5d>] ? __alloc_fd+0xcd/0x140 [<ffffffff8119fa4a>] do_sys_open+0x11a/0x230 [<ffffffff8101219f>] ? syscall_trace_enter_phase2+0xaf/0x1b0 [<ffffffff8119fb9e>] SyS_open+0x1e/0x20 [<ffffffff815bf6f0>] tracesys_phase2+0xd4/0xd9 Code: 89 e5 66 66 66 66 90 48 8b 47 e0 48 8b 40 30 48 8b 40 58 c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 47 e0 <48> 8b 40 58 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 RIP [<ffffffff8120898d>] PDE_DATA+0xd/0x20 RSP <ffff8800b08c3c48> CR2: 0000000000000058 ===== Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 | rhashtable: using ERR_PTR requires linux/err.h | Stephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 | i40evf: stop the watchdog for shutdown | Mitch Williams
Stop the watchdog during shutdown. Failing to do this causes a log full of admin queue errors and the occasional hang when the system is shut down. Change-ID: Ib2fd11213cca2fa589eb68577e86b1000c23c250 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 | i40evf: ignore bogus messages from FW | Mitch Williams
Occasionally on shutdown, the FW will hand us a bunch of messages filled with zeros, which can cause us to spin trying to handle them. Just ignore these and get on with shutting down. Change-ID: I347e9648f7153ad5a7b7e0847b87f7aad5f3e0da Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 | i40evf: reset on module unload | Mitch Williams
When the module is being unloaded, don't wait for the PF to politely handle all of our admin queue requests, as that might take forever with a lot of VFs enabled. Instead, just stop everything and request a VF reset. When the original shutdown code was written, VF resets were unreliable, so we avoided them. But with production hardware and firmware, and the 1.x PF driver, this is no longer the case. This fixes a potential multi-minute delay on driver unload, VF disable, or system shutdown. Change-ID: Ib43d6d860ef6b9b8f26e8dce0615a0302608c7d9 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 | i40e: add locking around VF reset | Mitch Williams
During VF deallocation, we need to lock out the VF reset code. However, we cannot depend on simply masking the interrupt, as this does not lock out the service task, which can still call the reset routine. Instead, leave the interrupt enabled, but add locking around the VF disable and reset routines. For the disable code, we wait to get the lock, as the reset code will take a finite amount of time to run. For the reset code, we just return if we fail to get the lock. Since we know that the VFs are being disabled, we don't need to handle the reset. This fixes a panic when disabling SR-IOV. Change-ID: Iea0a6cdef35c331f48c6d5b2f8e6f0e86322e7d8 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
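The asymmetric locking described above (the disable path waits for the lock, the reset path gives up if the lock is taken) can be sketched as follows. This is a user-space illustration, not the driver's code: `struct pf`, `vf_disable_begin()`, `vf_disable_end()` and `vf_reset_event()` are hypothetical names, and a C11 `atomic_flag` stands in for whatever lock the driver actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PF state. */
struct pf {
    atomic_flag vf_disable; /* set while VFs are being disabled or reset */
    int resets_handled;
};

/* Disable path: wait until the lock is free, then hold it. */
static void vf_disable_begin(struct pf *pf)
{
    assert(pf != NULL);
    while (atomic_flag_test_and_set(&pf->vf_disable))
        ; /* a reset takes finite time; real kernel code would sleep here */
}

static void vf_disable_end(struct pf *pf)
{
    atomic_flag_clear(&pf->vf_disable);
}

/* Reset path: return immediately if the lock is already held. */
static bool vf_reset_event(struct pf *pf)
{
    if (atomic_flag_test_and_set(&pf->vf_disable))
        return false; /* VFs are going away; nothing to reset */
    pf->resets_handled++;
    atomic_flag_clear(&pf->vf_disable);
    return true;
}
```

Skipping the reset is safe in this scheme only because the holder of the lock is tearing the VFs down anyway, which is the argument the commit message makes.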
2015-02-08 | i40e: Use even more ARQ descriptors | Mitch Williams
When enabling 64 VFs and loading the VF driver in the host kernel, we can easily overrun the PF's admin receive queue. Double the size of this queue, and increase the work limit to allow the PF to handle more requests in a single pass through the service task. Change-ID: I0efbbdc61954bffad422a2f33c4b948a59370bf5 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 | i40e: delay after VF reset | Mitch Williams
Delay a minimum of 10ms after VF reset, to allow the hardware's internal FIFOs to flush. Change-ID: I8a02ddb28c9f0d7303a1eb21d0b2443e5b4c1cda Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 | i40e: avoid use of uninitialized v_budget in i40e_init_msix | John W Linville
This I40E_FCOE block increments v_budget before it has been initialized, then v_budget gets overwritten a few lines later. This patch just reorders the code hunks in what I believe was the intended sequence. Coverity: CID 1260099 Signed-off-by: John W Linville <linville@tuxdriver.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 | Linux 3.19 (tag: v3.19) | Linus Torvalds
2015-02-08 | SUNRPC: Remove TCP client connection reset hack | Trond Myklebust
Instead we rely on SO_REUSEPORT to provide the reconnection semantics that we need for NFSv2/v3. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 | SUNRPC: TCP/UDP always close the old socket before reconnecting | Trond Myklebust
It is not safe to call xs_reset_transport() from inside xs_udp_setup_socket() or xs_tcp_setup_socket(), since they do not own the correct locks. Instead, do it in xs_connect(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 | SUNRPC: Add helpers to prevent socket create from racing | Trond Myklebust
The socket lock is currently held by the task that is requesting the connection be established. While that is efficient in the case where the connection happens quickly, it is racy in the case where it doesn't. What we really want is for the connect helper to be able to block access to the socket while it is being set up. This patch does so by arranging to transfer the socket lock from the task that is requesting the connect attempt, and then releasing that lock once everything is done. This scheme also gives us automatic protection against collisions with the RPC close code, so we can kill the cancel_delayed_work_sync() call in xs_close(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 | SUNRPC: Ensure xs_reset_transport() resets the close connection flags | Trond Myklebust
Otherwise, we may end up looping. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 | SUNRPC: Do not clear the source port in xs_reset_transport | Trond Myklebust
Now that we can reuse bound ports after a close, we never really want to clear the transport's source port after it has been set. Doing so really messes up the NFSv3 DRC on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 | SUNRPC: Handle EADDRINUSE on connect | Trond Myklebust
Now that we're setting SO_REUSEPORT, we still need to handle the case where a connect() is attempted, but the old socket is still lingering. Essentially, all we want to do here is handle the error by waiting a few seconds and then retrying. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 | i40e: i40e_fcoe.c: Remove unused function | Rickard Strandqvist
Remove the function i40e_rx_is_fip() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 | Merge tag 'nios2-fixes-v3.19-final' of git://git.rocketboards.org/linux-socfpga-next | Linus Torvalds
Pull nios2 fix from Ley Foon Tan: "This fixes incorrect behavior of some user programs" * tag 'nios2-fixes-v3.19-final' of git://git.rocketboards.org/linux-socfpga-next: nios2: fix unhandled signals
2015-02-08 | Merge git://git.kvack.org/~bcrl/aio-fixes | Linus Torvalds
Pull aio nested sleep annotation from Ben LaHaise, * git://git.kvack.org/~bcrl/aio-fixes: aio: annotate aio_read_event_ring for sleep patterns
2015-02-08 | Merge tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace | Linus Torvalds
Pull ftrace fixes from Steven Rostedt: "During testing Sedat Dilek hit a "suspicious RCU usage" splat that pointed out a real bug. During suspend and resume the tlb_flush tracepoint is called when the CPU is going offline. As the CPU has been noted as offline, RCU is ignoring that CPU, which means that it can not use RCU protected locks. When tracepoints are activated, they require RCU locking, and if RCU is ignoring a CPU that runs a tracepoint, there is a chance that the tracepoint could cause corruption. The solution was to change the tracepoint into a TRACE_EVENT_CONDITION() which allows us to check a condition to determine if the tracepoint should be called or not. If the condition is not met, the rcu protected code will not be executed. By adding the condition "cpu_online(smp_processor_id())", this will prevent the RCU protected code from being executed if the CPU is marked offline. After adding this, another bug was discovered. As RCU checks rcu callers, if a rcu call is not done, there is no check (obviously). We found that tracepoints could be added in RCU ignored locations and not have lockdep complain until the tracepoint is activated. This missed places where tracepoints were added in places they should not have been. To fix this, code was added in 3.18 that if lockdep is enabled, any tracepoint will still call the rcu checks even if the tracepoint is not enabled. The bug here, is that the check does not take the CONDITION into account. As the condition may prevent tracepoints from being activated in RCU ignored areas (as the one patch does), we get false positives when we enable lockdep and hit a tracepoint that the condition prevents it from being called in a RCU ignored location.
The fix for this is to add the CONDITION to the rcu checks, even if the tracepoint is not enabled" * tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: x86/tlb/trace: Do not trace on CPU that is offline tracing: Add condition check to RCU lockdep checks
2015-02-09 | nios2: fix unhandled signals | Chung-Ling Tang
Follow other architectures for user fault handling. Signed-off-by: Chung-Ling Tang <cltang@codesourcery.com> Acked-by: Ley Foon Tan <lftan@altera.com>
2015-02-08 | vxlan: Wrong type passed to %pIS | Rasmus Villemoes
src_ip is a pointer to a union vxlan_addr, one member of which is a struct sockaddr. Passing a pointer to src_ip is wrong; one should pass the value of src_ip itself. Since %pIS formally expects something of type struct sockaddr*, let's pass a pointer to the appropriate union member, though this of course doesn't change the generated code. Fixes: e4c7ed415387 ("vxlan: add ipv6 support") Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
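The point about the union and its `struct sockaddr` member can be illustrated in user-space C. This is a sketch under assumptions: `union addr` and `addr_as_sockaddr()` are hypothetical stand-ins for `union vxlan_addr` and the expression passed to the format specifier; only the standard socket headers are real.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-in for union vxlan_addr. */
union addr {
    struct sockaddr_in  sin;
    struct sockaddr_in6 sin6;
    struct sockaddr     sa;
};

/*
 * Return a pointer with the type a sockaddr consumer expects.
 * The address equals that of the union itself, so passing the
 * member pointer only corrects the type, not the value.
 */
static const struct sockaddr *addr_as_sockaddr(const union addr *a)
{
    assert(a != NULL);
    return &a->sa;
}
```

Passing `&a` when `a` is already a pointer to the union would instead hand over the address of the pointer variable, which is the mistake the commit describes.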
2015-02-08 | Driver: Vmxnet3: Change the hex constant to its decimal equivalent | Shrikrishna Khare
The hex constant chosen for VMXNET3_REV1_MAGIC is offensive, replace it with its decimal equivalent. Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Reviewed-by: Shreyas Bhatewara <sbhatewara@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 | net: rfs: add hash collision detection | Eric Dumazet
Receive Flow Steering is a nice solution but suffers from hash collisions when a mix of connected and unconnected traffic is received on the host while the flow hash table is populated. Also, clearing the flow in inet_release() makes RFS not very good for short-lived flows, as many packets can follow close() (FIN, ACK packets, ...). This patch extends the information stored in the global hash table to include not only the cpu number, but also the upper part of the hash value. I use a 32-bit value, and dynamically split it in two parts. For a host with fewer than 64 possible cpus, this gives 6 bits for the cpu number, and 26 (32-6) bits for the upper part of the hash. Since hash bucket selection uses the low order bits of the hash, we have a full hash match if /proc/sys/net/core/rps_sock_flow_entries is big enough. If the hash found in the flow table does not match, we fall back to RPS (if it is enabled for the rxqueue). This means that a packet for a non-connected flow can avoid the IPI through an unrelated/victim CPU. This also means we no longer have to clear the table at socket close time, which helps the performance of short-lived flows. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
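The split of one 32-bit entry into CPU bits and hash bits can be sketched like this. It is an illustration of the scheme as the message describes it, not the kernel's code: `rfs_entry()`, `rfs_lookup()` and the `cpu_mask` parameter are hypothetical names, with `cpu_mask = 63` matching the 6-bit example in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the upper hash bits and the CPU number into one 32-bit entry. */
static uint32_t rfs_entry(uint32_t hash, uint32_t cpu, uint32_t cpu_mask)
{
    assert((cpu & ~cpu_mask) == 0); /* the CPU must fit in the low bits */
    return (hash & ~cpu_mask) | cpu;
}

/*
 * Return the stored CPU if the upper hash bits match, or -1 on a
 * collision, in which case the caller would fall back to plain RPS.
 */
static int rfs_lookup(uint32_t entry, uint32_t hash, uint32_t cpu_mask)
{
    if ((entry ^ hash) & ~cpu_mask)
        return -1;
    return (int)(entry & cpu_mask);
}
```

The low bits of the hash are not stored, which loses nothing as long as those bits already selected the bucket; that is why the message ties a "full hash match" to the table being big enough.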
2015-02-08 | net: fix a typo in skb_checksum_validate_zero_check | Sabrina Dubroca
Remove trailing underscore. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 | gre/ipip: use be16 variants of netlink functions | Sabrina Dubroca
encap.sport and encap.dport are __be16, use nla_{get,put}_be16 instead of nla_{get,put}_u16. Fixes the sparse warnings: warning: incorrect type in assignment (different base types) expected restricted __be32 [addressable] [usertype] o_key got restricted __be16 [addressable] [usertype] i_flags warning: incorrect type in assignment (different base types) expected restricted __be16 [usertype] sport got unsigned short warning: incorrect type in assignment (different base types) expected restricted __be16 [usertype] dport got unsigned short warning: incorrect type in argument 3 (different base types) expected unsigned short [unsigned] [usertype] value got restricted __be16 [usertype] sport warning: incorrect type in argument 3 (different base types) expected unsigned short [unsigned] [usertype] value got restricted __be16 [usertype] dport Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
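The idea behind the be16 accessors is that a port stored in network byte order is copied as is, with no conversion at the attribute boundary. A rough user-space analogue, where `put_be16()` and `get_be16()` are hypothetical helpers and not the netlink API:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a value that is already in network byte order, unchanged. */
static void put_be16(uint8_t *buf, uint16_t be_value)
{
    assert(buf != NULL);
    memcpy(buf, &be_value, sizeof(be_value));
}

/* Load it back, still in network byte order. */
static uint16_t get_be16(const uint8_t *buf)
{
    uint16_t be_value;

    assert(buf != NULL);
    memcpy(&be_value, buf, sizeof(be_value));
    return be_value;
}
```

A caller converts only at the edges, for example `put_be16(buf, htons(4789))` and later `ntohs(get_be16(buf))`. In the kernel, the `__be16` annotation lets sparse check that this discipline is followed; plain C types in user space cannot express that.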
2015-02-08 | SUNRPC: Set SO_REUSEPORT socket option for TCP connections | Trond Myklebust
When using TCP, we need the ability to reuse port numbers after a disconnection, so that the NFSv3 server knows that we're the same client. Currently we use a hack to work around the TCP socket's TIME_WAIT: we send an RST instead of closing, which doesn't always work... The SO_REUSEPORT option added in Linux 3.9 allows us to bind multiple TCP connections to the same source address+port combination, and thus to use ordinary TCP close() instead of the current hack. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 | ACPI / video: Add disable_native_backlight quirk for Samsung 510R | Hans de Goede
Backlight control through the native Intel interface does not work properly on the Samsung 510R, whereas the acpi_video interface does work. Add a quirk for this. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1186097 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-08 | ACPI / PM: Remove unneeded nested #ifdef | Andreas Ruprecht
In commit 5de21bb998b8 ("ACPI / PM: Drop CONFIG_PM_RUNTIME from the ACPI core"), all occurrences of CONFIG_PM_RUNTIME were replaced with CONFIG_PM. This created the following structure of #ifdef blocks in the code: [...] #ifdef CONFIG_PM #ifdef CONFIG_PM /* always on / undead */ #ifdef CONFIG_PM_SLEEP [...] #endif #endif [...] #endif This patch removes the inner "#ifdef CONFIG_PM" block as it will always be enabled when the outer block is enabled. This inconsistency was found using the undertaker-checkpatch tool. Signed-off-by: Andreas Ruprecht <rupran@einserver.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-08 | USB / PM: Remove unneeded #ifdef and associated dead code | Andreas Ruprecht
In commit ceb6c9c862c8 ("USB / PM: Drop CONFIG_PM_RUNTIME from the USB core"), all occurrences of CONFIG_PM_RUNTIME in the USB core code were replaced by CONFIG_PM. This created the following structure of #ifdef blocks in drivers/usb/core/hub.c: [...] #ifdef CONFIG_PM #ifdef CONFIG_PM /* always on / undead */ #else /* dead */ #endif [...] This patch removes unnecessary inner "#ifdef CONFIG_PM" as well as the corresponding dead #else block. This inconsistency was found using the undertaker-checkpatch tool. Signed-off-by: Andreas Ruprecht <rupran@einserver.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-08 | tipc: fix bug in socket reception function | Jon Paul Maloy
In commit c637c1035534867b85b78b453c38c495b58e2c5a ("tipc: resolve race problem at unicast message reception") we introduced a time limit for how long the function tipc_sk_enqueue() would be allowed to execute its loop. Unfortunately, the test for when this limit is passed was put in the wrong place, resulting in a lost message when the test is true. We fix this by moving the test to before we dequeue the next buffer from the input queue. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 | rt6_probe_deferred: Do not depend on struct ordering | Michael Büsch
rt6_probe allocates a struct __rt6_probe_work and schedules a work handler rt6_probe_deferred. But rt6_probe_deferred kfree's the struct work_struct instead of struct __rt6_probe_work. This works, because struct work_struct is the first element of struct __rt6_probe_work. Change it to kfree struct __rt6_probe_work to not implicitly depend on struct work_struct being the first element. This does not affect the generated code. Signed-off-by: Michael Buesch <m@bues.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
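The difference between freeing the embedded member and freeing the containing struct can be sketched in user-space C. All names here are hypothetical stand-ins (`struct work` for `struct work_struct`, `struct probe_work` for `struct __rt6_probe_work`), and the `container_of` macro is a simplified local definition rather than the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct work { int pending; };   /* stand-in for struct work_struct */

struct probe_work {             /* stand-in for struct __rt6_probe_work */
    struct work work;           /* happens to be the first member */
    int target;
};

/* Simplified container_of: step back from a member to its container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate the outer struct and hand out only the embedded member. */
static struct work *probe_schedule(int target)
{
    struct probe_work *pw = malloc(sizeof(*pw));

    if (!pw)
        return NULL;
    pw->work.pending = 1;
    pw->target = target;
    return &pw->work;
}

/* Handler: recover the outer struct and free that, not the member. */
static int probe_deferred(struct work *w)
{
    struct probe_work *pw;
    int target;

    assert(w != NULL);
    pw = container_of(w, struct probe_work, work);
    target = pw->target;
    free(pw); /* still correct if 'work' stops being the first member */
    return target;
}
```

With `work` at offset 0 both pointers are numerically equal, which is why the original code worked and why the commit notes that the generated code does not change.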
2015-02-08 | Merge tag 'nfs-rdma-for-3.20-part-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma | Trond Myklebust
NFS: RDMA Client Sparse Fixes This patch fixes a sparse warning in the initial submission. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-for-3.20-part-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma: xprtrdma: Address sparse complaint in rpcr_to_rdmar()
2015-02-08 | ALSA: control: fix failure to return numerical ID in 'add' event | Takashi Sakamoto
Currently, when a new control is added, the assigned numerical ID is not set in the event data, so userspace applications cannot learn it from the event data alone. Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-02-08 | Merge branch 'tcp_ack_loops' | David S. Miller
Neal Cardwell says:

====================
tcp: mitigate TCP ACK loops due to out-of-window validation dupacks

This patch series mitigates "ack loop" DoS scenarios by rate-limiting outgoing duplicate ACKs sent in response to incoming "out of window" segments.

Background
-----------
There are several cases in which the TCP RFCs specify that a TCP endpoint should send a pure duplicate ACK in response to a pure duplicate ACK that appears to be invalid due to being "out of window":

(1) RFC 793 (section 3.9, page 69) specifies that endpoints should send a duplicate ACK in response to an ACK when the incoming sequence number is invalid due to being outside the receive window: "If an incoming segment is not acceptable, an acknowledgment should be sent in reply".

(2) RFC 793 (section 3.9, page 72) says: "If the ACK acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an ACK".

(3) RFC 1323 (section 4.2.1, page 18) specifies that endpoints should send a duplicate ACK in response to an ACK when the PAWS check for the incoming timestamp value fails: "If .... SEG.TSval < TS.Recent and if TS.Recent is valid ... Send an acknowledgement in reply"

The problem
------------
Normally, this is not a problem. However, a buggy middlebox or malicious man-in-the-middle can inject a few packets into the conversation that advance each endpoint's notion of the current window (sequence, ACK, or timestamp), without either side noticing. In this case, from then on each side can think the other is sending invalid segments. Thus an infinite feedback loop of duplicate ACKs can ensue, as each endpoint receives a duplicate ACK, decides that it is invalid (due to sequence number, ACK number, or timestamp), and then sends a dupack in reply, which the other side decides is invalid, responding with a dupack... ad infinitum. This ping-pong feedback loop can happen at a very high rate.

This phenomenon can and does happen in practice. It has been seen in datacenter and Internet contexts at Google, and has been documented by Anil Agarwal in the Nov 2013 tcpm thread "TCP mismatched sequence numbers issue", and Avery Fay in the Feb 2015 Linux netdev thread "Invalid timestamp? causing tight ack loop (hundreds of thousands of packets / sec)".

This patch series
------------------
This patch series mitigates such ack loops by rate-limiting outgoing duplicate ACKs sent in response to incoming TCP packets that are for an existing connection but that are invalid due to any of the reasons mentioned above: sequence number (1), ACK field (2), or timestamp value (3). The rate limit for such duplicate ACKs is specified by a new sysctl, tcp_invalid_ratelimit, which specifies the minimal space between such outbound duplicate ACKs, in milliseconds. The default is 500 (500ms), and 0 disables the mechanism.

We rate-limit these duplicate ACK responses rather than blocking them entirely or resetting the connection, because legitimate connections can rely on dupacks in response to some out-of-window segments. For example, zero window probes are typically sent with a sequence number that is below the current window, and ZWPs thus expect to elicit a dupack in response.

Testing: this approach has been in use at Google for a while.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
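The rate limit described in this series can be sketched as a small per-connection check. This is an illustration under assumptions, not the kernel implementation: `oow_ack_allowed()` is a hypothetical name, time is taken as a plain millisecond counter supplied by the caller, and a stored timestamp of 0 is assumed to mean "no dupack sent yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decide whether a dupack may be sent in response to an out-of-window
 * segment. ratelimit_ms plays the role of tcp_invalid_ratelimit:
 * the minimal spacing between such dupacks, with 0 disabling the limit.
 */
static bool oow_ack_allowed(uint32_t now_ms, uint32_t *last_ms,
                            uint32_t ratelimit_ms)
{
    assert(last_ms != NULL);

    if (ratelimit_ms != 0 && *last_ms != 0 &&
        (uint32_t)(now_ms - *last_ms) < ratelimit_ms)
        return false;   /* too soon after the previous dupack */

    *last_ms = now_ms;  /* record the dupack we are about to send */
    return true;
}
```

With the default of 500, a peer stuck in an ack loop elicits at most about two dupacks per second from each connection, while an occasional legitimate probe still gets its reply.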
2015-02-08 | tcp: mitigate ACK loops for connections as tcp_timewait_sock | Neal Cardwell
Ensure that in state FIN_WAIT2 or TIME_WAIT, where the connection is represented by a tcp_timewait_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 | tcp: mitigate ACK loops for connections as tcp_sock | Neal Cardwell
Ensure that in state ESTABLISHED, where the connection is represented by a tcp_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers or ACK numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. There is already a similar (although global) rate-limiting mechanism for "challenge ACKs". When deciding whether to send a challenge ACK, we first consult the new per-connection rate limit, and then the global rate limit. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>