summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2014-09-30ACPI / sleep: Rework the handling of ACPI GPE wakeup from suspend-to-idleRafael J. Wysocki
The ACPI GPE wakeup from suspend-to-idle is currently based on using the IRQF_NO_SUSPEND flag for the ACPI SCI, but that is problematic for a couple of reasons. First, in principle the ACPI SCI may be shared and IRQF_NO_SUSPEND does not really work well with shared interrupts. Second, it may require the ACPI subsystem to special-case the handling of device notifications depending on whether or not they are received during suspend-to-idle in some places which would lead to fragile code. Finally, it's better the handle ACPI wakeup interrupts consistently with wakeup interrupts from other sources. For this reason, remove the IRQF_NO_SUSPEND flag from the ACPI SCI and use enable_irq_wake()/disable_irq_wake() with it instead, which requires two additional platform hooks to be added to struct platform_freeze_ops. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30PM / sleep: Export dpm_suspend_late/noirq() and dpm_resume_early/noirq()Rafael J. Wysocki
Subsequent change sets will add platform-related operations between dpm_suspend_late() and dpm_suspend_noirq() as well as between dpm_resume_noirq() and dpm_resume_early() in suspend_enter(), so export these functions for suspend_enter() to be able to call them separately and split the invocations of dpm_suspend_end() and dpm_resume_start() in there accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30Merge branch 'acpica' into acpi-pmRafael J. Wysocki
2014-09-30ACPICA: Introduce acpi_enable_all_wakeup_gpes()Rafael J. Wysocki
Add a routine for host OSes to enable all wakeup GPEs and disable all of the non-wakeup ones at the same time. It will be used for the handling of GPE wakeup from suspend-to-idle in Linux. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30Merge branch 'pm-genirq' into acpi-pmRafael J. Wysocki
2014-09-30ipv6: remove rt6i_genidHannes Frederic Sowa
Eric Dumazet noticed that all no-nonexthop or no-gateway routes which are already marked DST_HOST (e.g. input routes routes) will always be invalidated during sk_dst_check. Thus per-socket dst caching absolutely had no effect and early demuxing had no effect. Thus this patch removes rt6i_genid: fn_sernum already gets modified during add operations, so we only must ensure we mutate fn_sernum during ipv6 address remove operations. This is a fairly cost extensive operations, but address removal should not happen that often. Also our mtu update functions do the same and we heard no complains so far. xfrm policy changes also cause a call into fib6_flush_trees. Also plug a hole in rt6_info (no cacheline changes). I verified via tracing that this change has effect. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Martin Lau <kafai@fb.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30bcma: register bcma as device tree driverHauke Mehrtens
This driver is used by the bcm53xx ARM SoC code. Now it is possible to give the address of the chipcommon core in device tree and bcma will search for all the other cores. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-30of/pci: Add pci_register_io_range() and pci_pio_to_address()Liviu Dudau
Some architectures do not have a simple view of the PCI I/O space and instead use a range of CPU addresses that map to bus addresses. For some architectures these ranges will be expressed by OF bindings in a device tree file. This patch introduces a pci_register_io_range() helper function with a generic implementation that can be used by such architectures to keep track of the I/O ranges described by the PCI bindings. If the PCI_IOBASE macro is not defined, that signals lack of support for PCI and we return an error. In order to retrieve the CPU address associated with an I/O port, a new helper function pci_pio_to_address() is introduced. This will search in the list of ranges registered with pci_register_io_range() and return the CPU address that corresponds to the given port. [arnd: add dummy !CONFIG_OF pci_pio_to_address() to fix build errors] Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Rob Herring <robh@kernel.org> CC: Grant Likely <grant.likely@linaro.org>
2014-09-30asm-generic/io.h: Fix ioport_map() for !CONFIG_GENERIC_IOMAPLiviu Dudau
The !CONFIG_GENERIC_IOMAP version of ioport_map() is wrong. It returns a mapped, i.e., virtual, address that can start from zero and completely ignores the PCI_IOBASE and IO_SPACE_LIMIT that most architectures that use !CONFIG_GENERIC_MAP define. Tested-by: Tanmay Inamdar <tinamdar@apm.com> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
2014-10-01Merge commit 'v3.16' into nextJames Morris
2014-09-30scsi: fix comment in struct Scsi_Host definitionSebastian Herbszt
Commit 1abf635 (scsi: use 64-bit value for 'max_luns') changed the order of Scsi_Host members. Update the comment to reflect this. Signed-off-by: Sebastian Herbszt <herbszt@gmx.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-30Merge remote-tracking branches 'regulator/topic/tps65217', ↵Mark Brown
'regulator/topic/tps65910' and 'regulator/topic/voltage-ev' into regulator-next
2014-09-30Merge remote-tracking branches 'regulator/topic/rk808', ↵Mark Brown
'regulator/topic/rn5t618' and 'regulator/topic/samsung' into regulator-next
2014-09-30Merge remote-tracking branches 'regulator/topic/max1586', ↵Mark Brown
'regulator/topic/max77802' and 'regulator/topic/of' into regulator-next
2014-09-30Merge remote-tracking branches 'regulator/topic/drivers', ↵Mark Brown
'regulator/topic/enable', 'regulator/topic/fan53555', 'regulator/topic/hi6421' and 'regulator/topic/isl9305' into regulator-next
2014-09-30Merge remote-tracking branches 'regulator/topic/as3711', ↵Mark Brown
'regulator/topic/axp20x', 'regulator/topic/bcm590xx' and 'regulator/topic/da9211' into regulator-next
2014-09-30drm/ttm: add reservation_object as argument to ttm_bo_initMaarten Lankhorst
This allows importing reservation objects from dma-bufs. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
2014-09-30drm: Pass dma-buf as argument to gem_prime_import_sg_tableMaarten Lankhorst
Allows importing dma_reservation_objects from a dma-buf. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
2014-09-29Merge tag 'for_3.18/samsung-clk' of ↵Mike Turquette
git://git.kernel.org/pub/scm/linux/kernel/git/tfiga/samsung-clk into clk-next Samsung clock patches for v3.18 1) non-critical fixes (without the need to push to stable) fa0111be4ff3 clk: samsung: exynos4: remove duplicate div_core2 divider clock instantiation b511593d7165 clk: samsung: exynos4: fix g3d clocks c14254300131 clk: samsung: exynos4: add missing smmu_g2d clock and update comments 22842d244af3 clk: samsung: exynos5260: fix typo in clock name e82ba578ccde clk: samsung: exynos3250: fix width field of mout_mmc0/1 59037b92f440 clk: samsung: exynos3250: fix width and shift of div_spi0_isp clock 5ce37f266650 clk: samsung: exynos3250: fix mout_cam_blk parent list 2) Clock driver extensions 07ccf02ba5c3 dt-bindings: clk: samsung: Document the DMC domain of Exynos3250 CMU d0e73eaf1925 ARM: dts: exynos3250: Add CMU node for DMC domain clocks e3c3f19bc618 clk: samsung: exynos3250: Register DMC clk provider 4676f0aab9dc clk: samsung: exynos4: add support for MOUT_HDMI and MOUT_MIXER clocks
2014-09-29Merge branch 'for-v3.18/ti-clk-driver' of github.com:t-kristo/linux-pm into ↵Mike Turquette
clk-next
2014-09-30i2c: cros_ec: Remove EC_I2C_FLAG_10BITDoug Anderson
In <https://lkml.org/lkml/2014/6/10/265> pointed out that the 10-bit flag in the cros_ec_tunnel was useless. It went into a 16-bit flags field but was defined at (1 << 16). Since we have no 10-bit i2c devices on the other side of the tunnel on any known devices this was never a problem. Until we do it makes sense to remove this code. On the EC side the code to handle this flag was removed in <https://chromium-review.googlesource.com/204162>. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Doug Anderson <dianders@chromium.org> Reviewed-by: Simon Glass <sjg@chromium.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-09-30net: sched: enable per cpu qstatsJohn Fastabend
After previous patches to simplify qstats the qstats can be made per cpu with a packed union in Qdisc struct. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30net: sched: restrict use of qstats qlenJohn Fastabend
This removes the use of qstats->qlen variable from the classifiers and makes it an explicit argument to gnet_stats_copy_queue(). The qlen represents the qdisc queue length and is packed into the qstats at the last moment before passnig to user space. By handling it explicitely we avoid, in the percpu stats case, having to figure out which per_cpu variable to put it in. It would probably be best to remove it from qstats completely but qstats is a user space ABI and can't be broken. A future patch could make an internal only qstats structure that would avoid having to allocate an additional u32 variable on the Qdisc struct. This would make the qstats struct 128bits instead of 128+32. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30net: sched: implement qstat helper routinesJohn Fastabend
This adds helpers to manipulate qstats logic and replaces locations that touch the counters directly. This simplifies future patches to push qstats onto per cpu counters. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30net: sched: make bstats per cpu and estimator RCU safeJohn Fastabend
In order to run qdisc's without locking statistics and estimators need to be handled correctly. To resolve bstats make the statistics per cpu. And because this is only needed for qdiscs that are running without locks which is not the case for most qdiscs in the near future only create percpu stats when qdiscs set the TCQ_F_CPUSTATS flag. Next because estimators use the bstats to calculate packets per second and bytes per second the estimator code paths are updated to use the per cpu statistics. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29tty: serial: 8250: use 32bit variable for rpm_tx_activeSebastian Andrzej Siewior
The kbuild test robot wrote me: | make.cross ARCH=powerpc |>> ERROR: ".__xchg_called_with_bad_pointer" [drivers/tty/serial/8250/8250.ko] undefined! The generic implementation of xchg() on arm and x86 works for variables of size one bye (char). According to the report powerpc does not support xchg() for one byte sized variables and looking further it seems also to be the same case for sparc and tile (or for 10 out of 26 architectures which provide a custom implementation). For that reason I increase the size of the variable from one to four bytes to get it work on powerpc (and the others). Reported-By: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-30PM / Domains: Move dev_pm_domain_attach|detach() to pm_domain.hUlf Hansson
The commit 46420dd73b80 (PM / Domains: Add APIs to attach/detach a PM domain for a device) started using errno values in pm.h header file. It also failed to include the header for these, thus it caused compiler errors. Instead of including the errno header to pm.h, let's move the functions to pm_domain.h, since it's a better match. Fixes: 46420dd73b80 (PM / Domains: Add APIs to attach/detach a PM domain for a device) Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-29i2c: designware: add support of platform data to set I2C modeTan, Raymond
Use the platform data to set the clk_freq when there is no DT configuration available. The clk_freq in turn will determine the I2C speed mode. In Quark, there is currently no other configuration mechanism other than board files. Signed-off-by: Raymond Tan <raymond.tan@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Hock Leong Kweh <hock.leong.kweh@intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-09-29macvlan: add source modeMichael Braun
This patch adds a new mode of operation to macvlan, called "source". It allows one to set a list of allowed mac address, which is used to match against source mac address from received frames on underlying interface. This enables creating mac based VLAN associations, instead of standard port or tag based. The feature is useful to deploy 802.1x mac based behavior, where drivers of underlying interfaces doesn't allows that. Configuration is done through the netlink interface using e.g.: ip link add link eth0 name macvlan0 type macvlan mode source ip link add link eth0 name macvlan1 type macvlan mode source ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11 ip link set link dev macvlan0 type macvlan macaddr add 00:22:22:22:22:22 ip link set link dev macvlan0 type macvlan macaddr add 00:33:33:33:33:33 ip link set link dev macvlan1 type macvlan macaddr add 00:33:33:33:33:33 ip link set link dev macvlan1 type macvlan macaddr add 00:44:44:44:44:44 This allows clients with MAC addresses 00:11:11:11:11:11, 00:22:22:22:22:22 to be part of only VLAN associated with macvlan0 interface. Clients with MAC addresses 00:44:44:44:44:44 with only VLAN associated with macvlan1 interface. And client with MAC address 00:33:33:33:33:33 to be associated with both VLANs. Based on work of Stefan Gula <steweg@gmail.com> v8: last version of Stefan Gula for Kernel 3.2.1 v9: rework onto linux-next 2014-03-12 by Michael Braun add MACADDR_SET command, enable to configure mac for source mode while creating interface v10: - reduce indention level - rename source_list to source_entry - use aligned 64bit ether address - use hash_64 instead of addr[5] v11: - rebase for 3.14 / linux-next 20.04.2014 v12 - rebase for linux-next 2014-09-25 Signed-off-by: Michael Braun <michael-dev@fami-braun.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== pull request: netfilter/ipvs updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, most relevantly they are: 1) Four patches to make the new nf_tables masquerading support independent of the x_tables infrastructure. This also resolves a compilation breakage if the masquerade target is disabled but the nf_tables masq expression is enabled. 2) ipset updates via Jozsef Kadlecsik. This includes the addition of the skbinfo extension that allows you to store packet metainformation in the elements. This can be used to fetch and restore this to the packets through the iptables SET target, patches from Anton Danilov. 3) Add the hash:mac set type to ipset, from Jozsef Kadlecsick. 4) Add simple weighted fail-over scheduler via Simon Horman. This provides a fail-over IPVS scheduler (unlike existing load balancing schedulers). Connections are directed to the appropriate server based solely on highest weight value and server availability, patch from Kenny Mathis. 5) Support IPv6 real servers in IPv4 virtual-services and vice versa. Simon Horman informs that the motivation for this is to allow more flexibility in the choice of IP version offered by both virtual-servers and real-servers as they no longer need to match: An IPv4 connection from an end-user may be forwarded to a real-server using IPv6 and vice versa. No ip_vs_sync support yet though. Patches from Alex Gartrell and Julian Anastasov. 6) Add global generation ID to the nf_tables ruleset. When dumping from several different object lists, we need a way to identify that an update has ocurred so userspace knows that it needs to refresh its lists. This also includes a new command to obtain the 32-bits generation ID. The less significant 16-bits of this ID is also exposed through res_id field in the nfnetlink header to quickly detect the interference and retry when there is no risk of ID wraparound. 7) Move br_netfilter out of the bridge core. The br_netfilter code is built in the bridge core by default. This causes problems of different kind to people that don't want this: Jesper reported performance drop due to the inconditional hook registration and I remember to have read complains on netdev from people regarding the unexpected behaviour of our bridging stack when br_netfilter is enabled (fragmentation handling, layer 3 and upper inspection). People that still need this should easily undo the damage by modprobing the new br_netfilter module. 8) Dump the set policy nf_tables that allows set parameterization. So userspace can keep user-defined preferences when saving the ruleset. From Arturo Borrero. 9) Use __seq_open_private() helper function to reduce boiler plate code in x_tables, From Rob Jones. 10) Safer default behaviour in case that you forget to load the protocol tracker. Daniel Borkmann and Florian Westphal detected that if your ruleset is stateful, you allow traffic to at least one single SCTP port and the SCTP protocol tracker is not loaded, then any SCTP traffic may be pass through unfiltered. After this patch, the connection tracking classifies SCTP/DCCP/UDPlite/GRE packets as invalid if your kernel has been compiled with support for these modules. ==================== Trivially resolved conflict in include/linux/skbuff.h, Eric moved some netfilter skbuff members around, and the netfilter tree adjusted the ifdef guards for the bridging info pointer. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29ASoC: rt5677: Add dts properties for input/output differential configurationAnatol Pomozov
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2014-09-29tcp: move TCP_ECN_create_request out of headerFlorian Westphal
After Octavian Purdilas tcp ipv4/ipv6 unification work this helper only has a single callsite. While at it, convert name to lowercase, suggested by Stephen. Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29ARCNET: add support for multi interfaces on com20020Michael Grzeschik
The com20020-pci driver is currently designed to instance one netdev with one pci device. This patch adds support to instance many cards with one pci device, depending on the device data in the private data. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29ARCNET: add com20020 PCI IDs with metadataMichael Grzeschik
This patch adds metadata for the com20020 to prepare for devices with multiple io address areas with multi card interfaces. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29NFSD: Implement SEEKAnna Schumaker
This patch adds server support for the NFS v4.2 operation SEEK, which returns the position of the next hole or data segment in a file. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-29NFSD: Add generic v4.2 infrastructureAnna Schumaker
It's cleaner to introduce everything at once and have the server reply with "not supported" than it would be to introduce extra operations when implementing a specific one in the middle of the list. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-29ASoC: Remove CODEC pointer from snd_soc_dapm_contextLars-Peter Clausen
The only remaining user of the CODEC pointer in the DAPM struct is to initialize the CODEC pointer in the widget struct. The later is scheduled for removal, but has still a few users left. For now use dapm->component->codec to initialize it. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Mark Brown <broonie@kernel.org>
2014-09-29net: reorganize sk_buff for faster __copy_skb_header()Eric Dumazet
With proliferation of bit fields in sk_buff, __copy_skb_header() became quite expensive, showing as the most expensive function in a GSO workload. __copy_skb_header() performance is also critical for non GSO TCP operations, as it is used from skb_clone() This patch carefully moves all the fields that were not copied in a separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more Then I moved all other fields and all other copied fields in a section delimited by headers_start[0]/headers_end[0] section so that we can use a single memcpy() call, inlined by compiler using long word load/stores. I also tried to make all copies in the natural orders of sk_buff, to help hardware prefetching. I made sure sk_buff size did not change. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU typeWill Deacon
VFIO allows devices to be safely handed off to userspace by putting them behind an IOMMU configured to ensure DMA and interrupt isolation. This enables userspace KVM clients, such as kvmtool and qemu, to further map the device into a virtual machine. With IOMMUs such as the ARM SMMU, it is then possible to provide SMMU translation services to the guest operating system, which are nested with the existing translation installed by VFIO. However, enabling this feature means that the IOMMU driver must be informed that the VFIO domain is being created for the purposes of nested translation. This patch adds a new IOMMU type (VFIO_TYPE1_NESTING_IOMMU) to the VFIO type-1 driver. The new IOMMU type acts identically to the VFIO_TYPE1v2_IOMMU type, but additionally sets the DOMAIN_ATTR_NESTING attribute on its IOMMU domains. Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-09-29iommu: introduce domain attribute for nesting IOMMUsWill Deacon
Some IOMMUs, such as the ARM SMMU, support two stages of translation. The idea behind such a scheme is to allow a guest operating system to use the IOMMU for DMA mappings in the first stage of translation, with the hypervisor then installing mappings in the second stage to provide isolation of the DMA to the physical range assigned to that virtual machine. In order to allow IOMMU domains to be used for second-stage translation, this patch adds a new iommu_attr (IOMMU_ATTR_NESTING) for setting second-stage domains prior to device attach. The attribute can also be queried to see if a domain is actually making use of nesting. Acked-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-09-29mei: remove include to pci header from mei module filesTomas Winkler
Remove inclusion of linux/pci.h in mei layer however we need to include the headers that before got included implicitly from linux/pci.h. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-29usb: hcd: add generic PHY supportSergei Shtylyov
Add the generic PHY support, analogous to the USB PHY support. Intended it to be used with the PCI EHCI/OHCI drivers and the xHCI platform driver. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-29usb: rename phy to usb_phy in HCDAntoine Tenart
The USB PHY member of the HCD structure is renamed to 'usb_phy' and modifications are done in all drivers accessing it. This is in preparation to adding the generic PHY support. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> [Sergei: added missing 'drivers/usb/misc/lvstest.c' file, resolved rejects, updated changelog.] Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-29Merge back earlier 'acpica' material for v3.18.Rafael J. Wysocki
2014-09-29netfilter: nf_tables: store and dump set policyArturo Borrero
We want to know in which cases the user explicitly sets the policy options. In that case, we also want to dump back the info. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-29clk: ti: change clock init to use generic of_clk_initTero Kristo
Previously, the TI clock driver initialized all the clocks hierarchically under each separate clock provider node. Now, each clock that requires IO access will instead check their parent node to find out which IO range to use. This patch allows the TI clock driver to use a few new features provided by the generic of_clk_init, and also allows registration of clock nodes outside the clock hierarchy (for example, any external clocks.) Signed-off-by: Tero Kristo <t-kristo@ti.com> Cc: Mike Turquette <mturquette@linaro.org> Cc: Paul Walmsley <paul@pwsan.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Cc: Jyri Sarha <jsarha@ti.com> Cc: Stefan Assmann <sassmann@kpanic.de> Acked-by: Tony Lindgren <tony@atomide.com>
2014-09-29net: tcp: add DCTCP congestion control algorithmDaniel Borkmann
This work adds the DataCenter TCP (DCTCP) congestion control algorithm [1], which has been first published at SIGCOMM 2010 [2], resp. follow-up analysis at SIGMETRICS 2011 [3] (and also, more recently as an informational IETF draft available at [4]). DCTCP is an enhancement to the TCP congestion control algorithm for data center networks. Typical data center workloads are i.e. i) partition/aggregate (queries; bursty, delay sensitive), ii) short messages e.g. 50KB-1MB (for coordination and control state; delay sensitive), and iii) large flows e.g. 1MB-100MB (data update; throughput sensitive). DCTCP has therefore been designed for such environments to provide/achieve the following three requirements: * High burst tolerance (incast due to partition/aggregate) * Low latency (short flows, queries) * High throughput (continuous data updates, large file transfers) with commodity, shallow buffered switches The basic idea of its design consists of two fundamentals: i) on the switch side, packets are being marked when its internal queue length > threshold K (K is chosen so that a large enough headroom for marked traffic is still available in the switch queue); ii) the sender/host side maintains a moving average of the fraction of marked packets, so each RTT, F is being updated as follows: F := X / Y, where X is # of marked ACKs, Y is total # of ACKs alpha := (1 - g) * alpha + g * F, where g is a smoothing constant The resulting alpha (iow: probability that switch queue is congested) is then being used in order to adaptively decrease the congestion window W: W := (1 - (alpha / 2)) * W The means for receiving marked packets resp. marking them on switch side in DCTCP is the use of ECN. RFC3168 describes a mechanism for using Explicit Congestion Notification from the switch for early detection of congestion, rather than waiting for segment loss to occur. However, this method only detects the presence of congestion, not the *extent*. In the presence of mild congestion, it reduces the TCP congestion window too aggressively and unnecessarily affects the throughput of long flows [4]. DCTCP, as mentioned, enhances Explicit Congestion Notification (ECN) processing to estimate the fraction of bytes that encounter congestion, rather than simply detecting that some congestion has occurred. DCTCP then scales the TCP congestion window based on this estimate [4], thus it can derive multibit feedback from the information present in the single-bit sequence of marks in its control law. And thus act in *proportion* to the extent of congestion, not its *presence*. Switches therefore set the Congestion Experienced (CE) codepoint in packets when internal queue lengths exceed threshold K. Resulting, DCTCP delivers the same or better throughput than normal TCP, while using 90% less buffer space. It was found in [2] that DCTCP enables the applications to handle 10x the current background traffic, without impacting foreground traffic. Moreover, a 10x increase in foreground traffic did not cause any timeouts, and thus largely eliminates TCP incast collapse problems. The algorithm itself has already seen deployments in large production data centers since then. We did a long-term stress-test and analysis in a data center, short summary of our TCP incast tests with iperf compared to cubic: This test measured DCTCP throughput and latency and compared it with CUBIC throughput and latency for an incast scenario. In this test, 19 senders sent at maximum rate to a single receiver. The receiver simply ran iperf -s. The senders ran iperf -c <receiver> -t 30. All senders started simultaneously (using local clocks synchronized by ntp). This test was repeated multiple times. Below shows the results from a single test. Other tests are similar. (DCTCP results were extremely consistent, CUBIC results show some variance induced by the TCP timeouts that CUBIC encountered.) For this test, we report statistics on the number of TCP timeouts, flow throughput, and traffic latency. 1) Timeouts (total over all flows, and per flow summaries): CUBIC DCTCP Total 3227 25 Mean 169.842 1.316 Median 183 1 Max 207 5 Min 123 0 Stddev 28.991 1.600 Timeout data is taken by measuring the net change in netstat -s "other TCP timeouts" reported. As a result, the timeout measurements above are not restricted to the test traffic, and we believe that it is likely that all of the "DCTCP timeouts" are actually timeouts for non-test traffic. We report them nevertheless. CUBIC will also include some non-test timeouts, but they are drawfed by bona fide test traffic timeouts for CUBIC. Clearly DCTCP does an excellent job of preventing TCP timeouts. DCTCP reduces timeouts by at least two orders of magnitude and may well have eliminated them in this scenario. 2) Throughput (per flow in Mbps): CUBIC DCTCP Mean 521.684 521.895 Median 464 523 Max 776 527 Min 403 519 Stddev 105.891 2.601 Fairness 0.962 0.999 Throughput data was simply the average throughput for each flow reported by iperf. By avoiding TCP timeouts, DCTCP is able to achieve much better per-flow results. In CUBIC, many flows experience TCP timeouts which makes flow throughput unpredictable and unfair. DCTCP, on the other hand, provides very clean predictable throughput without incurring TCP timeouts. Thus, the standard deviation of CUBIC throughput is dramatically higher than the standard deviation of DCTCP throughput. Mean throughput is nearly identical because even though cubic flows suffer TCP timeouts, other flows will step in and fill the unused bandwidth. Note that this test is something of a best case scenario for incast under CUBIC: it allows other flows to fill in for flows experiencing a timeout. Under situations where the receiver is issuing requests and then waiting for all flows to complete, flows cannot fill in for timed out flows and throughput will drop dramatically. 3) Latency (in ms): CUBIC DCTCP Mean 4.0088 0.04219 Median 4.055 0.0395 Max 4.2 0.085 Min 3.32 0.028 Stddev 0.1666 0.01064 Latency for each protocol was computed by running "ping -i 0.2 <receiver>" from a single sender to the receiver during the incast test. For DCTCP, "ping -Q 0x6 -i 0.2 <receiver>" was used to ensure that traffic traversed the DCTCP queue and was not dropped when the queue size was greater than the marking threshold. The summary statistics above are over all ping metrics measured between the single sender, receiver pair. The latency results for this test show a dramatic difference between CUBIC and DCTCP. CUBIC intentionally overflows the switch buffer which incurs the maximum queue latency (more buffer memory will lead to high latency.) DCTCP, on the other hand, deliberately attempts to keep queue occupancy low. The result is a two orders of magnitude reduction of latency with DCTCP - even with a switch with relatively little RAM. Switches with larger amounts of RAM will incur increasing amounts of latency for CUBIC, but not for DCTCP. 4) Convergence and stability test: This test measured the time that DCTCP took to fairly redistribute bandwidth when a new flow commences. It also measured DCTCP's ability to remain stable at a fair bandwidth distribution. DCTCP is compared with CUBIC for this test. At the commencement of this test, a single flow is sending at maximum rate (near 10 Gbps) to a single receiver. One second after that first flow commences, a new flow from a distinct server begins sending to the same receiver as the first flow. After the second flow has sent data for 10 seconds, the second flow is terminated. The first flow sends for an additional second. Ideally, the bandwidth would be evenly shared as soon as the second flow starts, and recover as soon as it stops. The results of this test are shown below. Note that the flow bandwidth for the two flows was measured near the same time, but not simultaneously. DCTCP performs nearly perfectly within the measurement limitations of this test: bandwidth is quickly distributed fairly between the two flows, remains stable throughout the duration of the test, and recovers quickly. CUBIC, in contrast, is slow to divide the bandwidth fairly, and has trouble remaining stable. CUBIC DCTCP Seconds Flow 1 Flow 2 Seconds Flow 1 Flow 2 0 9.93 0 0 9.92 0 0.5 9.87 0 0.5 9.86 0 1 8.73 2.25 1 6.46 4.88 1.5 7.29 2.8 1.5 4.9 4.99 2 6.96 3.1 2 4.92 4.94 2.5 6.67 3.34 2.5 4.93 5 3 6.39 3.57 3 4.92 4.99 3.5 6.24 3.75 3.5 4.94 4.74 4 6 3.94 4 5.34 4.71 4.5 5.88 4.09 4.5 4.99 4.97 5 5.27 4.98 5 4.83 5.01 5.5 4.93 5.04 5.5 4.89 4.99 6 4.9 4.99 6 4.92 5.04 6.5 4.93 5.1 6.5 4.91 4.97 7 4.28 5.8 7 4.97 4.97 7.5 4.62 4.91 7.5 4.99 4.82 8 5.05 4.45 8 5.16 4.76 8.5 5.93 4.09 8.5 4.94 4.98 9 5.73 4.2 9 4.92 5.02 9.5 5.62 4.32 9.5 4.87 5.03 10 6.12 3.2 10 4.91 5.01 10.5 6.91 3.11 10.5 4.87 5.04 11 8.48 0 11 8.49 4.94 11.5 9.87 0 11.5 9.9 0 SYN/ACK ECT test: This test demonstrates the importance of ECT on SYN and SYN-ACK packets by measuring the connection probability in the presence of competing flows for a DCTCP connection attempt *without* ECT in the SYN packet. The test was repeated five times for each number of competing flows. Competing Flows 1 | 2 | 4 | 8 | 16 ------------------------------ Mean Connection Probability 1 | 0.67 | 0.45 | 0.28 | 0 Median Connection Probability 1 | 0.65 | 0.45 | 0.25 | 0 As the number of competing flows moves beyond 1, the connection probability drops rapidly. Enabling DCTCP with this patch requires the following steps: DCTCP must be running both on the sender and receiver side in your data center, i.e.: sysctl -w net.ipv4.tcp_congestion_control=dctcp Also, ECN functionality must be enabled on all switches in your data center for DCTCP to work. The default ECN marking threshold (K) heuristic on the switch for DCTCP is e.g., 20 packets (30KB) at 1Gbps, and 65 packets (~100KB) at 10Gbps (K > 1/7 * C * RTT, [4]). In above tests, for each switch port, traffic was segregated into two queues. For any packet with a DSCP of 0x01 - or equivalently a TOS of 0x04 - the packet was placed into the DCTCP queue. All other packets were placed into the default drop-tail queue. For the DCTCP queue, RED/ECN marking was enabled, here, with a marking threshold of 75 KB. More details however, we refer you to the paper [2] under section 3). There are no code changes required to applications running in user space. DCTCP has been implemented in full *isolation* of the rest of the TCP code as its own congestion control module, so that it can run without a need to expose code to the core of the TCP stack, and thus nothing changes for non-DCTCP users. Changes in the CA framework code are minimal, and DCTCP algorithm operates on mechanisms that are already available in most Silicon. The gain (dctcp_shift_g) is currently a fixed constant (1/16) from the paper, but we leave the option that it can be chosen carefully to a different value by the user. In case DCTCP is being used and ECN support on peer site is off, DCTCP falls back after 3WHS to operate in normal TCP Reno mode. ss {-4,-6} -t -i diag interface: ... dctcp wscale:7,7 rto:203 rtt:2.349/0.026 mss:1448 cwnd:2054 ssthresh:1102 ce_state 0 alpha 15 ab_ecn 0 ab_tot 735584 send 10129.2Mbps pacing_rate 20254.1Mbps unacked:1822 retrans:0/15 reordering:101 rcv_space:29200 ... dctcp-reno wscale:7,7 rto:201 rtt:0.711/1.327 ato:40 mss:1448 cwnd:10 ssthresh:1102 fallback_mode send 162.9Mbps pacing_rate 325.5Mbps rcv_rtt:1.5 rcv_space:29200 More information about DCTCP can be found in [1-4]. [1] http://simula.stanford.edu/~alizade/Site/DCTCP.html [2] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf [3] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf [4] http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00 Joint work with Florian Westphal and Glenn Judd. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29net: tcp: more detailed ACK events and events for CE marked packetsFlorian Westphal
DataCenter TCP (DCTCP) determines cwnd growth based on ECN information and ACK properties, e.g. ACK that updates window is treated differently than DUPACK. Also DCTCP needs information whether ACK was delayed ACK. Furthermore, DCTCP also implements a CE state machine that keeps track of CE markings of incoming packets. Therefore, extend the congestion control framework to provide these event types, so that DCTCP can be properly implemented as a normal congestion algorithm module outside of the core stack. Joint work with Daniel Borkmann and Glenn Judd. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29net: tcp: split ack slow/fast events from cwnd_eventFlorian Westphal
The congestion control ops "cwnd_event" currently supports CA_EVENT_FAST_ACK and CA_EVENT_SLOW_ACK events (among others). Both FAST and SLOW_ACK are only used by Westwood congestion control algorithm. This removes both flags from cwnd_event and adds a new in_ack_event callback for this. The goal is to be able to provide more detailed information about ACKs, such as whether ECE flag was set, or whether the ACK resulted in a window update. It is required for DataCenter TCP (DCTCP) congestion control algorithm as it makes a different choice depending on ECE being set or not. Joint work with Daniel Borkmann and Glenn Judd. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29net: tcp: add flag for ca to indicate that ECN is requiredDaniel Borkmann
This patch adds a flag to TCP congestion algorithms that allows for requesting to mark IPv4/IPv6 sockets with transport as ECN capable, that is, ECT(0), when required by a congestion algorithm. It is currently used and needed in DataCenter TCP (DCTCP), as it requires both peers to assert ECT on all IP packets sent - it uses ECN feedback (i.e. CE, Congestion Encountered information) from switches inside the data center to derive feedback to the end hosts. Therefore, simply add a new flag to icsk_ca_ops. Note that DCTCP's algorithm/behaviour slightly diverges from RFC3168, therefore this is only (!) enabled iff the assigned congestion control ops module has requested this. By that, we can tightly couple this logic really only to the provided congestion control ops. Joint work with Florian Westphal and Glenn Judd. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>