summaryrefslogtreecommitdiffstats
path: root/include/linux
AgeCommit message (Collapse)Author
2014-12-15IB/mlx5: Add function to read WQE from user-spaceHaggai Eran
Add a helper function mlx5_ib_read_user_wqe to read information from user-space owned work queues. The function will be used in a later patch by the page-fault handling code in mlx5_ib. Signed-off-by: Haggai Eran <haggaie@mellanox.com> [ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/mlx5: Enhance UMR support to allow partial page table updateHaggai Eran
The current UMR interface doesn't allow partial updates to a memory region's page tables. This patch changes the interface to allow that. It also changes the way the UMR operation validates the memory region's state. When set, IB_SEND_UMR_FAIL_IF_FREE will cause the UMR operation to fail if the MKEY is in the free state. When it is unchecked the operation will check that it isn't in the free state. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15Merge tag 'staging-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver updates from Greg KH: "Here's the big staging tree pull request for 3.19-rc1. We continued to delete more lines than were added, always a good thing, but not at a huge rate this release, only about 70k lines removed overall mostly from removing the horrid bcm driver. Lots of normal staging driver cleanups and fixes all over the place, well over a thousand of them, the shortlog shows all the horrid details. The "contentious" thing here is the movement of the Android binder code out of staging into the "real" part of the kernel. This is code that has been stable for a few years now and is working as-is in the tens of millions of devices with no issues. Yes, the code is horrid, and the userspace api leaves a lot to be desired, but it's not going to change due to legacy issues that we have no control over. Because so many devices and companies rely on this, and the code is stable, might as well promote it out of staging. This was all discussed at the Linux Plumbers conference, and everyone participating agreed that this was the best way forward. There is work happening to replace the binder code with something new that is happening right now, but I don't expect to see the results of that work for another year at the earliest. If that ever happens, and Android switches over to it, I'll gladly remove this version. As for maintainers, I'll be glad to maintain this code, I've been doing it for the past few years with no problems. I'll send a MAINTAINERS entry for it before 3.19-final is out, still need to talk to the Google developers about if they are willing to help with it or not, last I checked they were, which was good. All of these patches have been in linux-next for a while with no reported issues" * tag 'staging-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1382 commits) Staging: slicoss: Fix long line issues in slicoss.c staging: rtl8712: remove unnecessary else after return staging: comedi: change some printk calls to pr_err staging: rtl8723au: hal: Removed the extra semicolon lustre: Deletion of unnecessary checks before three function calls staging: lustre: fix sparse warnings: static function declaration staging: lustre: fixed sparse warnings related to static declarations staging: unisys: remove duplicate header staging: unisys: remove unneeded structure staging: ft1000 : replace __attribute ((__packed__) with __packed drivers: staging: rtl8192e: Include "asm/unaligned.h" instead of "access_ok.h" in "rtl819x_BAProc.c" Drivers:staging:rtl8192e: Fixed checkpatch warning Drivers:staging:clocking-wizard: Added a newline staging: clocking-wizard: check for a valid clk_name pointer staging: rtl8723au: Hal_InitPGData() avoid unnecessary typecasts staging: rtl8723au: _DisableAnalog(): Avoid zero-init variables unnecessarily staging: rtl8723au: Remove unnecessary wrapper _ResetDigitalProcedure1() staging: rtl8723au: _ResetDigitalProcedure1_92C() reduce code obfuscation staging: rtl8723au: Remove unnecessary wrapper _DisableRFAFEAndResetBB() staging: rtl8723au: _DisableRFAFEAndResetBB8192C(): Reduce code obfuscation ...
2014-12-15Merge branch 'irq-irqdomain-arm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq domain ARM updates from Thomas Gleixner: "This set of changes make use of hierarchical irqdomains to provide: - MSI/ITS support for GICv3 - MSI support for GICv2m - Interrupt polarity extender for GICv1 Marc has come more cleanups for the existing extension hooks of GIC in the pipeline, but they are going to be 3.20 material" * 'irq-irqdomain-arm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) irqchip: gicv3-its: Fix ITT allocation irqchip: gicv3-its: Move some alloc/free code to activate/deactivate irqchip: gicv3-its: Fix domain free in multi-MSI case irqchip: gic: Remove warning by including linux/irqdomain.h irqchip: gic-v2m: Add DT bindings for GICv2m irqchip: gic-v2m: Add support for ARM GICv2m MSI(-X) doorbell irqchip: mtk-sysirq: dt-bindings: Add bindings for mediatek sysirq irqchip: mtk-sysirq: Add sysirq interrupt polarity support irqchip: gic: Support hierarchy irq domain. irqchip: GICv3: Binding updates for ITS irqchip: GICv3: ITS: enable compilation of the ITS driver irqchip: GICv3: ITS: plug ITS init into main GICv3 code irqchip: GICv3: ITS: DT probing and initialization irqchip: GICv3: ITS: MSI support irqchip: GICv3: ITS: device allocation and configuration irqchip: GICv3: ITS: tables allocators irqchip: GICv3: ITS: LPI allocator irqchip: GICv3: ITS: irqchip implementation irqchip: GICv3: ITS command queue irqchip: GICv3: rework redistributor structure ...
2014-12-15ARM: OMAP2+: clock: fix DPLL code to use new determine rate APIsTero Kristo
While the change for determine_rate clock operation was merged, the OMAP counterpart using these calls was overlooked for some reason, and caused boot failures on at least OMAP4 platforms. Fixed by updating the DPLL API calls to use the new parameters. Signed-off-by: Tero Kristo <t-kristo@ti.com> Fixes: 646cafc6aa ("clk: Change clk_ops->determine_rate") Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Paul Walmsley <paul@pwsan.com> Tested-by: Kevin Hilman <khilman@linaro.org> Reported-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Michael Turquette <mturquette@linaro.org>
2014-12-15Merge tag 'for-v3.19/omap-a' of ↵Michael Turquette
git://git.kernel.org/pub/scm/linux/kernel/git/pjw/omap-pending into tmp Some OMAP clock/hwmod patches for v3.19. Most of the patches are clock-related. The DPLL implementation is changed to better align to the common clock framework. There is also a patch that removes a few lines from the hwmod code - this patch should have no functional effect. Basic build, boot, and PM test logs for these patches can be found here: http://www.pwsan.com/omap/testlogs/omap-a-for-v3.19/20141113094101/
2014-12-15Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm updates from Dave Airlie: "Highlights: - AMD KFD driver merge This is the AMD HSA interface for exposing a lowlevel interface for GPGPU use. They have an open source userspace built on top of this interface, and the code looks as good as it was going to get out of tree. - Initial atomic modesetting work The need for an atomic modesetting interface to allow userspace to try and send a complete set of modesetting state to the driver has arisen, and been suffering from neglect this past year. No more, the start of the common code and changes for msm driver to use it are in this tree. Ongoing work to get the userspace ioctl finished and the code clean will probably wait until next kernel. - DisplayID 1.3 and tiled monitor exposed to userspace. Tiled monitor property is now exposed for userspace to make use of. - Rockchip drm driver merged. - imx gpu driver moved out of staging Other stuff: - core: panel - MIPI DSI + new panels. expose suggested x/y properties for virtual GPUs - i915: Initial Skylake (SKL) support gen3/4 reset work start of dri1/ums removal infoframe tracking fixes for lots of things. - nouveau: tegra k1 voltage support GM204 modesetting support GT21x memory reclocking work - radeon: CI dpm fixes GPUVM improvements Initial DPM fan control - rcar-du: HDMI support added removed some support for old boards slave encoder driver for Analog Devices adv7511 - exynos: Exynos4415 SoC support - msm: a4xx gpu support atomic helper conversion - tegra: iommu support universal plane support ganged-mode DSI support - sti: HDMI i2c improvements - vmwgfx: some late fixes. - qxl: use suggested x/y properties" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits) drm: sti: fix module compilation issue drm/i915: save/restore GMBUS freq across suspend/resume on gen4 drm: sti: correctly cleanup CRTC and planes drm: sti: add HQVDP plane drm: sti: add cursor plane drm: sti: enable auxiliary CRTC drm: sti: fix delay in VTG programming drm: sti: prepare sti_tvout to support auxiliary crtc drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off} drm: sti: fix hdmi avi infoframe drm: sti: remove event lock while disabling vblank drm: sti: simplify gdp code drm: sti: clear all mixer control drm: sti: remove gpio for HDMI hot plug detection drm: sti: allow to change hdmi ddc i2c adapter drm/doc: Document drm_add_modes_noedid() usage drm/i915: Remove '& 0xffff' from the mask given to WA_REG() drm/i915: Invert the mask and val arguments in wa_add() and WA_REG() drm: Zero out DRM object memory upon cleanup drm/i915/bdw: Fix the write setting up the WIZ hashing mode ...
2014-12-15vringh: update for virtio 1.0 APIsMichael S. Tsirkin
When switching everything over to virtio 1.0 memory access APIs, I missed converting vringh. Fortunately, it's straight-forward. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-15vringh: 64 bit featuresMichael S. Tsirkin
Pass u64 everywhere. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-15tracing: Add tp_printk cmdline to have tracepoints go to printk()Steven Rostedt (Red Hat)
Add the kernel command line tp_printk option that will have tracepoints that are active sent to printk() as well as to the trace buffer. Passing "tp_printk" will activate this. To turn it off, the sysctl /proc/sys/kernel/tracepoint_printk can have '0' echoed into it. Note, this only works if the cmdline option is used. Echoing 1 into the sysctl file without the cmdline option will have no affect. Note, this is a dangerous option. Having high frequency tracepoints send their data to printk() can possibly cause a live lock. This is another reason why this is only active if the command line option is used. Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-15tracing: Move enabling tracepoints to just after rcu_init()Steven Rostedt (Red Hat)
Enabling tracepoints at boot up can be very useful. The tracepoint can be initialized right after RCU has been. There's no need to wait for the early_initcall() to be called. That's too late for some things that can use tracepoints for debugging. Move the logic to enable tracepoints out of the initcalls and into init/main.c to right after rcu_init(). This also allows trace_printk() to be used early too. Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-15Merge tag 'kvm-arm-for-3.19-take2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot problems, clarifies VCPU init, and fixes a regression concerning the VGIC init flow. Conflicts: arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]
2014-12-14Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security layer updates from James Morris: "In terms of changes, there's general maintenance to the Smack, SELinux, and integrity code. The IMA code adds a new kconfig option, IMA_APPRAISE_SIGNED_INIT, which allows IMA appraisal to require signatures. Support for reading keys from rootfs before init is call is also added" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits) selinux: Remove security_ops extern security: smack: fix out-of-bounds access in smk_parse_smack() VFS: refactor vfs_read() ima: require signature based appraisal integrity: provide a hook to load keys when rootfs is ready ima: load x509 certificate from the kernel integrity: provide a function to load x509 certificate from the kernel integrity: define a new function integrity_read_file() Security: smack: replace kzalloc with kmem_cache for inode_smack Smack: Lock mode for the floor and hat labels ima: added support for new kernel cmdline parameter ima_template_fmt ima: allocate field pointers array on demand in template_desc_init_fields() ima: don't allocate a copy of template_fmt in template_desc_init_fields() ima: display template format in meas. list if template name length is zero ima: added error messages to template-related functions ima: use atomic bit operations to protect policy update interface ima: ignore empty and with whitespaces policy lines ima: no need to allocate entry for comment ima: report policy load status ima: use path names cache ...
2014-12-14Merge tag 'char-misc-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here's the big char/misc driver update for 3.19-rc1 Lots of little things all over the place in different drivers, and a new subsystem, "coresight" has been added. Full details are in the shortlog" * tag 'char-misc-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (73 commits) parport: parport_pc, do not remove parent devices early spmi: Remove shutdown/suspend/resume kernel-doc carma-fpga-program: drop videobuf dependency carma-fpga: drop videobuf dependency carma-fpga-program.c: fix compile errors i8k: Fix temperature bug handling in i8k_get_temp() cxl: Name interrupts in /proc/interrupt CXL: Return error to PSL if IRQ demultiplexing fails & print clearer warning coresight-replicator: remove .owner field for driver coresight: fixed comments in coresight.h coresight: fix typo in comment in coresight-priv.h coresight: bindings for coresight drivers coresight: Adding ABI documentation w1: support auto-load of w1_bq27000 module. w1: avoid potential u16 overflow cn: verify msg->len before making callback mei: export fw status registers through sysfs mei: read and print all six FW status registers mei: txe: add cherrytrail device id mei: kill cached host and me csr values ...
2014-12-14Merge tag 'driver-core-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core update from Greg KH: "Here's the set of driver core patches for 3.19-rc1. They are dominated by the removal of the .owner field in platform drivers. They touch a lot of files, but they are "simple" changes, just removing a line in a structure. Other than that, a few minor driver core and debugfs changes. There are some ath9k patches coming in through this tree that have been acked by the wireless maintainers as they relied on the debugfs changes. Everything has been in linux-next for a while" * tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits) Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries" fs: debugfs: add forward declaration for struct device type firmware class: Deletion of an unnecessary check before the function call "vunmap" firmware loader: fix hung task warning dump devcoredump: provide a one-way disable function device: Add dev_<level>_once variants ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries ath: use seq_file api for ath9k debugfs files debugfs: add helper function to create device related seq_file drivers/base: cacheinfo: remove noisy error boot message Revert "core: platform: add warning if driver has no owner" drivers: base: support cpu cache information interface to userspace via sysfs drivers: base: add cpu_device_create to support per-cpu devices topology: replace custom attribute macros with standard DEVICE_ATTR* cpumask: factor out show_cpumap into separate helper function driver core: Fix unbalanced device reference in drivers_probe driver core: fix race with userland in device_add() sysfs/kernfs: make read requests on pre-alloc files use the buffer. sysfs/kernfs: allow attributes to request write buffer be pre-allocated. fs: sysfs: return EGBIG on write if offset is larger than file size ...
2014-12-14Merge tag 'tty-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here's the big tty/serial driver update for 3.19-rc1. There are a number of TTY core changes/fixes in here from Peter Hurley that have all been teted in linux-next for a long time now. There are also the normal serial driver updates as well, full details in the changelog below" * tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (219 commits) serial: pxa: hold port.lock when reporting modem line changes tty-hvsi_lib: Deletion of an unnecessary check before the function call "tty_kref_put" tty: Deletion of unnecessary checks before two function calls n_tty: Fix read_buf race condition, increment read_head after pushing data serial: of-serial: add PM suspend/resume support Revert "serial: of-serial: add PM suspend/resume support" Revert "serial: of-serial: fix up PM ops on no_console_suspend and port type" serial: 8250: don't attempt a trylock if in sysrq serial: core: Add big-endian iotype serial: samsung: use port->fifosize instead of hardcoded values serial: samsung: prefer to use fifosize from driver data serial: samsung: fix style problems serial: samsung: wait for transfer completion before clock disable serial: icom: fix error return code serial: tegra: clean up tty-flag assignments serial: Fix io address assign flow with Fintek PCI-to-UART Product serial: mxs-auart: fix tx_empty against shift register serial: mxs-auart: fix gpio change detection on interrupt serial: mxs-auart: Fix mxs_auart_set_ldisc() serial: 8250_dw: Use 64-bit access for OCTEON. ...
2014-12-14Merge tag 'usb-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB updates from Greg KH: "Here's the big set of USB and PHY patches for 3.19-rc1. The normal churn in the USB gadget area is in here, as well as xhci and other individual USB driver updates. The PHY tree is also in here, as there were dependancies on the USB tree. All of these have been in linux-next" * tag 'usb-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (351 commits) arm: omap3: twl: remove usb phy init data usbip: fix error handling in stub_probe() usb: gadget: udc: missing curly braces USB: mos7720: delete some unneeded code wusb: replace memset by memzero_explicit usbip: remove unneeded structure usb: xhci: fix comment for PORT_DEV_REMOVE xhci: don't use the same variable for stopped and halted rings current TD xhci: clear extra bits from slot context when setting max exit latency xhci: cleanup finish_td function USB: adutux: NULL dereferences on disconnect usb: chipidea: fix platform_no_drv_owner.cocci warnings usb: chipidea: Fixed a few typos in comments Documentation: bindings: add doc for the USB2 ChipIdea USB driver usb: chipidea: add a usb2 driver for ci13xxx usb: chipidea: fix phy handling usb: chipidea: remove duplicate dev_set_drvdata for host_start usb: chipidea: parameter 'mode' isn't needed for hw_device_reset usb: chipidea: add controller reset API usb: chipidea: remove flag CI_HDRC_REQUIRE_TRANSCEIVER ...
2014-12-14Merge tag 'gpio-v3.19-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull take two of the GPIO updates: "Same stuff as last time, now with a fixup patch for the previous compile error plus I ran a few extra rounds of compile-testing. This is the bulk of GPIO changes for the v3.19 series: - A new API that allows setting more than one GPIO at the time. This is implemented for the new descriptor-based API only and makes it possible to e.g. toggle a clock and data line at the same time, if the hardware can do this with a single register write. Both consumers and drivers need new calls, and the core will fall back to driving individual lines where needed. Implemented for the MPC8xxx driver initially - Patched the mdio-mux-gpio and the serial mctrl driver that drives modems to use the new multiple-setting API to set several signals simultaneously - Get rid of the global GPIO descriptor array, and instead allocate descriptors dynamically for each GPIO on a certain GPIO chip. This moves us closer to getting rid of the limitation of using the global, static GPIO numberspace - New driver and device tree bindings for 74xx ICs - New driver and device tree bindings for the VF610 Vybrid - Support the RCAR r8a7793 and r8a7794 - Guidelines for GPIO device tree bindings trying to get things a bit more strict with the advent of combined device properties - Suspend/resume support for the MVEBU driver - A slew of minor fixes and improvements" * tag 'gpio-v3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (33 commits) gpio: mcp23s08: fix up compilation error gpio: pl061: document gpio-ranges property for bindings file gpio: pl061: hook request if gpio-ranges avaiable gpio: mcp23s08: Add option to configure IRQ output polarity as active high gpio: fix deferred probe detection for legacy API serial: mctrl_gpio: use gpiod_set_array function mdio-mux-gpio: Use GPIO descriptor interface and new gpiod_set_array function gpio: remove const modifier from gpiod_get_direction() gpio: remove gpio_descs global array gpio: mxs: implement get_direction callback gpio: em: Use dynamic allocation of GPIOs gpio: Check if base is positive before calling gpio_is_valid() gpio: mcp23s08: Add simple IRQ support for SPI devices gpio: mcp23s08: request a shared interrupt gpio: mcp23s08: Do not free unrequested interrupt gpio: rcar: Add r8a7793 and r8a7794 support gpio-mpc8xxx: add mpc8xxx_gpio_set_multiple function gpiolib: allow simultaneous setting of multiple GPIO outputs gpio: mvebu: add suspend/resume support gpio: gpio-davinci: remove duplicate check on resource ..
2014-12-14Merge git://git.kvack.org/~bcrl/aio-nextLinus Torvalds
Pull aio updates from Benjamin LaHaise. * git://git.kvack.org/~bcrl/aio-next: aio: Skip timer for io_getevents if timeout=0 aio: Make it possible to remap aio ring
2014-12-14Merge branch 'i2c/for-3.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "For 3.19, the I2C subsystem has to offer special candy this time. Right in time for Christmas :) - I2C slave framework: finally, a generic mechanism for Linux being an I2C slave (if the bus driver supports that). Docs are still missing but will come later this cycle, the code is good enough to go. - I2C muxes represent their topology in sysfs much more detailed. This will help users to navigate around much easier. - irq population of i2c clients is now done at probe time, not device creation time, to have better support for deferred probing. - new drivers for Imagination SCB, Amlogic Meson - DMA support added for Freescale IMX, Renesas SHMobile - slightly bigger driver updates to OMAP, i801, AT91, and rk3x (mostly quirk handling, timing updates, and using better kernel interfaces) - eeprom driver can now write with byte-access (very slow, but OK to have) - and the bunch of smaller fixes, cleanups, ID updates..." * 'i2c/for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (56 commits) i2c: sh_mobile: remove unneeded DMA mask i2c: rcar: add slave support i2c: slave-eeprom: add eeprom simulator driver i2c: core changes for slave support MAINTAINERS: add I2C dt bindings also to I2C realm i2c: designware: Fix falling time bindings doc i2c: davinci: switch to use platform_get_irq Documentation: i2c: Use PM ops instead of legacy suspend/resume i2c: sh_mobile: optimize irq entry i2c: pxa: add support for SCCB devices omap: i2c: don't check bus state IP rev3.3 and earlier i2c: s3c2410: Handle i2c sys_cfg register in i2c driver i2c: rk3x: add Kconfig dependency on COMMON_CLK i2c: omap: add notes related to i2c multimaster mode i2c: omap: don't reset controller if Arbitration Lost detected i2c: omap: implement workaround for handling invalid BB-bit values i2c: omap: cleanup register definitions i2c: rk3x: handle dynamic clock rate changes correctly i2c: at91: enable probe deferring on dma channel request i2c: at91: remove legacy DMA support ...
2014-12-14virtio: core support for config generationMichael S. Tsirkin
virtio 1.0 spec says: Drivers MUST NOT assume reads from fields greater than 32 bits wide are atomic, nor are reads from multiple fields: drivers SHOULD read device configuration space fields like so: u32 before, after; do { before = get_config_generation(device); // read config entry/entries. after = get_config_generation(device); } while (after != before); Do exactly this, for transports that support it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-13aio: Make it possible to remap aio ringPavel Emelyanov
There are actually two issues this patch addresses. Let me start with the one I tried to solve in the beginning. So, in the checkpoint-restore project (criu) we try to dump tasks' state and restore one back exactly as it was. One of the tasks' state bits is rings set up with io_setup() call. There's (almost) no problems in dumping them, there's a problem restoring them -- if I dump a task with aio ring originally mapped at address A, I want to restore one back at exactly the same address A. Unfortunately, the io_setup() does not allow for that -- it mmaps the ring at whatever place mm finds appropriate (it calls do_mmap_pgoff() with zero address and without the MAP_FIXED flag). To make restore possible I'm going to mremap() the freshly created ring into the address A (under which it was seen before dump). The problem is that the ring's virtual address is passed back to the user-space as the context ID and this ID is then used as search key by all the other io_foo() calls. Reworking this ID to be just some integer doesn't seem to work, as this value is already used by libaio as a pointer using which this library accesses memory for aio meta-data. So, to make restore work we need to make sure that a) ring is mapped at desired virtual address b) kioctx->user_id matches this value Having said that, the patch makes mremap() on aio region update the kioctx's user_id and mmap_base values. Here appears the 2nd issue I mentioned in the beginning of this mail. If (regardless of the C/R dances I do) someone creates an io context with io_setup(), then mremap()-s the ring and then destroys the context, the kill_ioctx() routine will call munmap() on wrong (old) address. This will result in a) aio ring remaining in memory and b) some other vma get unexpectedly unmapped. What do you think? Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2014-12-13Merge branch 'for-3.19/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer driver updates from Jens Axboe: - NVMe updates: - The blk-mq conversion from Matias (and others) - A stack of NVMe bug fixes from the nvme tree, mostly from Keith. - Various bug fixes from me, fixing issues in both the blk-mq conversion and generic bugs. - Abort and CPU online fix from Sam. - Hot add/remove fix from Indraneel. - A couple of drbd fixes from the drbd team (Andreas, Lars, Philipp) - With the generic IO stat accounting from 3.19/core, converting md, bcache, and rsxx to use those. From Gu Zheng. - Boundary check for queue/irq mode for null_blk from Matias. Fixes cases where invalid values could be given, causing the device to hang. - The xen blkfront pull request, with two bug fixes from Vitaly. * 'for-3.19/drivers' of git://git.kernel.dk/linux-block: (56 commits) NVMe: fix race condition in nvme_submit_sync_cmd() NVMe: fix retry/error logic in nvme_queue_rq() NVMe: Fix FS mount issue (hot-remove followed by hot-add) NVMe: fix error return checking from blk_mq_alloc_request() NVMe: fix freeing of wrong request in abort path xen/blkfront: remove redundant flush_op xen/blkfront: improve protection against issuing unsupported REQ_FUA NVMe: Fix command setup on IO retry null_blk: boundary check queue_mode and irqmode block/rsxx: use generic io stats accounting functions to simplify io stat accounting md: use generic io stats accounting functions to simplify io stat accounting drbd: use generic io stats accounting functions to simplify io stat accounting md/bcache: use generic io stats accounting functions to simplify io stat accounting NVMe: Update module version major number NVMe: fail pci initialization if the device doesn't have any BARs NVMe: add ->exit_hctx() hook NVMe: make setup work for devices that don't do INTx NVMe: enable IO stats by default NVMe: nvme_submit_async_admin_req() must use atomic rq allocation NVMe: replace blk_put_request() with blk_mq_free_request() ...
2014-12-13Merge branch 'for-3.19/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver core update from Jens Axboe: "This is the pull request for the core block IO changes for 3.19. Not a huge round this time, mostly lots of little good fixes: - Fix a bug in sysfs blktrace interface causing a NULL pointer dereference, when enabled/disabled through that API. From Arianna Avanzini. - Various updates/fixes/improvements for blk-mq: - A set of updates from Bart, mostly fixing buts in the tag handling. - Cleanup/code consolidation from Christoph. - Extend queue_rq API to be able to handle batching issues of IO requests. NVMe will utilize this shortly. From me. - A few tag and request handling updates from me. - Cleanup of the preempt handling for running queues from Paolo. - Prevent running of unmapped hardware queues from Ming Lei. - Move the kdump memory limiting check to be in the correct location, from Shaohua. - Initialize all software queues at init time from Takashi. This prevents a kobject warning when CPUs are brought online that weren't online when a queue was registered. - Single writeback fix for I_DIRTY clearing from Tejun. Queued with the core IO changes, since it's just a single fix. - Version X of the __bio_add_page() segment addition retry from Maurizio. Hope the Xth time is the charm. - Documentation fixup for IO scheduler merging from Jan. - Introduce (and use) generic IO stat accounting helpers for non-rq drivers, from Gu Zheng. - Kill off artificial limiting of max sectors in a request from Christoph" * 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits) bio: modify __bio_add_page() to accept pages that don't start a new segment blk-mq: Fix uninitialized kobject at CPU hotplugging blktrace: don't let the sysfs interface remove trace from running list blk-mq: Use all available hardware queues blk-mq: Micro-optimize bt_get() blk-mq: Fix a race between bt_clear_tag() and bt_get() blk-mq: Avoid that __bt_get_word() wraps multiple times blk-mq: Fix a use-after-free blk-mq: prevent unmapped hw queue from being scheduled blk-mq: re-check for available tags after running the hardware queue blk-mq: fix hang in bt_get() blk-mq: move the kdump check to blk_mq_alloc_tag_set blk-mq: cleanup tag free handling blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq <-> cpu map blk: introduce generic io stat accounting help function blk-mq: handle the single queue case in blk_mq_hctx_next_cpu genhd: check for int overflow in disk_expand_part_tbl() blk-mq: add blk_mq_free_hctx_request() blk-mq: export blk_mq_free_request() blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable ...
2014-12-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto update from Herbert Xu: - The crypto API is now documented :) - Disallow arbitrary module loading through crypto API. - Allow get request with empty driver name through crypto_user. - Allow speed testing of arbitrary hash functions. - Add caam support for ctr(aes), gcm(aes) and their derivatives. - nx now supports concurrent hashing properly. - Add sahara support for SHA1/256. - Add ARM64 version of CRC32. - Misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits) crypto: tcrypt - Allow speed testing of arbitrary hash functions crypto: af_alg - add user space interface for AEAD crypto: qat - fix problem with coalescing enable logic crypto: sahara - add support for SHA1/256 crypto: sahara - replace tasklets with kthread crypto: sahara - add support for i.MX53 crypto: sahara - fix spinlock initialization crypto: arm - replace memset by memzero_explicit crypto: powerpc - replace memset by memzero_explicit crypto: sha - replace memset by memzero_explicit crypto: sparc - replace memset by memzero_explicit crypto: algif_skcipher - initialize upon init request crypto: algif_skcipher - removed unneeded code crypto: algif_skcipher - Fixed blocking recvmsg crypto: drbg - use memzero_explicit() for clearing sensitive data crypto: drbg - use MODULE_ALIAS_CRYPTO crypto: include crypto- module prefix in template crypto: user - add MODULE_ALIAS crypto: sha-mb - remove a bogus NULL check crytpo: qat - Fix 64 bytes requests ...
2014-12-13Merge branch 'akpm' (second patch-bomb from Andrew)Linus Torvalds
Merge second patchbomb from Andrew Morton: - the rest of MM - misc fs fixes - add execveat() syscall - new ratelimit feature for fault-injection - decompressor updates - ipc/ updates - fallocate feature creep - fsnotify cleanups - a few other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits) cgroups: Documentation: fix trivial typos and wrong paragraph numberings parisc: percpu: update comments referring to __get_cpu_var percpu: update local_ops.txt to reflect this_cpu operations percpu: remove __get_cpu_var and __raw_get_cpu_var macros fsnotify: remove destroy_list from fsnotify_mark fsnotify: unify inode and mount marks handling fallocate: create FAN_MODIFY and IN_MODIFY events mm/cma: make kmemleak ignore CMA regions slub: fix cpuset check in get_any_partial slab: fix cpuset check in fallback_alloc shmdt: use i_size_read() instead of ->i_size ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments ipc/msg: increase MSGMNI, remove scaling ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM ipc/sem.c: change memory barrier in sem_lock() to smp_rmb() lib/decompress.c: consistency of compress formats for kernel image decompress_bunzip2: off by one in get_next_block() usr/Kconfig: make initrd compression algorithm selection not expert fault-inject: add ratelimit option ratelimit: add initialization macro ...
2014-12-13percpu: remove __get_cpu_var and __raw_get_cpu_var macrosChristoph Lameter
No user is left in the kernel source tree. Therefore we can drop the definitions. This is the final merge of the transition away from __get_cpu_var. After this patch the kernel will not build if anyone uses __get_cpu_var. Signed-off-by: Christoph Lameter <cl@linux.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13fsnotify: remove destroy_list from fsnotify_markJan Kara
destroy_list is used to track marks which still need waiting for srcu period end before they can be freed. However by the time mark is added to destroy_list it isn't in group's list of marks anymore and thus we can reuse fsnotify_mark->g_list for queueing into destroy_list. This saves two pointers for each fsnotify_mark. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13fsnotify: unify inode and mount marks handlingJan Kara
There's a lot of common code in inode and mount marks handling. Factor it out to a common helper function. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13ipc/msg: increase MSGMNI, remove scalingManfred Spraul
SysV can be abused to allocate locked kernel memory. For most systems, a small limit doesn't make sense, see the discussion with regards to SHMMAX. Therefore: increase MSGMNI to the maximum supported. And: If we ignore the risk of locking too much memory, then an automatic scaling of MSGMNI doesn't make sense. Therefore the logic can be removed. The code preserves auto_msgmni to avoid breaking any user space applications that expect that the value exists. Notes: 1) If an administrator must limit the memory allocations, then he can set MSGMNI as necessary. Or he can disable sysv entirely (as e.g. done by Android). 2) MSGMAX and MSGMNB are intentionally not increased, as these values are used to control latency vs. throughput: If MSGMNB is large, then msgsnd() just returns and more messages can be queued before a task switch to a task that calls msgrcv() is forced. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13fault-inject: add ratelimit optionDmitry Monakhov
Current debug levels are not optimal. Especially if one want to provoke big numbers of faults(broken device simulator) then any verbose level will produce giant numbers of identical logging messages. Let's add ratelimit parameter for that purpose. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13ratelimit: add initialization macroDmitry Monakhov
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13syscalls: implement execveat() system callDavid Drysdale
This patchset adds execveat(2) for x86, and is derived from Meredydd Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528). The primary aim of adding an execveat syscall is to allow an implementation of fexecve(3) that does not rely on the /proc filesystem, at least for executables (rather than scripts). The current glibc version of fexecve(3) is implemented via /proc, which causes problems in sandboxed or otherwise restricted environments. Given the desire for a /proc-free fexecve() implementation, HPA suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be an appropriate generalization. Also, having a new syscall means that it can take a flags argument without back-compatibility concerns. The current implementation just defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be added in future -- for example, flags for new namespaces (as suggested at https://lkml.org/lkml/2006/7/11/474). Related history: - https://lkml.org/lkml/2006/12/27/123 is an example of someone realizing that fexecve() is likely to fail in a chroot environment. - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered documenting the /proc requirement of fexecve(3) in its manpage, to "prevent other people from wasting their time". - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a problem where a process that did setuid() could not fexecve() because it no longer had access to /proc/self/fd; this has since been fixed. This patch (of 4): Add a new execveat(2) system call. execveat() is to execve() as openat() is to open(): it takes a file descriptor that refers to a directory, and resolves the filename relative to that. In addition, if the filename is empty and AT_EMPTY_PATH is specified, execveat() executes the file to which the file descriptor refers. This replicates the functionality of fexecve(), which is a system call in other UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and so relies on /proc being mounted). The filename fed to the executed program as argv[0] (or the name of the script fed to a script interpreter) will be of the form "/dev/fd/<fd>" (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively reflecting how the executable was found. This does however mean that execution of a script in a /proc-less environment won't work; also, script execution via an O_CLOEXEC file descriptor fails (as the file will not be accessible after exec). Based on patches by Meredydd Luff. Signed-off-by: David Drysdale <drysdale@google.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: Shuah Khan <shuah.kh@samsung.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rich Felker <dalias@aerifal.cx> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13memcg: fix possible use-after-free in memcg_kmem_get_cache()Vladimir Davydov
Suppose task @t that belongs to a memory cgroup @memcg is going to allocate an object from a kmem cache @c. The copy of @c corresponding to @memcg, @mc, is empty. Then if kmem_cache_alloc races with the memory cgroup destruction we can access the memory cgroup's copy of the cache after it was destroyed: CPU0 CPU1 ---- ---- [ current=@t @mc->memcg_params->nr_pages=0 ] kmem_cache_alloc(@c): call memcg_kmem_get_cache(@c); proceed to allocation from @mc: alloc a page for @mc: ... move @t from @memcg destroy @memcg: mem_cgroup_css_offline(@memcg): memcg_unregister_all_caches(@memcg): kmem_cache_destroy(@mc) add page to @mc We could fix this issue by taking a reference to a per-memcg cache, but that would require adding a per-cpu reference counter to per-memcg caches, which would look cumbersome. Instead, let's take a reference to a memory cgroup, which already has a per-cpu reference counter, in the beginning of kmem_cache_alloc to be dropped in the end, and move per memcg caches destruction from css offline to css free. As a side effect, per-memcg caches will be destroyed not one by one, but all at once when the last page accounted to the memory cgroup is freed. This doesn't sound as a high price for code readability though. Note, this patch does add some overhead to the kmem_cache_alloc hot path, but it is pretty negligible - it's just a function call plus a per cpu counter decrement, which is comparable to what we already have in memcg_kmem_get_cache. Besides, it's only relevant if there are memory cgroups with kmem accounting enabled. I don't think we can find a way to handle this race w/o it, because alloc_page called from kmem_cache_alloc may sleep so we can't flush all pending kmallocs w/o reference counting. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13oom: don't assume that a coredumping thread will exit soonOleg Nesterov
oom_kill.c assumes that PF_EXITING task should exit and free the memory soon. This is wrong in many ways and one important case is the coredump. A task can sleep in exit_mm() "forever" while the coredumping sub-thread can need more memory. Change the PF_EXITING checks to take SIGNAL_GROUP_COREDUMP into account, we add the new trivial helper for that. Note: this is only the first step, this patch doesn't try to solve other problems. The SIGNAL_GROUP_COREDUMP check is obviously racy, a task can participate in coredump after it was already observed in PF_EXITING state, so TIF_MEMDIE (which also blocks oom-killer) still can be wrongly set. fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it. And even the name/usage of the new helper is confusing, an exiting thread can only free its ->mm if it is the only/last task in thread group. [akpm@linux-foundation.org: add comment] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm: vmscan: invoke slab shrinkers from shrink_zone()Johannes Weiner
The slab shrinkers are currently invoked from the zonelist walkers in kswapd, direct reclaim, and zone reclaim, all of which roughly gauge the eligible LRU pages and assemble a nodemask to pass to NUMA-aware shrinkers, which then again have to walk over the nodemask. This is redundant code, extra runtime work, and fairly inaccurate when it comes to the estimation of actually scannable LRU pages. The code duplication will only get worse when making the shrinkers cgroup-aware and requiring them to have out-of-band cgroup hierarchy walks as well. Instead, invoke the shrinkers from shrink_zone(), which is where all reclaimers end up, to avoid this duplication. Take the count for eligible LRU pages out of get_scan_count(), which considers many more factors than just the availability of swap space, like zone_reclaimable_pages() currently does. Accumulate the number over all visited lruvecs to get the per-zone value. Some nodes have multiple zones due to memory addressing restrictions. To avoid putting too much pressure on the shrinkers, only invoke them once for each such node, using the class zone of the allocation as the pivot zone. For now, this integrates the slab shrinking better into the reclaim logic and gets rid of duplicative invocations from kswapd, direct reclaim, and zone reclaim. It also prepares for cgroup-awareness, allowing memcg-capable shrinkers to be added at the lruvec level without much duplication of both code and runtime work. This changes kswapd behavior, which used to invoke the shrinkers for each zone, but with scan ratios gathered from the entire node, resulting in meaningless pressure quantities on multi-zone nodes. Zone reclaim behavior also changes. It used to shrink slabs until the same amount of pages were shrunk as were reclaimed from the LRUs. Now it merely invokes the shrinkers once with the zone's scan ratio, which makes the shrinkers go easier on caches that implement aging and would prefer feeding back pressure from recently used slab objects to unused LRU pages. [vdavydov@parallels.com: assure class zone is populated] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm,vmacache: count number of system-wide flushesDavidlohr Bueso
These flushes deal with sequence number overflows, such as for long lived threads. These are rare, but interesting from a debugging PoV. As such, display the number of flushes when vmacache debugging is enabled. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm/page_owner: keep track of page ownersJoonsoo Kim
This is the page owner tracking code which is introduced so far ago. It is resident on Andrew's tree, though, nobody tried to upstream so it remain as is. Our company uses this feature actively to debug memory leak or to find a memory hogger so I decide to upstream this feature. This functionality help us to know who allocates the page. When allocating a page, we store some information about allocation in extra memory. Later, if we need to know status of all pages, we can get and analyze it from this stored information. In previous version of this feature, extra memory is statically defined in struct page, but, in this version, extra memory is allocated outside of struct page. It enables us to turn on/off this feature at boottime without considerable memory waste. Although we already have tracepoint for tracing page allocation/free, using it to analyze page owner is rather complex. We need to enlarge the trace buffer for preventing overlapping until userspace program launched. And, launched program continually dump out the trace buffer for later analysis and it would change system behaviour with more possibility rather than just keeping it in memory, so bad for debug. Moreover, we can use page_owner feature further for various purposes. For example, we can use it for fragmentation statistics implemented in this patch. And, I also plan to implement some CMA failure debugging feature using this interface. I'd like to give the credit for all developers contributed this feature, but, it's not easy because I don't know exact history. Sorry about that. Below is people who has "Signed-off-by" in the patches in Andrew's tree. Contributor: Alexander Nyberg <alexn@dsv.su.se> Mel Gorman <mgorman@suse.de> Dave Hansen <dave@linux.vnet.ibm.com> Minchan Kim <minchan@kernel.org> Michal Nazarewicz <mina86@mina86.com> Andrew Morton <akpm@linux-foundation.org> Jungsoo Son <jungsoo.son@lge.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13stacktrace: introduce snprint_stack_trace for buffer outputJoonsoo Kim
Current stacktrace only have the function for console output. page_owner that will be introduced in following patch needs to print the output of stacktrace into the buffer for our own output format so so new function, snprint_stack_trace(), is needed. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm/debug-pagealloc: make debug-pagealloc boottime configurableJoonsoo Kim
Now, we have prepared to avoid using debug-pagealloc in boottime. So introduce new kernel-parameter to disable debug-pagealloc in boottime, and makes related functions to be disabled in this case. Only non-intuitive part is change of guard page functions. Because guard page is effective only if debug-pagealloc is enabled, turning off according to debug-pagealloc is reasonable thing to do. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm/debug-pagealloc: prepare boottime configurable on/offJoonsoo Kim
Until now, debug-pagealloc needs extra flags in struct page, so we need to recompile whole source code when we decide to use it. This is really painful, because it takes some time to recompile and sometimes rebuild is not possible due to third party module depending on struct page. So, we can't use this good feature in many cases. Now, we have the page extension feature that allows us to insert extra flags to outside of struct page. This gets rid of third party module issue mentioned above. And, this allows us to determine if we need extra memory for this page extension in boottime. With these property, we can avoid using debug-pagealloc in boottime with low computational overhead in the kernel built with CONFIG_DEBUG_PAGEALLOC. This will help our development process greatly. This patch is the preparation step to achive above goal. debug-pagealloc originally uses extra field of struct page, but, after this patch, it will use field of struct page_ext. Because memory for page_ext is allocated later than initialization of page allocator in CONFIG_SPARSEMEM, we should disable debug-pagealloc feature temporarily until initialization of page_ext. This patch implements this. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm/page_ext: resurrect struct page extending code for debuggingJoonsoo Kim
When we debug something, we'd like to insert some information to every page. For this purpose, we sometimes modify struct page itself. But, this has drawbacks. First, it requires re-compile. This makes us hesitate to use the powerful debug feature so development process is slowed down. And, second, sometimes it is impossible to rebuild the kernel due to third party module dependency. At third, system behaviour would be largely different after re-compile, because it changes size of struct page greatly and this structure is accessed by every part of kernel. Keeping this as it is would be better to reproduce errornous situation. This feature is intended to overcome above mentioned problems. This feature allocates memory for extended data per page in certain place rather than the struct page itself. This memory can be accessed by the accessor functions provided by this code. During the boot process, it checks whether allocation of huge chunk of memory is needed or not. If not, it avoids allocating memory at all. With this advantage, we can include this feature into the kernel in default and can avoid rebuild and solve related problems. Until now, memcg uses this technique. But, now, memcg decides to embed their variable to struct page itself and it's code to extend struct page has been removed. I'd like to use this code to develop debug feature, so this patch resurrect it. To help these things to work well, this patch introduces two callbacks for clients. One is the need callback which is mandatory if user wants to avoid useless memory allocation at boot-time. The other is optional, init callback, which is used to do proper initialization after memory is allocated. Detailed explanation about purpose of these functions is in code comment. Please refer it. Others are completely same with previous extension code in memcg. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm, gfp: escalatedly define GFP_HIGHUSER and GFP_HIGHUSER_MOVABLEJianyu Zhan
GFP_USER, GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE are escalatedly confined defined, also implied by their names: GFP_USER = GFP_USER GFP_USER + __GFP_HIGHMEM = GFP_HIGHUSER GFP_USER + __GFP_HIGHMEM + __GFP_MOVABLE = GFP_HIGHUSER_MOVABLE So just make GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE escalatedly defined to reflect this fact. It also makes the definition clear and texturally warn on any furture break-up of this escalated relastionship. Signed-off-by: Jianyu Zhan <jianyu.zhan@emc.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13include/linux/kmemleak.h: needs slab.hAndrew Morton
include/linux/kmemleak.h: In function 'kmemleak_alloc_recursive': include/linux/kmemleak.h:43: error: 'SLAB_NOLEAKTRACE' undeclared (first use in this function) Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm/memcontrol.c: remove the unused arg in __memcg_kmem_get_cache()Zhang Zhen
The gfp was passed in but never used in this function. Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm: move swp_entry_t definition to include/linux/mm_types.hTejun Heo
swp_entry_t being defined in include/linux/swap.h instead of include/linux/mm_types.h causes cyclic include dependency later when include/linux/page_cgroup.h is included from writeback path. Move the definition to include/linux/mm_types.h. While at it, reformat the comment above it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13memcg: turn memcg_kmem_skip_account into a bit fieldVladimir Davydov
It isn't supposed to stack, so turn it into a bit-field to save 4 bytes on the task_struct. Also, remove the memcg_stop/resume_kmem_account helpers - it is clearer to set/clear the flag inline. Regarding the overwhelming comment to the helpers, which is removed by this patch too, we already have a compact yet accurate explanation in memcg_schedule_cache_create, no need in yet another one. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13lib: bitmap: add alignment offset for bitmap_find_next_zero_area()Michal Nazarewicz
Add a bitmap_find_next_zero_area_off() function which works like bitmap_find_next_zero_area() function except it allows an offset to be specified when alignment is checked. This lets caller request a bit such that its number plus the offset is aligned according to the mask. [gregory.0xf0@gmail.com: Retrieved from https://patchwork.linuxtv.org/patch/6254/ and updated documentation] Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kukjin Kim <kgene.kim@samsung.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm/rmap: share the i_mmap_rwsemDavidlohr Bueso
Similarly to the anon memory counterpart, we can share the mapping's lock ownership as the interval tree is not modified when doing doing the walk, only the file page. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm: convert i_mmap_mutex to rwsemDavidlohr Bueso
The i_mmap_mutex is a close cousin of the anon vma lock, both protecting similar data, one for file backed pages and the other for anon memory. To this end, this lock can also be a rwsem. In addition, there are some important opportunities to share the lock when there are no tree modifications. This conversion is straightforward. For now, all users take the write lock. [sfr@canb.auug.org.au: update fremap.c] Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>