summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2013-11-01sysfs: use generic_file_llseek() for sysfs_file_operationsTejun Heo
13c589d5b0ac6 ("sysfs: use seq_file when reading regular files") converted regular sysfs files to use seq_file. The commit substituted generic_file_llseek() with seq_lseek() for llseek implementation. Before the change, all regular sysfs files were allowed to seek to any position in [0, PAGE_SIZE] as the file size is always PAGE_SIZE and generic_file_llseek() allows any seeking inside the range under file size; however, seq_lseek()'s behavior is different. It traverses the output by repeatedly invoking ->show() until it reaches the target offset or traversal indicates EOF. As seq_files are fully dynamic and may not end at all, it doesn't support seeking from the end (SEEK_END). Apparently, there are userland tools which uses SEEK_END to discover the buffer size to use and the switch to seq_lseek() disturbs them as SEEK_END fails with -EINVAL. The only benefits of using seq_lseek() instead of generic_file_llseek() are * Early failure. If traversing to certain file position should fail, seq_lseek() will report such failures on lseek(2) instead of the following read/write operations. * EOF detection. While SEEK_END is not supported, SEEK_SET/CUR + large offset can be used to detect eof - eof at the time of the seek anyway as the file size may change dynamically. Both aren't necessary for sysfs or prospect kernfs users. Revert to genefic_file_llseek() and preserve the original behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Link: https://lkml.kernel.org/r/20131031114358.GA5551@osiris Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-30sysfs: return correct error code on unimplemented mmap()Vladimir Zapolskiy
Both POSIX.1-2008 and Linux Programmer's Manual have a dedicated return error code for a case, when a file doesn't support mmap(), it's ENODEV. This change replaces overloaded EINVAL with ENODEV in a situation described above for sysfs binary files. Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29mdio_bus: convert bus code to use dev_groupsGreg Kroah-Hartman
The dev_attrs field of struct bus_type is going away soon, dev_groups should be used instead. This converts the MDIO bus code to use the correct field. Cc: David S. Miller <davem@davemloft.net> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Nick Bowler <nbowler@elliptictech.com> Cc: <netdev@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29device: Make dev_WARN/dev_WARN_ONCE print device as well as driver nameBjorn Helgaas
dev_WARN() and dev_WARN_ONCE() are annoying because (1) they include only the driver name, not the device name, and (2) they print a spurious newline in the middle. This results in messages like this that are less useful than they should be: [ 40.094995] Device pcieport disabling already-disabled device This patch makes them work more like dev_printk(). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29sysfs: separate out dup filename warning into a separate functionTejun Heo
Separate out sysfs_warn_dup() out of sysfs_add_one(). This will help separating out the core sysfs functionalities into kernfs so that it can be used by non-sysfs users too. This doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29sysfs: move sysfs_hash_and_remove() to fs/sysfs/dir.cTejun Heo
Most removal related logic is implemented in fs/sysfs/dir.c. Move sysfs_hash_and_remove() to fs/sysfs/dir.c so that __sysfs_remove() doesn't have to be public. This is pure relocation. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29sysfs: remove unused sysfs_get_dentry() prototypeTejun Heo
sysfs_get_dentry() has been gone for years now. Remove the left-over prototype. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29sysfs: honor bin_attr.attr.ignore_lockdepTejun Heo
ignore_lockdep is currently honored only for regular files. There's no reason to ignore it for bin files. Update sysfs_ignore_lockdep() so that bin_attr.attr.ignore_lockdep works too. While this doesn't have any in-kernel user, this unifies the behaviors between regular and bin files and will help later changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attrTejun Heo
3124eb1679 ("sysfs: merge regular and bin file handling") folded bin file handling into regular file handling. Among other things, bin file now shares the same open path including sysfs_open_dirent association using sysfs_dirent->s_attr.open. This is buggy because ->s_bin_attr lives in the same union and doesn't have the field. This bug doesn't trigger because sysfs_elem_bin_attr doesn't have an active field at the conflicting position. It does have a field "buffers" but it isn't used anymore. This patch collapses sysfs_elem_bin_attr into sysfs_elem_attr so that the bin_attr is accessed through ->s_attr.bin_attr which lives with ->s_attr.attr in an anonymous union. The code paths already assume bin_attr contains attr as the first element, so this doesn't add any more assumptions while making it explicit that the two types are handled together. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-25devres: restore zeroing behavior of devres_alloc()Kevin Hilman
commit 64c862a8 (devres: add kernel standard devm_k.alloc functions) changed the default behavior of alloc_dr() to no longer zero the allocated memory. However, only the devm.k.alloc() function were modified to pass in __GFP_ZERO which leaves any users of devres_alloc() or __devres_alloc() with potentially wrong assumptions about memory being zero'd upon allocation. To fix, add __GFP_ZERO to devres_alloc() calls to preserve previous behavior of zero'ing memory upon allocation. Signed-off-by: Kevin Hilman <khilman@linaro.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-25sysfs: fix sysfs_write_file for bin fileMing Lei
Before patch(sysfs: prepare path write for unified regular / bin file handling), when size of bin file is zero, writting still can continue, but this patch changes the behaviour. The worse thing is that firmware loader is broken by this patch, and user space application can't write to firmware bin file any more because both firmware loader and drivers can't know at advance how large the firmware file is and have to set its initialized size as zero. This patch fixes the problem and keeps behaviour of writting to bin as before. Reported-by: Lothar Waßmann <LW@karo-electronics.de> Tested-by: Lothar Waßmann <LW@karo-electronics.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-19input: gameport: convert bus code to use dev_groupsGreg Kroah-Hartman
The dev_attrs field of struct bus_type is going away soon, dev_groups should be used instead. This converts the gameport bus code to use the correct field. Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: <linux-input@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-19input: serio: remove bus usage of dev_attrsGreg Kroah-Hartman
The dev_attrs field of struct bus_type is going away soon, so move the remaining sysfs files that are being described with this field to use dev_groups instead. Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: <linux-input@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-19input: serio: use DEVICE_ATTR_RO()Greg Kroah-Hartman
Convert the serio sysfs fiels to use the DEVICE_ATTR_RO() macros to make it easier to audit the correct sysfs file permission usage. Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: <linux-input@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-19Merge 3.12-rc6 into driver-core-nextGreg Kroah-Hartman
We want these fixes here too. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-19Linux 3.12-rc6v3.12-rc6Linus Torvalds
2013-10-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fix from Chris Mason: "Sage hit a deadlock with ceph on btrfs, and Josef tracked it down to a regression in our initial rc1 pull. When doing nocow writes we were sometimes starting a transaction with locks held" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: release path before starting transaction in can_nocow_extent
2013-10-18Merge tag 'pm+acpi-3.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: - intel_pstate fix for misbehavior after system resume if sysfs attributes are set in a specific way before the corresponding suspend from Dirk Brandewie. - A recent intel_pstate fix has no effect if unsigned long is 32-bit, so fix it up to cover that case as well. - The s3c64xx cpufreq driver was not updated when the index field of struct cpufreq_frequency_table was replaced with driver_data, so update it now. From Charles Keepax. - The Kconfig help text for ACPI_BUTTON still refers to /proc/acpi/event that has been dropped recently, so modify it to remove that reference. From Krzysztof Mazur. - A Lan Tianyu's change adds a missing mutex unlock to an error code path in acpi_resume_power_resources(). - Some code related to ACPI power resources, whose very purpose is questionable to put it lightly, turns out to cause problems to happen during testing on real systems, so remove it completely (we may revisit that in the future if there's a compelling enough reason). From Rafael J Wysocki and Aaron Lu. * tag 'pm+acpi-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / PM: Drop two functions that are not used any more ATA / ACPI: remove power dependent device handling cpufreq: s3c64xx: Rename index to driver_data ACPI / power: Drop automaitc resume of power resource dependent devices intel_pstate: Fix type mismatch warning cpufreq / intel_pstate: Fix max_perf_pct on resume ACPI: remove /proc/acpi/event from ACPI_BUTTON help ACPI / power: Release resource_lock after acpi_power_get_state() return error
2013-10-18Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two fixlets: - fix a (rare-config) build bug - fix a next-gen SGI/UV hw/firmware enumeration bug" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Update UV3 hub revision ID x86/microcode: Correct Kconfig dependencies
2013-10-18Btrfs: release path before starting transaction in can_nocow_extentJosef Bacik
We can't be holding tree locks while we try to start a transaction, we will deadlock. Thanks, Reported-by: Sage Weil <sage@inktank.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-18Merge branch 'acpi-fixes'Rafael J. Wysocki
* acpi-fixes: ACPI / PM: Drop two functions that are not used any more ATA / ACPI: remove power dependent device handling ACPI / power: Drop automaitc resume of power resource dependent devices ACPI: remove /proc/acpi/event from ACPI_BUTTON help ACPI / power: Release resource_lock after acpi_power_get_state() return error
2013-10-18Merge branch 'pm-fixes'Rafael J. Wysocki
* pm-fixes: cpufreq: s3c64xx: Rename index to driver_data intel_pstate: Fix type mismatch warning cpufreq / intel_pstate: Fix max_perf_pct on resume
2013-10-17Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "Five small cifs fixes (includes fixes for: unmount hang, 2 security related, symlink, large file writes)" * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: cifs: ntstatus_to_dos_map[] is not terminated cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods cifs: Fix inability to write files >2GB to SMB2/3 shares cifs: Avoid umount hangs with smb2 when server is unresponsive do not treat non-symlink reparse points as valid symlinks
2013-10-17Merge tag 'driver-core-3.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fix from Greg KH: "Here is one fix for the hotplug memory path that resolves a regression when removing memory that showed up in 3.12-rc1" * tag 'driver-core-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: Release device_hotplug_lock when store_mem_state returns EINVAL
2013-10-17Merge tag 'usb-3.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some USB fixes and new device ids for 3.12-rc6 The largest change here is a bunch of new device ids for the option USB serial driver for new Huawei devices. Other than that, just some small bug fixes for issues that people have reported (run-time and build-time), nothing major" * tag 'usb-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: usb_phy_gen: refine conditional declaration of usb_nop_xceiv_register usb: misc: usb3503: Fix compile error due to incorrect regmap depedency usb/chipidea: fix oops on memory allocation failure usb-storage: add quirk for mandatory READ_CAPACITY_16 usb: serial: option: blacklist Olivetti Olicard200 USB: quirks: add touchscreen that is dazzeled by remote wakeup Revert "usb: musb: gadget: fix otg active status flag" USB: quirks.c: add one device that cannot deal with suspension USB: serial: option: add support for Inovia SEW858 device USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well. USB: support new huawei devices in option.c usb: musb: start musb on the udc side, too xhci: Fix spurious wakeups after S5 on Haswell xhci: fix write to USB3_PSSEN and XUSB2PRM pci config registers xhci: quirk for extra long delay for S4 xhci: Don't enable/disable RWE on bus suspend/resume.
2013-10-17Merge tag 'tty-3.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull serial driver fixes from Greg KH: "Here are two serial driver fixes for your tree. One is a revert of a patch that causes a build error, the other is a fix to provide the correct brace placement which resolves a bug where the driver was not working properly" * tag 'tty-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: vt8500: add missing braces Revert "serial: i.MX: evaluate linux,stdout-path property"
2013-10-17Merge tag 'char-misc-3.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small iio and w1 driver fixes for 3.12-rc6. There is also a hyper-v fix in here, which turned out to be incorrect, so it was reverted. That will probably have to wait unto 3.13-rc1 to get accepted as it's still being discussed" * tag 'char-misc-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: Revert "Drivers: hv: vmbus: Fix a bug in channel rescind code" Drivers: hv: vmbus: Fix a bug in channel rescind code iio:buffer: Free active scan mask in iio_disable_all_buffers() iio: frequency: adf4350: add missing clk_disable_unprepare() on error in adf4350_probe() w1 - call request_module with w1 master mutex unlocked w1 - fix fops in w1_bus_notify
2013-10-17Merge tag 'sound-3.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "All reasonably small fixes as rc6: a HD-audio mic fix, a us122l mmap regression fix, and kernel memory leak fix in hdsp driver. Hopefully this will be the last pull request for 3.12..." * tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hdsp - info leak in snd_hdsp_hwdep_ioctl() ALSA: us122l: Fix pcm_usb_stream mmapping regression ALSA: hda - Fix inverted internal mic not indicated on some machines
2013-10-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull apparmor fixes from James Morris: "A couple more regressions fixed" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: apparmor: fix bad lock balance when introspecting policy apparmor: fix memleak of the profile hash
2013-10-17Merge tag 'iio-fixes-for-3.12c' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus Jonathan writes: Third set of IIO fixes for the 3.12 cycle. Two little ones this time: 1) A missing clk_unprepare in adf4350. 2) A missing free of the active_scan_mask when iio_disable_all_buffers is called during an unexpected device removal. This leak was introduced by the fix a87c82e454f184a9473f8cdfd4d304205f585f65 iio: Stop sampling when the device is removed and hence is a regression fix.
2013-10-17usb: usb_phy_gen: refine conditional declaration of usb_nop_xceiv_registerGuenter Roeck
Commit 3fa4d734 (usb: phy: rename nop_usb_xceiv => usb_phy_gen_xceiv) changed the conditional around the declaration of usb_nop_xceiv_register from #if defined(CONFIG_NOP_USB_XCEIV) || (defined(CONFIG_NOP_USB_XCEIV_MODULE) && defined(MODULE)) to #if IS_ENABLED(CONFIG_NOP_USB_XCEIV) While that looks the same, it is semantically different. The first expression is true if CONFIG_NOP_USB_XCEIV is built as module and if the including code is built as module. The second expression is true if code depending on CONFIG_NOP_USB_XCEIV if built as module or into the kernel. As a result, the arm:allmodconfig build fails with arch/arm/mach-omap2/built-in.o: In function `omap3_evm_init': arch/arm/mach-omap2/board-omap3evm.c:703: undefined reference to `usb_nop_xceiv_register' Fix the problem by reverting to the old conditional. Cc: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-17Revert "Drivers: hv: vmbus: Fix a bug in channel rescind code"Greg Kroah-Hartman
This reverts commit 90d33f3ec519db19d785216299a4ee85ef58ec97 as it's not the correct fix for this issue, and it causes a build warning to be added to the kernel tree. Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-17ACPI / PM: Drop two functions that are not used any moreRafael J. Wysocki
Two functions defined in device_pm.c, acpi_dev_pm_add_dependent() and acpi_dev_pm_remove_dependent(), have no callers and may be dropped, so drop them. Moreover, they are the only functions adding entries to and removing entries from the power_dependent list in struct acpi_device, so drop that list too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-17ATA / ACPI: remove power dependent device handlingAaron Lu
Previously, we wanted SCSI devices corrsponding to ATA devices to be runtime resumed when the power resource for those ATA device was turned on by some other device, so we added the SCSI device to the dependent device list of the ATA device's ACPI node. However, this code has no effect after commit 41863fc (ACPI / power: Drop automaitc resume of power resource dependent devices) and the mechanism it was supposed to implement is regarded as a bad idea now, so drop it. [rjw: Changelog] Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16Merge branch 'akpm' (fixes from Andrew Morton)Linus Torvalds
Merge misc fixes from Andrew Morton. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits) mm: revert mremap pud_free anti-fix mm: fix BUG in __split_huge_page_pmd swap: fix set_blocksize race during swapon/swapoff procfs: call default get_unmapped_area on MMU-present architectures procfs: fix unintended truncation of returned mapped address writeback: fix negative bdi max pause percpu_refcount: export symbols fs: buffer: move allocation failure loop into the allocator mm: memcg: handle non-error OOM situations more gracefully tools/testing/selftests: fix uninitialized variable block/partitions/efi.c: treat size mismatch as a warning, not an error mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages mm/zswap: bugfix: memory leak when re-swapon mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages mm: migration: do not lose soft dirty bit if page is in migration state gcov: MAINTAINERS: Add an entry for gcov mm/hugetlb.c: correct missing private flag clearing mm/vmscan.c: don't forget to free shrinker->nr_deferred ipc/sem.c: synchronize semop and semctl with IPC_RMID ipc: update locking scheme comments ...
2013-10-16mm: revert mremap pud_free anti-fixHugh Dickins
Revert commit 1ecfd533f4c5 ("mm/mremap.c: call pud_free() after fail calling pmd_alloc()"). The original code was correct: pud_alloc(), pmd_alloc(), pte_alloc_map() ensure that the pud, pmd, pt is already allocated, and seldom do they need to allocate; on failure, upper levels are freed if appropriate by the subsequent do_munmap(). Whereas commit 1ecfd533f4c5 did an unconditional pud_free() of a most-likely still-in-use pud: saved only by the near-impossiblity of pmd_alloc() failing. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16mm: fix BUG in __split_huge_page_pmdHugh Dickins
Occasionally we hit the BUG_ON(pmd_trans_huge(*pmd)) at the end of __split_huge_page_pmd(): seen when doing madvise(,,MADV_DONTNEED). It's invalid: we don't always have down_write of mmap_sem there: a racing do_huge_pmd_wp_page() might have copied-on-write to another huge page before our split_huge_page() got the anon_vma lock. Forget the BUG_ON, just go back and try again if this happens. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16swap: fix set_blocksize race during swapon/swapoffKrzysztof Kozlowski
Fix race between swapoff and swapon. Swapoff used old_block_size from swap_info outside of swapon_mutex so it could be overwritten by concurrent swapon. The race has visible effect only if more than one swap block device exists with different block sizes (e.g. /dev/sda1 with block size 4096 and /dev/sdb1 with 512). In such case it leads to setting the blocksize of swapped off device with wrong blocksize. The bug can be triggered with multiple concurrent swapoff and swapon: 0. Swap for some device is on. 1. swapoff: First the swapoff is called on this device and "struct swap_info_struct *p" is assigned. This is done under swap_lock however this lock is released for the call try_to_unuse(). 2. swapon: After the assignment above (and before acquiring swapon_mutex & swap_lock by swapoff) the swapon is called on the same device. The p->old_block_size is assigned to the value of block_size the device. This block size should be the same as previous but sometimes it is not. The swapon ends successfully. 3. swapoff: Swapoff resumes, grabs the locks and mutex and continues to disable this swap device. Now it sets the block size to value taken from swap_info which was overwritten by swapon in 2. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reported-by: Weijie Yang <weijie.yang.kh@gmail.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Shaohua Li <shli@fusionio.com> Cc: Minchan Kim <minchan@kernel.org> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16procfs: call default get_unmapped_area on MMU-present architecturesHATAYAMA Daisuke
Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added proc_reg_get_unmapped_area in proc_reg_file_ops and proc_reg_file_ops_no_compat, by which now mmap always returns EIO if get_unmapped_area method is not defined for the target procfs file, which causes regression of mmap on /proc/vmcore. To address this issue, like get_unmapped_area(), call default current->mm->get_unmapped_area on MMU-present architectures if pde->proc_fops->get_unmapped_area, i.e. the one in actual file operation in the procfs file, is not defined. Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16procfs: fix unintended truncation of returned mapped addressHATAYAMA Daisuke
Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the mapped virtual address returned from get_unmapped_area method in pde->proc_fops due to the variable rv of signed integer on x86_64. This is too small to have vitual address of unsigned long on x86_64 since on x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes. To fix this issue, use unsigned long instead. Fixes a regression added in commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)"). Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16writeback: fix negative bdi max pauseFengguang Wu
Toralf runs trinity on UML/i386. After some time it hangs and the last message line is BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521] It's found that pages_dirtied becomes very large. More than 1000000000 pages in this case: period = HZ * pages_dirtied / task_ratelimit; BUG_ON(pages_dirtied > 2000000000); BUG_ON(pages_dirtied > 1000000000); <--------- UML debug printf shows that we got negative pause here: ick: pause : -984 ick: pages_dirtied : 0 ick: task_ratelimit: 0 pause: + if (pause < 0) { + extern int printf(char *, ...); + printf("ick : pause : %li\n", pause); + printf("ick: pages_dirtied : %lu\n", pages_dirtied); + printf("ick: task_ratelimit: %lu\n", task_ratelimit); + BUG_ON(1); + } trace_balance_dirty_pages(bdi, Since pause is bounded by [min_pause, max_pause] where min_pause is also bounded by max_pause. It's suspected and demonstrated that the max_pause calculation goes wrong: ick: pause : -717 ick: min_pause : -177 ick: max_pause : -717 ick: pages_dirtied : 14 ick: task_ratelimit: 0 The problem lies in the two "long = unsigned long" assignments in bdi_max_pause() which might go negative if the highest bit is 1, and the min_t(long, ...) check failed to protect it falling under 0. Fix all of them by using "unsigned long" throughout the function. Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Richard Weinberger <richard@nod.at> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16percpu_refcount: export symbolsMatias Bjorling
Export the interface to be used within modules. Signed-off-by: Matias Bjorling <m@bjorling.me> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16fs: buffer: move allocation failure loop into the allocatorJohannes Weiner
Buffer allocation has a very crude indefinite loop around waking the flusher threads and performing global NOFS direct reclaim because it can not handle allocation failures. The most immediate problem with this is that the allocation may fail due to a memory cgroup limit, where flushers + direct reclaim might not make any progress towards resolving the situation at all. Because unlike the global case, a memory cgroup may not have any cache at all, only anonymous pages but no swap. This situation will lead to a reclaim livelock with insane IO from waking the flushers and thrashing unrelated filesystem cache in a tight loop. Use __GFP_NOFAIL allocations for buffers for now. This makes sure that any looping happens in the page allocator, which knows how to orchestrate kswapd, direct reclaim, and the flushers sensibly. It also allows memory cgroups to detect allocations that can't handle failure and will allow them to ultimately bypass the limit if reclaim can not make progress. Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16mm: memcg: handle non-error OOM situations more gracefullyJohannes Weiner
Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full callstack on OOM") assumed that only a few places that can trigger a memcg OOM situation do not return VM_FAULT_OOM, like optional page cache readahead. But there are many more and it's impractical to annotate them all. First of all, we don't want to invoke the OOM killer when the failed allocation is gracefully handled, so defer the actual kill to the end of the fault handling as well. This simplifies the code quite a bit for added bonus. Second, since a failed allocation might not be the abrupt end of the fault, the memcg OOM handler needs to be re-entrant until the fault finishes for subsequent allocation attempts. If an allocation is attempted after the task already OOMed, allow it to bypass the limit so that it can quickly finish the fault and invoke the OOM killer. Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16tools/testing/selftests: fix uninitialized variableFelipe Pena
The err variable is intended to receive the timer_create() return before checking it Signed-off-by: Felipe Pena <felipensp@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16block/partitions/efi.c: treat size mismatch as a warning, not an errorDoug Anderson
In commit 27a7c642174e ("partitions/efi: account for pmbr size in lba") we started treating bad sizes in lba field of the partition that has the 0xEE (GPT protective) as errors. However, we may run into these "bad sizes" in the real world if someone uses dd to copy an image from a smaller disk to a bigger disk. Since this case used to work (even without using force_gpt), keep it working and treat the size mismatch as a warning instead of an error. Reported-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Doug Anderson <dianders@chromium.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Tested-by: Artem Bityutskiy <dedekind1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pagesAndrea Arcangeli
Commit 11feeb498086 ("kvm: optimize away THP checks in kvm_is_mmio_pfn()") introduced a memory leak when KVM is run on gigantic compound pages. That commit depends on the assumption that PG_reserved is identical for all head and tail pages of a compound page. So that if get_user_pages returns a tail page, we don't need to check the head page in order to know if we deal with a reserved page that requires different refcounting. The assumption that PG_reserved is the same for head and tail pages is certainly correct for THP and regular hugepages, but gigantic hugepages allocated through bootmem don't clear the PG_reserved on the tail pages (the clearing of PG_reserved is done later only if the gigantic hugepage is freed). This patch corrects the gigantic compound page initialization so that we can retain the optimization in 11feeb498086. The cacheline was already modified in order to set PG_tail so this won't affect the boot time of large memory systems. [akpm@linux-foundation.org: tweak comment layout and grammar] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: andy123 <ajs124.ajs124@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Acked-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16mm/zswap: bugfix: memory leak when re-swaponWeijie Yang
zswap_tree is not freed when swapoff, and it got re-kmalloced in swapon, so a memory leak occurs. Free the memory of zswap_tree in zswap_frontswap_invalidate_area(). Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> From: Weijie Yang <weijie.yang@samsung.com> Subject: mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently Consider the following scenario: thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page) thread 1: call zswap_frontswap_invalidate_page to invalidate entry x. finished, entry x and its zbud is not freed as its refcount != 0 now, the swap_map[x] = 0 thread 0: now call zswap_get_swap_cache_page swapcache_prepare return -ENOENT because entry x is not used any more zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM zswap_writeback_entry do nothing except put refcount Now, the memory of zswap_entry x and its zpage leak. Modify: - check the refcount in fail path, free memory if it is not referenced. - use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path can be not only caused by nomem but also by invalidate. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pagesCyrill Gorcunov
If a page we are inspecting is in swap we may occasionally report it as having soft dirty bit (even if it is clean). The pte_soft_dirty helper should be called on present pte only. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16mm: migration: do not lose soft dirty bit if page is in migration stateCyrill Gorcunov
If page migration is turned on in config and the page is migrating, we may lose the soft dirty bit. If fork and mprotect are called on migrating pages (once migration is complete) pages do not obtain the soft dirty bit in the correspond pte entries. Fix it adding an appropriate test on swap entries. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>