summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2012-12-12Merge tag 'please-pull-pstore_mevent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull pstore fixes from Tony Luck: "Patch series to allow EFI variable backend to pstore to hold multiple records." * tag 'please-pull-pstore_mevent' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: efi_pstore: Add a format check for an existing variable name at erasing time efi_pstore: Add a format check for an existing variable name at reading time efi_pstore: Add a sequence counter to a variable name efi_pstore: Add ctime to argument of erase callback efi_pstore: Remove a logic erasing entries from a write callback to hold multiple logs efi_pstore: Add a logic erasing entries to an erase callback efi_pstore: Check remaining space with QueryVariableInfo() before writing data
2012-12-11Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The biggest change affects group scheduling: we now track the runnable average on a per-task entity basis, allowing a smoother, exponential decay average based load/weight estimation instead of the previous binary on-the-runqueue/off-the-runqueue load weight method. This will inevitably disturb workloads that were in some sort of borderline balancing state or unstable equilibrium, so an eye has to be kept on regressions. For that reason the new load average is only limited to group scheduling (shares distribution) at the moment (which was also hurting the most from the prior, crude weight calculation and whose scheduling quality wins most from this change) - but we plan to extend this to regular SMP balancing as well in the future, which will simplify and speed up things a bit. Other changes involve ongoing preparatory work to extend NOHZ to the scheduler as well, eventually allowing completely irq-free user-space execution." * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled" cputime: Comment cputime's adjusting code cputime: Consolidate cputime adjustment code cputime: Rename thread_group_times to thread_group_cputime_adjusted cputime: Move thread_group_cputime() to sched code vtime: Warn if irqs aren't disabled on system time accounting APIs vtime: No need to disable irqs on vtime_account() vtime: Consolidate a bit the ctx switch code vtime: Explicitly account pending user time on process tick vtime: Remove the underscore prefix invasion sched/autogroup: Fix crash on reboot when autogroup is disabled cputime: Separate irqtime accounting from generic vtime cputime: Specialize irq vtime hooks kvm: Directly account vtime to system on guest switch vtime: Make vtime_account_system() irqsafe vtime: Gather vtime declarations to their own header file sched: Describe CFS load-balancer sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking sched: Make __update_entity_runnable_avg() fast sched: Update_cfs_shares at period edge ...
2012-12-11Merge branch 'akpm' (Andrew's patchbomb)Linus Torvalds
Merge misc updates from Andrew Morton: "About half of most of MM. Going very early this time due to uncertainty over the coreautounifiednumasched things. I'll send the other half of most of MM tomorrow. The rest of MM awaits a slab merge from Pekka." * emailed patches from Andrew Morton: (71 commits) memory_hotplug: ensure every online node has NORMAL memory memory_hotplug: handle empty zone when online_movable/online_kernel mm, memory-hotplug: dynamic configure movable memory and portion memory drivers/base/node.c: cleanup node_state_attr[] bootmem: fix wrong call parameter for free_bootmem() avr32, kconfig: remove HAVE_ARCH_BOOTMEM mm: cma: remove watermark hacks mm: cma: skip watermarks check for already isolated blocks in split_free_page() mm, oom: fix race when specifying a thread as the oom origin mm, oom: change type of oom_score_adj to short mm: cleanup register_node() mm, mempolicy: remove duplicate code mm/vmscan.c: try_to_freeze() returns boolean mm: introduce putback_movable_pages() virtio_balloon: introduce migration primitives to balloon pages mm: introduce compaction and migration for ballooned pages mm: introduce a common interface for balloon pages mobility mm: redefine address_space.assoc_mapping mm: adjust address_space_operations.migratepage() return code arch/sparc/kernel/sys_sparc_64.c: s/COLOUR/COLOR/ ...
2012-12-11mm, oom: change type of oom_score_adj to shortDavid Rientjes
The maximum oom_score_adj is 1000 and the minimum oom_score_adj is -1000, so this range can be represented by the signed short type with no functional change. The extra space this frees up in struct signal_struct will be used for per-thread oom kill flags in the next patch. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11mm: redefine address_space.assoc_mappingRafael Aquini
Overhaul struct address_space.assoc_mapping renaming it to address_space.private_data and its type is redefined to void*. By this approach we consistently name the .private_* elements from struct address_space as well as allow extended usage for address_space association with other data structures through ->private_data. Also, all users of old ->assoc_mapping element are converted to reflect its new name and type change (->private_data). Signed-off-by: Rafael Aquini <aquini@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <andi@firstfloor.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11mm: adjust address_space_operations.migratepage() return codeRafael Aquini
Memory fragmentation introduced by ballooning might reduce significantly the number of 2MB contiguous memory blocks that can be used within a guest, thus imposing performance penalties associated with the reduced number of transparent huge pages that could be used by the guest workload. This patch-set follows the main idea discussed at 2012 LSFMMS session: "Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/ to introduce the required changes to the virtio_balloon driver, as well as the changes to the core compaction & migration bits, in order to make those subsystems aware of ballooned pages and allow memory balloon pages become movable within a guest, thus avoiding the aforementioned fragmentation issue Following are numbers that prove this patch benefits on allowing compaction to be more effective at memory ballooned guests. Results for STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite, running on a 4gB RAM KVM guest which was ballooning 512mB RAM in 64mB chunks, at every minute (inflating/deflating), while test was running: ===BEGIN stress-highalloc STRESS-HIGHALLOC highalloc-3.7 highalloc-3.7 rc4-clean rc4-patch Pass 1 55.00 ( 0.00%) 62.00 ( 7.00%) Pass 2 54.00 ( 0.00%) 62.00 ( 8.00%) while Rested 75.00 ( 0.00%) 80.00 ( 5.00%) MMTests Statistics: duration 3.7 3.7 rc4-clean rc4-patch User 1207.59 1207.46 System 1300.55 1299.61 Elapsed 2273.72 2157.06 MMTests Statistics: vmstat 3.7 3.7 rc4-clean rc4-patch Page Ins 3581516 2374368 Page Outs 11148692 10410332 Swap Ins 80 47 Swap Outs 3641 476 Direct pages scanned 37978 33826 Kswapd pages scanned 1828245 1342869 Kswapd pages reclaimed 1710236 1304099 Direct pages reclaimed 32207 31005 Kswapd efficiency 93% 97% Kswapd velocity 804.077 622.546 Direct efficiency 84% 91% Direct velocity 16.703 15.682 Percentage direct scans 2% 2% Page writes by reclaim 79252 9704 Page writes file 75611 9228 Page writes anon 3641 476 Page reclaim immediate 16764 11014 Page rescued immediate 0 0 Slabs scanned 2171904 2152448 Direct inode steals 385 2261 Kswapd inode steals 659137 609670 Kswapd skipped wait 1 69 THP fault alloc 546 631 THP collapse alloc 361 339 THP splits 259 263 THP fault fallback 98 50 THP collapse fail 20 17 Compaction stalls 747 499 Compaction success 244 145 Compaction failures 503 354 Compaction pages moved 370888 474837 Compaction move failure 77378 65259 ===END stress-highalloc This patch: Introduce MIGRATEPAGE_SUCCESS as the default return code for address_space_operations.migratepage() method and documents the expected return code for the same method in failure cases. Signed-off-by: Rafael Aquini <aquini@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <andi@firstfloor.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11mm: use vm_unmapped_area() in hugetlbfsMichel Lespinasse
Update the hugetlb_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLBAndi Kleen
There was some desire in large applications using MAP_HUGETLB or SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on others. This is useful together with NUMA policy: use 2MB interleaving on some mappings, but 1GB on local mappings. This patch extends the IPC/SHM syscall interfaces slightly to allow specifying the page size. It borrows some upper bits in the existing flag arguments and allows encoding the log of the desired page size in addition to the *_HUGETLB flag. When 0 is specified the default size is used, this makes the change fully compatible. Extending the internal hugetlb code to handle this is straight forward. Instead of a single mount it just keeps an array of them and selects the right mount based on the specified page size. When no page size is specified it uses the mount of the default page size. The change is not visible in /proc/mounts because internal mounts don't appear there. It also has very little overhead: the additional mounts just consume a super block, but not more memory when not used. I also exported the new flags to the user headers (they were previously under __KERNEL__). Right now only symbols for x86 and some other architecture for 1GB and 2MB are defined. The interface should already work for all other architectures though. Only architectures that define multiple hugetlb sizes actually need it (that is currently x86, tile, powerpc). However tile and powerpc have user configurable hugetlb sizes, so it's not easy to add defines. A program on those architectures would need to query sysfs and use the appropiate log2. [akpm@linux-foundation.org: cleanups] [rientjes@google.com: fix build] [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11writeback: remove nr_pages_dirtied arg from balance_dirty_pages_ratelimited_nr()Namjae Jeon
There is no reason to pass the nr_pages_dirtied argument, because nr_pages_dirtied value from the caller is unused in balance_dirty_pages_ratelimited_nr(). Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11Merge tag 'tty-3.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull TTY/Serial merge from Greg Kroah-Hartman: "Here's the big tty/serial tree set of changes for 3.8-rc1. Contained in here is a bunch more reworks of the tty port layer from Jiri and bugfixes from Alan, along with a number of other tty and serial driver updates by the various driver authors. Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the TTY layer, which is much appreciated by me. All of these have been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed up some trivial conflicts in the staging tree, due to the fwserial driver having come in both ways (but fixed up a bit in the serial tree), and the ioctl handling in the dgrp driver having been done slightly differently (staging tree got that one right, and removed both TIOCGSOFTCAR and TIOCSSOFTCAR). * tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (146 commits) staging: sb105x: fix potential NULL pointer dereference in mp_chars_in_buffer() staging/fwserial: Remove superfluous free staging/fwserial: Use WARN_ONCE when port table is corrupted staging/fwserial: Destruct embedded tty_port on teardown staging/fwserial: Fix build breakage when !CONFIG_BUG staging: fwserial: Add TTY-over-Firewire serial driver drivers/tty/serial/serial_core.c: clean up HIGH_BITS_OFFSET usage staging: dgrp: dgrp_tty.c: Audit the return values of get/put_user() staging: dgrp: dgrp_tty.c: Remove the TIOCSSOFTCAR ioctl handler from dgrp driver serial: ifx6x60: Add modem power off function in the platform reboot process serial: mxs-auart: unmap the scatter list before we copy the data serial: mxs-auart: disable the Receive Timeout Interrupt when DMA is enabled serial: max310x: Setup missing "can_sleep" field for GPIO tty/serial: fix ifx6x60.c declaration warning serial: samsung: add devicetree properties for non-Exynos SoCs serial: samsung: fix potential soft lockup during uart write tty: vt: Remove redundant null check before kfree. tty/8250 Add check for pci_ioremap_bar failure tty/8250 Add support for Commtech's Fastcom Async-335 and Fastcom Async-PCIe cards tty/8250 Add XR17D15x devices to the exar_handle_irq override ...
2012-12-11Merge tag 'driver-core-3.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg Kroah-Hartman: "Here's the large driver core updates for 3.8-rc1. The biggest thing here is the various __dev* marking removals. This is going to be a pain for the merge with different subsystem trees, I know, but all of the patches included here have been ACKed by their various subsystem maintainers, as they wanted them to go through here. If this is too much of a pain, I can pull all of them out of this tree and just send you one with the other fixes/updates and then, after 3.8-rc1 is out, do the rest of the removals to ensure we catch them all, it's up to you. The merges should all be trivial, and Stephen has been doing them all in linux-next for a few weeks now quite easily. Other than the __dev* marking removals, there's nothing major here, some firmware loading updates and other minor things in the driver core. All of these have (much to Stephen's annoyance), been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed up trivial conflicts in drivers/gpio/gpio-{em,stmpe}.c due to gpio update. * tag 'driver-core-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (93 commits) modpost.c: Stop checking __dev* section mismatches init.h: Remove __dev* sections from the kernel acpi: remove use of __devinit PCI: Remove __dev* markings PCI: Always build setup-bus when PCI is enabled PCI: Move pci_uevent into pci-driver.c PCI: Remove CONFIG_HOTPLUG ifdefs unicore32/PCI: Remove CONFIG_HOTPLUG ifdefs sh/PCI: Remove CONFIG_HOTPLUG ifdefs powerpc/PCI: Remove CONFIG_HOTPLUG ifdefs mips/PCI: Remove CONFIG_HOTPLUG ifdefs microblaze/PCI: Remove CONFIG_HOTPLUG ifdefs dma: remove use of __devinit dma: remove use of __devexit_p firewire: remove use of __devinitdata firewire: remove use of __devinit leds: remove use of __devexit leds: remove use of __devinit leds: remove use of __devexit_p mmc: remove use of __devexit ...
2012-12-11Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled"Ingo Molnar
This reverts commit 5258f386ea4e8454bc801fb443e8a4217da1947c, because the underlying autogroups bug got fixed upstream in a better way, via: fd8ef11730f1 Revert "sched, autogroup: Stop going ahead if autogroup is disabled" Cc: Mike Galbraith <efault@gmx.de> Cc: Yong Zhang <yong.zhang0@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08vfs: fix O_DIRECT read past end of block deviceLinus Torvalds
The direct-IO write path already had the i_size checks in mm/filemap.c, but it turns out the read path did not, and removing the block size checks in fs/block_dev.c (commit bbec0270bdd8: "blkdev_max_block: make private to fs/buffer.c") removed the magic "shrink IO to past the end of the device" code there. Fix it by truncating the IO to the size of the block device, like the write path already does. NOTE! I suspect the write path would be *much* better off doing it this way in fs/block_dev.c, rather than hidden deep in mm/filemap.c. The mm/filemap.c code is extremely hard to follow, and has various conditionals on the target being a block device (ie the flag passed in to 'generic_write_checks()', along with a conditional update of the inode timestamp etc). It is also quite possible that we should treat this whole block device size as a "s_maxbytes" issue, and try to make the logic even more generic. However, in the meantime this is the fairly minimal targeted fix. Noted by Milan Broz thanks to a regression test for the cryptsetup reencrypt tool. Reported-and-tested-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-08Merge tag 'cputime-adjustment-cleanups' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core Pull cputime cleanups from Frederic Weisbecker: * Improve naming and code location * Consolidate adjustment code * Comment the adjustement code Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-05vfs: clear to the end of the buffer on partial buffer readsDan Carpenter
READ is zero so the "rw & READ" test is always false. The intended test was "((rw & RW_MASK) == READ)". Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-04vfs: avoid "attempt to access beyond end of device" warningsLinus Torvalds
The block device access simplification that avoided accessing the (racy) block size information (commit bbec0270bdd8: "blkdev_max_block: make private to fs/buffer.c") no longer checks the maximum block size in the block mapping path. That was _almost_ as simple as just removing the code entirely, because the readers and writers all check the size of the device anyway, so under normal circumstances it "just worked". However, the block size may be such that the end of the device may straddle one single buffer_head. At which point we may still want to access the end of the device, but the buffer we use to access it partially extends past the end. The 'bd_set_size()' function intentionally sets the block size to avoid this, but mounting the device - or setting the block size by hand to some other value - can modify that block size. So instead, teach 'submit_bh()' about the special case of the buffer head straddling the end of the device, and turning such an access into a smaller IO access, avoiding the problem. This, btw, also means that unlike before, we can now access the whole device regardless of device block size setting. So now, even if the device size is only 512-byte aligned, we can read and write even the last sector even when having a much bigger block size for accessing the rest of the device. So with this, we could now get rid of the 'bd_set_size()' block size code entirely - resulting in faster IO for the common case - but that would be a separate patch. Reported-and-tested-by: Romain Francoise <romain@orebokech.com> Reporeted-and-tested-by: Meelis Roos <mroos@linux.ee> Reported-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-03Merge branch 'block-dev'Linus Torvalds
Merge 'block-dev' branch. I was going to just mark everything here for stable and leave it to the 3.8 merge window, but having decided on doing another -rc, I migth as well merge it now. This removes the bd_block_size_semaphore semaphore that was added in this release to fix a race condition between block size changes and block IO, and replaces it with atomicity guaratees in fs/buffer.c instead, along with simplifying fs/block-dev.c. This removes more lines than it adds, makes the code generally simpler, and avoids the latency/rt issues that the block size semaphore introduced for mount. I'm not happy with the timing, but it wouldn't be much better doing this during the merge window and then having some delayed back-port of it into stable. * block-dev: blkdev_max_block: make private to fs/buffer.c direct-io: don't read inode->i_blkbits multiple times blockdev: remove bd_block_size_semaphore again fs/buffer.c: make block-size be per-page and protected by the page lock
2012-12-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A bunch of fixes; the last one is this cycle regression, the rest are -stable fodder." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix off-by-one in argument passed by iterate_fd() to callbacks lookup_one_len: don't accept . and .. cifs: get rid of blind d_drop() in readdir nfs_lookup_revalidate(): fix a leak don't do blind d_drop() in nfs_prime_dcache()
2012-11-30Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "Two low risk, small fixes, that fix cifs regressions introduced in 3.7." * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix wrong buffer pointer usage in smb_set_file_info cifs: fix writeback race with file that is growing
2012-11-29fix off-by-one in argument passed by iterate_fd() to callbacksAl Viro
Noticed by Pavel Roskin; the thing in his patch I disagree with was compensating for that shite in callbacks instead of fixing it once in the iterator itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29lookup_one_len: don't accept . and ..Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29cifs: get rid of blind d_drop() in readdirAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29nfs_lookup_revalidate(): fix a leakAl Viro
We are leaking fattr and fhandle if we decide that dentry is not to be invalidated, after all (e.g. happens to be a mountpoint). Just free both before that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29don't do blind d_drop() in nfs_prime_dcache()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29blkdev_max_block: make private to fs/buffer.cLinus Torvalds
We really don't want to look at the block size for the raw block device accesses in fs/block-dev.c, because it may be changing from under us. So get rid of the max_block logic entirely, since the caller should already have done it anyway. That leaves the only user of this function in fs/buffer.c, so move the whole function there and make it static. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-29direct-io: don't read inode->i_blkbits multiple timesLinus Torvalds
Since directio can work on a raw block device, and the block size of the device can change under it, we need to do the same thing that fs/buffer.c now does: read the block size a single time, using ACCESS_ONCE(). Reading it multiple times can get different results, which will then confuse the code because it actually encodes the i_blksize in relationship to the underlying logical blocksize. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-29blockdev: remove bd_block_size_semaphore againLinus Torvalds
This reverts the block-device direct access code to the previous unlocked code, now that fs/buffer.c no longer needs external locking. With this, fs/block_dev.c is back to the original version, apart from a whitespace cleanup that I didn't want to revert. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-29fs/buffer.c: make block-size be per-page and protected by the page lockLinus Torvalds
This makes the buffer size handling be a per-page thing, which allows us to not have to worry about locking too much when changing the buffer size. If a page doesn't have buffers, we still need to read the block size from the inode, but we can do that with ACCESS_ONCE(), so that even if the size is changing, we get a consistent value. This doesn't convert all functions - many of the buffer functions are used purely by filesystems, which in turn results in the buffer size being fixed at mount-time. So they don't have the same consistency issues that the raw device access can have. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-28cputime: Rename thread_group_times to thread_group_cputime_adjustedFrederic Weisbecker
We have thread_group_cputime() and thread_group_times(). The naming doesn't provide enough information about the difference between these two APIs. To lower the confusion, rename thread_group_times() to thread_group_cputime_adjusted(). This name better suggests that it's a version of thread_group_cputime() that does some stabilization on the raw cputime values. ie here: scale on top of CFS runtime stats and bound lower value for monotonicity. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-28CIFS: Fix wrong buffer pointer usage in smb_set_file_infoPavel Shilovsky
Commit 6bdf6dbd662176c0da5c3ac8ed10ac94e7776c85 caused a regression in setattr codepath that leads to files with wrong attributes. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-11-27cifs: fix writeback race with file that is growingJeff Layton
Commit eddb079deb4 created a regression in the writepages codepath. Previously, whenever it needed to check the size of the file, it did so by consulting the inode->i_size field directly. With that patch, the i_size was fetched once on entry into the writepages code and that value was used henceforth. If the file is changing size though (for instance, if someone is writing to it or has truncated it), then that value is likely to be wrong. This can lead to data corruption. Pages past the EOF at the time that the writepages call was issued may be silently dropped and ignored because cifs_writepages wrongly assumes that the file must have been truncated in the interim. Fix cifs_writepages to properly fetch the size from the inode->i_size field instead to properly account for this possibility. Original bug report is here: https://bugzilla.kernel.org/show_bug.cgi?id=50991 Reported-and-Tested-by: Maxim Britov <ungifted01@gmail.com> Reviewed-by: Suresh Jayaraman <sjayaraman@suse.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-11-26Merge branch 'akpm' (Fixes from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "8 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (8 patches) futex: avoid wake_futex() for a PI futex_q watchdog: using u64 in get_sample_period() writeback: put unused inodes to LRU after writeback completion mm: vmscan: check for fatal signals iff the process was throttled Revert "mm: remove __GFP_NO_KSWAPD" proc: check vma->vm_file before dereferencing UAPI: strip the _UAPI prefix from header guards during header installation include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALID
2012-11-26Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext3 regression fix from Jan Kara: "Fix an ext3 regression introduced during 3.7 merge window. It leads to deadlock if you stress the filesystem in the right way (luckily only if blocksize < pagesize)." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: jbd: Fix lock ordering bug in journal_unmap_buffer()
2012-11-26writeback: put unused inodes to LRU after writeback completionJan Kara
Commit 169ebd90131b ("writeback: Avoid iput() from flusher thread") removed iget-iput pair from inode writeback. As a side effect, inodes that are dirty during iput_final() call won't be ever added to inode LRU (iput_final() doesn't add dirty inodes to LRU and later when the inode is cleaned there's noone to add the inode there). Thus inodes are effectively unreclaimable until someone looks them up again. The practical effect of this bug is limited by the fact that inodes are pinned by a dentry for long enough that the inode gets cleaned. But still the bug can have nasty consequences leading up to OOM conditions under certain circumstances. Following can easily reproduce the problem: for (( i = 0; i < 1000; i++ )); do mkdir $i for (( j = 0; j < 1000; j++ )); do touch $i/$j echo 2 > /proc/sys/vm/drop_caches done done then one needs to run 'sync; ls -lR' to make inodes reclaimable again. We fix the issue by inserting unused clean inodes into the LRU after writeback finishes in inode_sync_complete(). Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: <stable@vger.kernel.org> [3.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26proc: check vma->vm_file before dereferencingStanislav Kinsbursky
Commit 7b540d0646ce ("proc_map_files_readdir(): don't bother with grabbing files") switched proc_map_files_readdir() to use @f_mode directly instead of grabbing @file reference, but same time the test for @vm_file presence was lost leading to nil dereference. The patch brings the test back. The all proc_map_files feature is CONFIG_CHECKPOINT_RESTORE wrapped (which is set to 'n' by default) so the bug doesn't affect regular kernels. The regression is 3.7-rc1 only as far as I can tell. [gorcunov@openvz.org: provided changelog] Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26sysfs: Mark sysfs_attr_ns staticJosh Triplett
Nothing outside of fs/sysfs/file.c references this function, so mark it static. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26efi_pstore: Add a sequence counter to a variable nameSeiji Aguchi
[Issue] Currently, a variable name, which identifies each entry, consists of type, id and ctime. But if multiple events happens in a short time, a second/third event may fail to log because efi_pstore can't distinguish each event with current variable name. [Solution] A reasonable way to identify all events precisely is introducing a sequence counter to the variable name. The sequence counter has already supported in a pstore layer with "oopscount". So, this patch adds it to a variable name. Also, it is passed to read/erase callbacks of platform drivers in accordance with the modification of the variable name. <before applying this patch> a variable name of first event: dump-type0-1-12345678 a variable name of second event: dump-type0-1-12345678 type:0 id:1 ctime:12345678 If multiple events happen in a short time, efi_pstore can't distinguish them because variable names are same among them. <after applying this patch> it can be distinguishable by adding a sequence counter as follows. a variable name of first event: dump-type0-1-1-12345678 a variable name of Second event: dump-type0-1-2-12345678 type:0 id:1 sequence counter: 1(first event), 2(second event) ctime:12345678 In case of a write callback executed in pstore_console_write(), "0" is added to an argument of the write callback because it just logs all kernel messages and doesn't need to care about multiple events. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Mike Waychison <mikew@google.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-11-26efi_pstore: Add ctime to argument of erase callbackSeiji Aguchi
[Issue] Currently, a variable name, which is used to identify each log entry, consists of type, id and ctime. But an erase callback does not use ctime. If efi_pstore supported just one log, type and id were enough. However, in case of supporting multiple logs, it doesn't work because it can't distinguish each entry without ctime at erasing time. <Example> As you can see below, efi_pstore can't differentiate first event from second one without ctime. a variable name of first event: dump-type0-1-12345678 a variable name of second event: dump-type0-1-23456789 type:0 id:1 ctime:12345678, 23456789 [Solution] This patch adds ctime to an argument of an erase callback. It works across reboots because ctime of pstore means the date that the record was originally stored. To do this, efi_pstore saves the ctime to variable name at writing time and passes it to pstore at reading time. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Acked-by: Mike Waychison <mikew@google.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-11-23Merge tag 'for-linus-20121123' of git://git.infradead.org/mtd-2.6Linus Torvalds
Pull MTD fixes from David Woodhouse: "The most important part of this is that it fixes a regression in Samsung NAND chip detection, introduced by some rework which went into 3.7. The initial fix wasn't quite complete, so it's in two parts. In fact the first part is committed twice (Artem committed his own copy of the same patch) and I've merged Artem's tree into mine which already had that fix. I'd have recommitted that to make it somewhat cleaner, but figured by this point in the release cycle it was better to merge *exactly* the commits which have been in linux-next. If I'd recommitted, I'd also omit the sparse warning fix. But it's there, and it's harmless — just marking one function as 'static' in onenand code. This also includes a couple more fixes for stable: an AB-BA deadlock in JFFS2, and an invalid range check in slram." * tag 'for-linus-20121123' of git://git.infradead.org/mtd-2.6: mtd: nand: fix Samsung SLC detection regression mtd: nand: fix Samsung SLC NAND identification regression jffs2: Fix lock acquisition order bug in jffs2_write_begin mtd: onenand: Make flexonenand_set_boundary static mtd: slram: invalid checking of absolute end address mtd: ofpart: Fix incorrect NULL check in parse_ofoldpart_partitions() mtd: nand: fix Samsung SLC NAND identification regression
2012-11-23jbd: Fix lock ordering bug in journal_unmap_buffer()Jan Kara
Commit 09e05d48 introduced a wait for transaction commit into journal_unmap_buffer() in the case we are truncating a buffer undergoing commit in the page stradding i_size on a filesystem with blocksize < pagesize. Sadly we forgot to drop buffer lock before waiting for transaction commit and thus deadlock is possible when kjournald wants to lock the buffer. Fix the problem by dropping the buffer lock before waiting for transaction commit. Since we are still holding page lock (and that is OK), buffer cannot disappear under us. CC: stable@vger.kernel.org # Wherever commit 09e05d48 was taken Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-20Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull reiserfs and ext3 fixes from Jan Kara: "Fixes of reiserfs deadlocks when quotas are enabled (locking there was completely busted by BKL conversion) and also one small ext3 fix in the trim interface." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext3: Avoid underflow of in ext3_trim_fs() reiserfs: Move quota calls out of write lock reiserfs: Protect reiserfs_quota_write() with write lock reiserfs: Protect reiserfs_quota_on() with write lock reiserfs: Fix lock ordering during remount
2012-11-19ext3: Avoid underflow of in ext3_trim_fs()Lukas Czerner
Currently if len argument in ext3_trim_fs() is smaller than one block, the 'end' variable underflow. Avoid that by returning EINVAL if len is smaller than file system block. Also remove useless unlikely(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-19reiserfs: Move quota calls out of write lockJan Kara
Calls into highlevel quota code cannot happen under the write lock. These calls take dqio_mutex which ranks above write lock. So drop write lock before calling back into quota code. CC: stable@vger.kernel.org # >= 3.0 Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-19reiserfs: Protect reiserfs_quota_write() with write lockJan Kara
Calls into reiserfs journalling code and reiserfs_get_block() need to be protected with write lock. We remove write lock around calls to high level quota code in the next patch so these paths would suddently become unprotected. CC: stable@vger.kernel.org # >= 3.0 Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-19reiserfs: Protect reiserfs_quota_on() with write lockJan Kara
In reiserfs_quota_on() we do quite some work - for example unpacking tail of a quota file. Thus we have to hold write lock until a moment we call back into the quota code. CC: stable@vger.kernel.org # >= 3.0 Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-19reiserfs: Fix lock ordering during remountJan Kara
When remounting reiserfs dquot_suspend() or dquot_resume() can be called. These functions take dqonoff_mutex which ranks above write lock so we have to drop it before calling into quota code. CC: stable@vger.kernel.org # >= 3.0 Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-18fanotify: fix FAN_Q_OVERFLOW case of fanotify_read()Al Viro
If the FAN_Q_OVERFLOW bit set in event->mask, the fanotify event metadata will not contain a valid file descriptor, but copy_event_to_user() didn't check for that, and unconditionally does a fd_install() on the file descriptor. Which in turn will cause a BUG_ON() in __fd_install(). Introduced by commit 352e3b249284 ("fanotify: sanitize failure exits in copy_event_to_user()") Mea culpa - missed that path ;-/ Reported-by: Alex Shi <lkml.alex@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc VFS fixes from Al Viro: "Remove a bogus BUG_ON() that can trigger spuriously + alpha bits of do_mount() constification I'd missed during the merge window." This pull request came in a week ago, I missed it for some reason. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kill bogus BUG_ON() in do_close_on_exec() missing const in alpha callers of do_mount()
2012-11-18Merge tag 'for-linus-v3.7-rc7' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs bugfixes from Ben Myers: - fix attr tree double split corruption - fix broken error handling in xfs_vm_writepage - drop buffer io reference when a bad bio is built * tag 'for-linus-v3.7-rc7' of git://oss.sgi.com/xfs/xfs: xfs: drop buffer io reference when a bad bio is built xfs: fix broken error handling in xfs_vm_writepage xfs: fix attr tree double split corruption
2012-11-18Merge branch 'sched/urgent' into sched/coreIngo Molnar
Merge in fixes before we queue up dependent bits, to avoid conflicts. Signed-off-by: Ingo Molnar <mingo@kernel.org>