summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2011-05-25proc: allocate storage for numa_maps statistics onceStephen Wilson
In show_numa_map() we collect statistics into a numa_maps structure. Since the number of NUMA nodes can be very large, this structure is not a candidate for stack allocation. Instead of going thru a kmalloc()+kfree() cycle each time show_numa_map() is invoked, perform the allocation just once when /proc/pid/numa_maps is opened. Performing the allocation when numa_maps is opened, and thus before a reference to the target tasks mm is taken, eliminates a potential stalemate condition in the oom-killer as originally described by Hugh Dickins: ... imagine what happens if the system is out of memory, and the mm we're looking at is selected for killing by the OOM killer: while we wait in __get_free_page for more memory, no memory is freed from the selected mm because it cannot reach exit_mmap while we hold that reference. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25proc: make struct proc_maps_private truly privateStephen Wilson
Now that mm/mempolicy.c is no longer implementing /proc/pid/numa_maps there is no need to export struct proc_maps_private to the world. Move it to fs/proc/internal.h instead. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: proc: move show_numa_map() to fs/proc/task_mmu.cStephen Wilson
Moving show_numa_map() from mempolicy.c to task_mmu.c solves several issues. - Having the show() operation "miles away" from the corresponding seq_file iteration operations is a maintenance burden. - The need to export ad hoc info like struct proc_maps_private is eliminated. - The implementation of show_numa_map() can be improved in a simple manner by cooperating with the other seq_file operations (start, stop, etc) -- something that would be messy to do without this change. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25tmpfs: implement generic xattr supportEric Paris
Implement generic xattrs for tmpfs filesystems. The Feodra project, while trying to replace suid apps with file capabilities, realized that tmpfs, which is used on the build systems, does not support file capabilities and thus cannot be used to build packages which use file capabilities. Xattrs are also needed for overlayfs. The xattr interface is a bit odd. If a filesystem does not implement any {get,set,list}xattr functions the VFS will call into some random LSM hooks and the running LSM can then implement some method for handling xattrs. SELinux for example provides a method to support security.selinux but no other security.* xattrs. As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have xattr handler routines specifically to handle acls. Because of this tmpfs would loose the VFS/LSM helpers to support the running LSM. To make up for that tmpfs had stub functions that did nothing but call into the LSM hooks which implement the helpers. This new patch does not use the LSM fallback functions and instead just implements a native get/set/list xattr feature for the full security.* and trusted.* namespace like a normal filesystem. This means that tmpfs can now support both security.selinux and security.capability, which was not previously possible. The basic implementation is that I attach a: struct shmem_xattr { struct list_head list; /* anchored by shmem_inode_info->xattr_list */ char *name; size_t size; char value[0]; }; Into the struct shmem_inode_info for each xattr that is set. This implementation could easily support the user.* namespace as well, except some care needs to be taken to prevent large amounts of unswappable memory being allocated for unprivileged users. [mszeredi@suse.cz: new config option, suport trusted.*, support symlinks] Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com> Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Hugh Dickins <hughd@google.com> Tested-by: Jordi Pujol <jordipujolp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25vmscan: change shrinker API by passing shrink_control structYing Han
Change each shrinker's API by consolidating the existing parameters into shrink_control struct. This will simplify any further features added w/o touching each file of shrinker. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix warning] [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API] [akpm@linux-foundation.org: fix xfs warning] [akpm@linux-foundation.org: update gfs2] Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25vmscan: change shrink_slab() interfaces by passing shrink_controlYing Han
Consolidate the existing parameters to shrink_slab() into a new shrink_control struct. This is needed later to pass the same struct to shrinkers. Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: Convert i_mmap_lock to a mutexPeter Zijlstra
Straightforward conversion of i_mmap_lock to a mutex. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: Remove i_mmap_lock lockbreakPeter Zijlstra
Hugh says: "The only significant loser, I think, would be page reclaim (when concurrent with truncation): could spin for a long time waiting for the i_mmap_mutex it expects would soon be dropped? " Counter points: - cpu contention makes the spin stop (need_resched()) - zap pages should be freeing pages at a higher rate than reclaim ever can I think the simplification of the truncate code is definitely worth it. Effectively reverts: 2aa15890f3c ("mm: prevent concurrent unmap_mapping_range() on the same inode") and takes out the code that caused its problem. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: mmu_gather reworkPeter Zijlstra
Rework the existing mmu_gather infrastructure. The direct purpose of these patches was to allow preemptible mmu_gather, but even without that I think these patches provide an improvement to the status quo. The first 9 patches rework the mmu_gather infrastructure. For review purpose I've split them into generic and per-arch patches with the last of those a generic cleanup. The next patch provides generic RCU page-table freeing, and the followup is a patch converting s390 to use this. I've also got 4 patches from DaveM lined up (not included in this series) that uses this to implement gup_fast() for sparc64. Then there is one patch that extends the generic mmu_gather batching. After that follow the mm preemptibility patches, these make part of the mm a lot more preemptible. It converts i_mmap_lock and anon_vma->lock to mutexes which together with the mmu_gather rework makes mmu_gather preemptible as well. Making i_mmap_lock a mutex also enables a clean-up of the truncate code. This also allows for preemptible mmu_notifiers, something that XPMEM I think wants. Furthermore, it removes the new and universially detested unmap_mutex. This patch: Remove the first obstacle towards a fully preemptible mmu_gather. The current scheme assumes mmu_gather is always done with preemption disabled and uses per-cpu storage for the page batches. Change this to try and allocate a page for batching and in case of failure, use a small on-stack array to make some progress. Preemptible mmu_gather is desired in general and usable once i_mmap_lock becomes a mutex. Doing it before the mutex conversion saves us from having to rework the code by moving the mmu_gather bits inside the pte_lock. Also avoid flushing the tlb batches from under the pte lock, this is useful even without the i_mmap_lock conversion as it significantly reduces pte lock hold times. [akpm@linux-foundation.org: fix comment tpyo] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: make expand_downwards() symmetrical with expand_upwards()Michal Hocko
Currently we have expand_upwards exported while expand_downwards is accessible only via expand_stack or expand_stack_downwards. check_stack_guard_page is a nice example of the asymmetry. It uses expand_stack for VM_GROWSDOWN while expand_upwards is called for VM_GROWSUP case. Let's clean this up by exporting both functions and make those names consistent. Let's use expand_{upwards,downwards} because expanding doesn't always involve stack manipulation (an example is ia64_do_page_fault which uses expand_upwards for registers backing store expansion). expand_downwards has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because get_arg_page calls the downwards version in the early process initialization phase for growsup configuration. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Hugh Dickins <hughd@google.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-24Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: jbd: Fix comment to match the code in journal_start() jbd/jbd2: remove obsolete summarise_journal_usage. jbd: Fix forever sleeping process in do_get_write_access() ext2: fix error msg when mounting fs with too-large blocksize jbd: fix fsync() tid wraparound bug ext3: Fix fs corruption when make_indexed_dir() fails ext3: Fix lock inversion in ext3_symlink()
2011-05-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: make plock operation killable dlm: remove shared message stub for recovery dlm: delayed reply message warning dlm: Remove superfluous call to recalc_sigpending()
2011-05-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (43 commits) TOMOYO: Fix wrong domainname validation. SELINUX: add /sys/fs/selinux mount point to put selinuxfs CRED: Fix load_flat_shared_library() to initialise bprm correctly SELinux: introduce path_has_perm flex_array: allow 0 length elements flex_arrays: allow zero length flex arrays flex_array: flex_array_prealloc takes a number of elements, not an end SELinux: pass last path component in may_create SELinux: put name based create rules in a hashtable SELinux: generic hashtab entry counter SELinux: calculate and print hashtab stats with a generic function SELinux: skip filename trans rules if ttype does not match parent dir SELinux: rename filename_compute_type argument to *type instead of *con SELinux: fix comment to state filename_compute_type takes an objname not a qstr SMACK: smack_file_lock can use the struct path LSM: separate LSM_AUDIT_DATA_DENTRY from LSM_AUDIT_DATA_PATH LSM: split LSM_AUDIT_DATA_FS into _PATH and _INODE SELINUX: Make selinux cache VFS RCU walks safe SECURITY: Move exec_permission RCU checks into security modules SELinux: security_read_policy should take a size_t not ssize_t ...
2011-05-24Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6Linus Torvalds
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (52 commits) UBIFS: switch to dynamic printks UBIFS: fix kernel-doc comments UBIFS: fix extremely rare mount failure UBIFS: simplify LEB recovery function further UBIFS: always cleanup the recovered LEB UBIFS: clean up LEB recovery function UBIFS: fix-up free space on mount if flag is set UBIFS: add the fixup function UBIFS: add a superblock flag for free space fix-up UBIFS: share the next_log_lnum helper UBIFS: expect corruption only in last journal head LEBs UBIFS: synchronize write-buffer before switching to the next bud UBIFS: remove BUG statement UBIFS: change bud replay function conventions UBIFS: substitute the replay tree with a replay list UBIFS: simplify replay UBIFS: store free and dirty space in the bud replay entry UBIFS: remove unnecessary stack variable UBIFS: double check that buds are replied in order UBIFS: make 2 functions static ...
2011-05-24Merge branch 'next' into for-linusJames Morris
2011-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6: fat: Fix statfs->f_namelen fat: Replace all printk with fat_msg() fat: Add fat_msg() function for preformated FAT messages fat: Convert fat_fs_error to use %pV fat: Fix possible null deref in fat_cache_add() fat: use new setup() for ->dir_ops too
2011-05-24jbd: Fix comment to match the code in journal_start()Eryu Guan
journal_start returns an ERR_PTR() value rather than NULL on failure. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2011-05-23Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: obey minleft values during extent allocation correctly xfs: reset buffer pointers before freeing them xfs: avoid getting stuck during async inode flushes xfs: fix xfs_itruncate_start tracing xfs: fix duplicate workqueue initialisation xfs: kill off xfs_printk() xfs: fix race condition in AIL push trigger xfs: make AIL target updates and compares 32bit safe. xfs: always push the AIL to the target xfs: exit AIL push work correctly when AIL is empty xfs: ensure reclaim cursor is reset correctly at end of AG xfs: add an x86 compat handler for XFS_IOC_ZERO_RANGE xfs: fix compiler warning in xfs_trace.h xfs: cleanup duplicate initializations xfs: reduce the number of pagb_lock roundtrips in xfs_alloc_clear_busy xfs: exact busy extent tracking xfs: do not immediately reuse busy extent ranges xfs: optimize AGFL refills
2011-05-23Merge branch 'tty-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 * 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (48 commits) serial: 8250_pci: add support for Cronyx Omega PCI multiserial board. tty/serial: Fix break handling for PORT_TEGRA tty/serial: Add explicit PORT_TEGRA type n_tracerouter and n_tracesink ldisc additions. Intel PTI implementaiton of MIPI 1149.7. Kernel documentation for the PTI feature. export kernel call get_task_comm(). tty: Remove to support serial for S5P6442 pch_phub: Support new device ML7223 8250_pci: Add support for the Digi/IBM PCIe 2-port Adapter ASoC: Update cx20442 for TTY API change pch_uart: Support new device ML7223 IOH parport: Use request_muxed_region for IT87 probe and lock tty/serial: add support for Xilinx PS UART n_gsm: Use print_hex_dump_bytes drivers/tty/moxa.c: Put correct tty value TTY: tty_io, annotate locking functions TTY: serial_core, remove superfluous set_task_state TTY: serial_core, remove invalid test Char: moxa, fix locking in moxa_write ... Fix up trivial conflicts in drivers/bluetooth/hci_ldisc.c and drivers/tty/serial/Makefile. I did the hci_ldisc thing as an evil merge, cleaning things up.
2011-05-23Merge branch 'timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimers: Reorder clock bases hrtimers: Avoid touching inactive timer bases hrtimers: Make struct hrtimer_cpu_base layout less stupid timerfd: Manage cancelable timers in timerfd clockevents: Move C3 stop test outside lock alarmtimer: Drop device refcount after rtc_open() alarmtimer: Check return value of class_find_device() timerfd: Allow timers to be cancelled when clock was set hrtimers: Prepare for cancel on clock was set timers
2011-05-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) b43: fix comment typo reqest -> request Haavard Skinnemoen has left Atmel cris: typo in mach-fs Makefile Kconfig: fix copy/paste-ism for dell-wmi-aio driver doc: timers-howto: fix a typo ("unsgined") perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course'). treewide: fix a few typos in comments regulator: change debug statement be consistent with the style of the rest Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations" audit: acquire creds selectively to reduce atomic op overhead rtlwifi: don't touch with treewide double semicolon removal treewide: cleanup continuations and remove logging message whitespace ath9k_hw: don't touch with treewide double semicolon removal include/linux/leds-regulator.h: fix syntax in example code tty: fix typo in descripton of tty_termios_encode_baud_rate xtensa: remove obsolete BKL kernel option from defconfig m68k: fix comment typo 'occcured' arch:Kconfig.locks Remove unused config option. treewide: remove extra semicolons ...
2011-05-23block: move bd_set_size() above rescan_partitions() in __blkdev_get()Tejun Heo
02e352287a4 (block: rescan partitions on invalidated devices on -ENOMEDIA too) relocated partition rescan above explicit bd_set_size() to simplify condition check. As rescan_partitions() does its own bdev size setting, this doesn't break anything; however, rescan_partitions() prints out the following messages when adjusting bdev size, which can be confusing. sda: detected capacity change from 0 to 146815737856 sdb: detected capacity change from 0 to 146815737856 This patch restores the original order and remove the warning messages. stable: Please apply together with 02e352287a4 (block: rescan partitions on invalidated devices on -ENOMEDIA too). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tony Luck <tony.luck@gmail.com> Tested-by: Tony Luck <tony.luck@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-23dlm: make plock operation killableDavid Teigland
Allow processes blocked on plock requests to be interrupted when they are killed. This leaves the problem of cleaning up the lock state in userspace. This has three parts: 1. Add a flag to unlock operations sent to userspace indicating the file is being closed. Userspace will then look for and clear any waiting plock operations that were abandoned by an interrupted process. 2. Queue an unlock-close operation (like in 1) to clean up userspace from an interrupted plock request. This is needed because the vfs will not send a cleanup-unlock if it sees no locks on the file, which it won't if the interrupted operation was the only one. 3. Do not use replies from userspace for unlock-close operations because they are unnecessary (they are just cleaning up for the process which did not make an unlock call). This also simplifies the new unlock-close generated from point 2. Signed-off-by: David Teigland <teigland@redhat.com>
2011-05-23Merge branch 'exec_rm_compat' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc * 'exec_rm_compat' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: exec: document acct_arg_size() exec: unify do_execve/compat_do_execve code exec: introduce struct user_arg_ptr exec: introduce get_user_arg_ptr() helper
2011-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Wait properly when flushing the ail list GFS2: Wipe directory hash table metadata when deallocating a directory
2011-05-23timerfd: Manage cancelable timers in timerfdThomas Gleixner
Peter is concerned about the extra scan of CLOCK_REALTIME_COS in the timer interrupt. Yes, I did not think about it, because the solution was so elegant. I didn't like the extra list in timerfd when it was proposed some time ago, but with a rcu based list the list walk it's less horrible than the original global lock, which was held over the list iteration. Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2011-05-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: use mark_buffer_dirty to mark btnode or meta data dirty nilfs2: always set back pointer to host inode in mapping->host nilfs2: get rid of NILFS_I_NILFS nilfs2: use list_first_entry nilfs2: use empty_aops for gc-inodes nilfs2: implement resize ioctl nilfs2: add truncation routine of segment usage file nilfs2: add routine to move secondary super block nilfs2: add ioctl which limits range of segment to be allocated nilfs2: zero fill unused portion of super root block nilfs2: super root size should change depending on inode size nilfs2: get rid of private page allocator nilfs2: merge list_del()/list_add_tail() to list_move_tail()
2011-05-23UBIFS: switch to dynamic printksArtem Bityutskiy
Switch to debugging using dynamic printk (pr_debug()). There is no good reason to carry custom debugging prints if there is so cool and powerful generic dynamic printk infrastructure, see Documentation/dynamic-debug-howto.txt. With dynamic printks we can switch on/of individual prints, per-file, per-function and per format messages. This means that instead of doing old-fashioned echo 1 > /sys/module/ubifs/parameters/debug_msgs to enable general messages, we can do: echo 'format "UBIFS DBG gen" +ptlf' > control to enable general messages and additionally ask the dynamic printk infrastructure to print process ID, line number and function name. So there is no reason to keep UBIFS-specific crud if there is more powerful generic thing. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-05-22fs: add missing prefetch.h includeHeiko Carstens
Fixes this build error on s390 and probably other archs as well: fs/inode.c: In function 'new_inode': fs/inode.c:894:2: error: implicit declaration of function 'spin_lock_prefetch' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> [ Happens on architectures that don't define their own prefetch functions in <asm/processor.h>, and instead rely on the default ones in <linux/prefetch.h> - Linus] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-21GFS2: Wait properly when flushing the ail listSteven Whitehouse
The ail flush code has always relied upon log flushing to prevent it from spinning needlessly. This fixes it to wait on the last I/O request submitted (we don't need to wait for all of it) instead of either spinning with io_schedule or sleeping. As a result cpu usage of gfs2_logd is much reduced with certain workloads. Reported-by: Abhijith Das <adas@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-21GFS2: Wipe directory hash table metadata when deallocating a directorySteven Whitehouse
The deallocation code for directories in GFS2 is largely divided into two parts. The first part deallocates any directory leaf blocks and marks the directory as being a regular file when that is complete. The second stage was identical to deallocating regular files. Regular files have their data blocks in a different address space to directories, and thus what would have been normal data blocks in a regular file (the hash table in a GFS2 directory) were deallocated correctly. However, a reference to these blocks was left in the journal (assuming of course that some previous activity had resulted in those blocks being in the journal or ail list). This patch uses the i_depth as a test of whether the inode is an exhash directory (we cannot test the inode type as that has already been changed to a regular file at this stage in deallocation) The original issue was reported by Chris Hertel as an issue he encountered running bonnie++ Reported-by: Christopher R. Hertel <crh@samba.org> Cc: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-21VFS: move BUG_ON test for symlink nd->depth after current->link_count testErez Zadok
This solves a serious VFS-level bug in nested_symlink (which was rewritten from do_follow_link), and follows the order of depth tests that existed before. The bug triggers a BUG_ON in fs/namei.c:1381, when running racer with symlink and rename ops. Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-20Fix for buffer overflow in ldm_frag_add not sufficientTimo Warns
As Ben Hutchings discovered [1], the patch for CVE-2011-1017 (buffer overflow in ldm_frag_add) is not sufficient. The original patch in commit c340b1d64000 ("fs/partitions/ldm.c: fix oops caused by corrupted partition table") does not consider that, for subsequent fragments, previously allocated memory is used. [1] http://lkml.org/lkml/2011/5/6/407 Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Timo Warns <warns@pre-sense.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-20Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] define "_sdata" symbol pstore: Fix Kconfig dependencies for apei->pstore pstore: fix potential logic issue in pstore read interface pstore: fix pstore filesystem mount/remount issue pstore: fix one type of return value in pstore [IA64] fix build warning in arch/ia64/oprofile/backtrace.c
2011-05-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (32 commits) [CIFS] Fix to problem with getattr caused by invalidate simplification patch [CIFS] Remove sparse warning [CIFS] Update cifs to version 1.72 cifs: Change key name to cifs.idmap, misc. clean-up cifs: Unconditionally copy mount options to superblock info cifs: Use kstrndup for cifs_sb->mountdata cifs: Simplify handling of submount options in cifs_mount. cifs: cifs_parse_mount_options: do not tokenize mount options in-place cifs: Add support for mounting Windows 2008 DFS shares cifs: Extract DFS referral expansion logic to separate function cifs: turn BCC into a static inlined function cifs: keep BCC in little-endian format cifs: fix some unused variable warnings in id_rb_search CIFS: Simplify invalidate part (try #5) CIFS: directio read/write cleanups consistently use smb_buf_length as be32 for cifs (try 3) cifs: Invoke id mapping functions (try #17 repost) cifs: Add idmap key and related data structures and functions (try #17 repost) CIFS: Add launder_page operation (try #3) Introduce smb2 mounts as vers=2 ...
2011-05-20Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/miscLinus Torvalds
* 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (41 commits) signal: trivial, fix the "timespec declared inside parameter list" warning job control: reorganize wait_task_stopped() ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping() signal: sys_sigprocmask() needs retarget_shared_pending() signal: cleanup sys_sigprocmask() signal: rename signandsets() to sigandnsets() signal: do_sigtimedwait() needs retarget_shared_pending() signal: introduce do_sigtimedwait() to factor out compat/native code signal: sys_rt_sigtimedwait: simplify the timeout logic signal: cleanup sys_rt_sigprocmask() x86: signal: sys_rt_sigreturn() should use set_current_blocked() x86: signal: handle_signal() should use set_current_blocked() signal: sigprocmask() should do retarget_shared_pending() signal: sigprocmask: narrow the scope of ->siglock signal: retarget_shared_pending: optimize while_each_thread() loop signal: retarget_shared_pending: consider shared/unblocked signals only signal: introduce retarget_shared_pending() ptrace: ptrace_check_attach() should not do s/STOPPED/TRACED/ signal: Turn SIGNAL_STOP_DEQUEUED into GROUP_STOP_DEQUEUED signal: do_signal_stop: Remove the unneeded task_clear_group_stop_pending() ...
2011-05-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (32 commits) GFS2: Move all locking inside the inode creation function GFS2: Clean up symlink creation GFS2: Clean up mkdir GFS2: Use UUID field in generic superblock GFS2: Rename ops_inode.c to inode.c GFS2: Inode.c is empty now, remove it GFS2: Move final part of inode.c into super.c GFS2: Move most of the remaining inode.c into ops_inode.c GFS2: Move gfs2_refresh_inode() and friends into glops.c GFS2: Remove gfs2_dinode_print() function GFS2: When adding a new dir entry, inc link count if it is a subdir GFS2: Make gfs2_dir_del update link count when required GFS2: Don't use gfs2_change_nlink in link syscall GFS2: Don't use a try lock when promoting to a higher mode GFS2: Double check link count under glock GFS2: Improve bug trap code in ->releasepage() GFS2: Fix ail list traversal GFS2: make sure fallocate bytes is a multiple of blksize GFS2: Add an AIL writeback tracepoint GFS2: Make writeback more responsive to system conditions ...
2011-05-20sanitize <linux/prefetch.h> usageLinus Torvalds
Commit e66eed651fd1 ("list: remove prefetching from regular list iterators") removed the include of prefetch.h from list.h, which uncovered several cases that had apparently relied on that rather obscure header file dependency. So this fixes things up a bit, using grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]') grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]') to guide us in finding files that either need <linux/prefetch.h> inclusion, or have it despite not needing it. There are more of them around (mostly network drivers), but this gets many core ones. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-20Merge branch 'timers/urgent' into timers/coreThomas Gleixner
Reason: Get upstream fixes and kfree_rcu which is necessary for a follow up patch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-20Pull pstore into release branchTony Luck
2011-05-20[CIFS] Fix to problem with getattr caused by invalidate simplification patchSteve French
Fix to earlier "Simplify invalidate part (try #6)" patch That patch caused problems with connectathon test 5. Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-05-20UBIFS: fix kernel-doc commentsArtem Bityutskiy
This is a minor fix for UBIFS kernel-doc comments - we forgot the "@" symbol for several 'struct ubifs_debug_info'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-05-19Merge branch 'driver-core-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (44 commits) debugfs: Silence DEBUG_STRICT_USER_COPY_CHECKS=y warning sysfs: remove "last sysfs file:" line from the oops messages drivers/base/memory.c: fix warning due to "memory hotplug: Speed up add/remove when blocks are larger than PAGES_PER_SECTION" memory hotplug: Speed up add/remove when blocks are larger than PAGES_PER_SECTION SYSFS: Fix erroneous comments for sysfs_update_group(). driver core: remove the driver-model structures from the documentation driver core: Add the device driver-model structures to kerneldoc Translated Documentation/email-clients.txt RAW driver: Remove call to kobject_put(). reboot: disable usermodehelper to prevent fs access efivars: prevent oops on unload when efi is not enabled Allow setting of number of raw devices as a module parameter Introduce CONFIG_GOOGLE_FIRMWARE driver: Google Memory Console driver: Google EFI SMI x86: Better comments for get_bios_ebda() x86: get_bios_ebda_length() misc: fix ti-st build issues params.c: Use new strtobool function to process boolean inputs debugfs: move to new strtobool ... Fix up trivial conflicts in fs/debugfs/file.c due to the same patch being applied twice, and an unrelated cleanup nearby.
2011-05-19xfs: obey minleft values during extent allocation correctlyDave Chinner
When allocating an extent that is long enough to consume the remaining free space in an AG, we need to ensure that the allocation leaves enough space in the AG for any subsequent bmap btree blocks that are needed to track the new extent. These have to be allocated in the same AG as we only reserve enough blocks in an allocation transaction for modification of the freespace trees in a single AG. xfs_alloc_fix_minleft() has been considering blocks on the AGFL as free blocks available for extent and bmbt block allocation, which is not correct - blocks on the AGFL are there exclusively for the use of the free space btrees. As a result, when minleft is less than the number of blocks on the AGFL, xfs_alloc_fix_minleft() does not trim the given extent to leave minleft blocks available for bmbt allocation, and hence we can fail allocation during bmbt record insertion. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19xfs: reset buffer pointers before freeing themDave Chinner
When we free a vmapped buffer, we need to ensure the vmap address and length we free is the same as when it was allocated. In various places in the log code we change the memory the buffer is pointing to before issuing IO, but we never reset the buffer to point back to it's original memory (or no memory, if that is the case for the buffer). As a result, when we free the buffer it points to memory that is owned by something else and attempts to unmap and free it. Because the range does not match any known mapped range, it can trigger BUG_ON() traps in the vmap code, and potentially corrupt the vmap area tracking. Fix this by always resetting these buffers to their original state before freeing them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19xfs: avoid getting stuck during async inode flushesDave Chinner
When the underlying inode buffer is locked and xfs_sync_inode_attr() is doing a non-blocking flush, xfs_iflush() can return EAGAIN. When this happens, clear the error rather than returning it to xfs_inode_ag_walk(), as returning EAGAIN will result in the AG walk delaying for a short while and trying again. This can result in background walks getting stuck on the one AG until inode buffer is unlocked by some other means. This behaviour was noticed when analysing event traces followed by code inspection and verification of the fix via further traces. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19xfs: fix xfs_itruncate_start tracingDave Chinner
Variables are ordered incorrectly in trace call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19xfs: fix duplicate workqueue initialisationDave Chinner
The workqueue initialisation function is called twice when initialising the XFS subsystem. Remove the second initialisation call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19xfs: kill off xfs_printk()Joe Perches
xfs_alert_tag() can be defined using xfs_alert(), and thereby avoid using xfs_printk() altogether. This is the only remaining use of xfs_printk(), so changing it this way means xfs_printk() can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated. Also add format checking to the non-debug inline function xfs_debug. Miscellaneous function prototype argument alignment. (Updated to delete the definition of xfs_printk(), which is no longer used or needed.) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19[CIFS] Remove sparse warningSteve French
Move extern for cifsConvertToUCS to different header to prevent following warning: CHECK fs/cifs/cifs_unicode.c fs/cifs/cifs_unicode.c:267:1: warning: symbol 'cifsConvertToUCS' was not declared. Should it be static? Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>