summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2009-09-24procfs: disable per-task stack usage on NOMMUAndrew Morton
It needs walk_page_range(). Reported-by: Michal Simek <monstr@monstr.eu> Tested-by: Michal Simek <monstr@monstr.eu> Cc: Stefani Seibold <stefani@seibold.net> Cc: David Howells <dhowells@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: eCryptfs: Prevent lower dentry from going negative during unlink eCryptfs: Propagate vfs_read and vfs_write return codes eCryptfs: Validate global auth tok keys eCryptfs: Filename encryption only supports password auth tokens eCryptfs: Check for O_RDONLY lower inodes when opening lower files eCryptfs: Handle unrecognized tag 3 cipher codes ecryptfs: improved dependency checking and reporting eCryptfs: Fix lockdep-reported AB-BA mutex issue ecryptfs: Remove unneeded locking that triggers lockdep false positives
2009-09-24nfs[23] tcp breakage in mount with binary optionsAl Viro
We forget to set nfs_server.protocol in tcp case when old-style binary options are passed to mount. The thing remains zero and never validated afterwards. As the result, we hit BUG in fs/nfs/client.c:588. Breakage has been introduced in NFS: Add nfs_alloc_parsed_mount_data merged yesterday... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-24Merge branch 'cputime' of git://git390.marist.edu/pub/scm/linux-2.6Linus Torvalds
* 'cputime' of git://git390.marist.edu/pub/scm/linux-2.6: [PATCH] Fix idle time field in /proc/uptime
2009-09-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (42 commits) Btrfs: hash the btree inode during fill_super Btrfs: relocate file extents in clusters Btrfs: don't rename file into dummy directory Btrfs: check size of inode backref before adding hardlink Btrfs: fix releasepage to avoid unlocking extents we haven't locked Btrfs: Fix test_range_bit for whole file extents Btrfs: fix errors handling cached state in set/clear_extent_bit Btrfs: fix early enospc during balancing Btrfs: deal with NULL space info Btrfs: account for space used by the super mirrors Btrfs: fix extent entry threshold calculation Btrfs: remove dead code Btrfs: fix bitmap size tracking Btrfs: don't keep retrying a block group if we fail to allocate a cluster Btrfs: make balance code choose more wisely when relocating Btrfs: fix arithmetic error in clone ioctl Btrfs: add snapshot/subvolume destroy ioctl Btrfs: change how subvolumes are organized Btrfs: do not reuse objectid of deleted snapshot/subvol Btrfs: speed up snapshot dropping ...
2009-09-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: truncate: use new helpers truncate: new helpers fs: fix overflow in sys_mount() for in-kernel calls fs: Make unload_nls() NULL pointer safe freeze_bdev: grab active reference to frozen superblocks freeze_bdev: kill bd_mount_sem exofs: remove BKL from super operations fs/romfs: correct error-handling code vfs: seq_file: add helpers for data filling vfs: remove redundant position check in do_sendfile vfs: change sb->s_maxbytes to a loff_t vfs: explicitly cast s_maxbytes in fiemap_check_ranges libfs: return error code on failed attr set seq_file: return a negative error code when seq_path_root() fails. vfs: optimize touch_time() too vfs: optimization for touch_atime() vfs: split generic_forget_inode() so that hugetlbfs does not have to copy it fs/inode.c: add dev-id and inode number for debugging in init_special_inode() libfs: make simple_read_from_buffer conventional
2009-09-24Merge branch 'hwpoison' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits) HWPOISON: Enable error_remove_page on btrfs HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs HWPOISON: Add madvise() based injector for hardware poisoned pages v4 HWPOISON: Enable error_remove_page for NFS HWPOISON: Enable .remove_error_page for migration aware file systems HWPOISON: The high level memory error handler in the VM v7 HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process HWPOISON: shmem: call set_page_dirty() with locked page HWPOISON: Define a new error_remove_page address space op for async truncation HWPOISON: Add invalidate_inode_page HWPOISON: Refactor truncate to allow direct truncating of page v2 HWPOISON: check and isolate corrupted free pages v2 HWPOISON: Handle hardware poisoned pages in try_to_unmap HWPOISON: Use bitmask/action code for try_to_unmap behaviour HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2 HWPOISON: Add poison check to page fault handling HWPOISON: Add basic support for poisoned pages in fault handler v3 HWPOISON: Add new SIGBUS error codes for hardware poison signals HWPOISON: Add support for poison swap entries v2 HWPOISON: Export some rmap vma locking to outside world ...
2009-09-24task_struct cleanup: move binfmt field to mm_structHiroshi Shimamoto
Because the binfmt is not different between threads in the same process, it can be moved from task_struct to mm_struct. And binfmt moudle is handled per mm_struct instead of task_struct. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24fs/romfs: correct error-handling codeJulia Lawall
romfs_iget returns an ERR_PTR value in an error case instead of NULL. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @match exists@ expression x, E; statement S1, S2; @@ x = romfs_iget(...) ... when != x = E ( * if (x == NULL || ...) S1 else S2 | * if (x == NULL && ...) S1 else S2 ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24adfs: remove redundant test on unsignedRoel Kluin
unsigned block cannot be less than 0. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24sysctl: remove "struct file *" argument of ->proc_handlerAlexey Dobriyan
It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24fs/char_dev.c: remove useless loopRenzo Davoli
There are two useless lines in fs/char_dev.c. In register_chrdev there is a loop to change all '/' into '!' in the kernel object name. This code is useless as the same substitution is in kobject_set_name_vargs in lib/kobject.c: 228 /* ewww... some of these buggers have '/' in the name ... */ 229 while ((s = strchr(kobj->name, '/'))) 230 s[0] = '!'; kobject_set_name_vargs is called by kobject_set_name. kobject_set_name is called just above the useless loop. [hidave.darkstar@gmail.com: fix warning, remove the unused char *s] Signed-off-by: Renzo Davoli <renzo@cs.unibo.it> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24flat: use IS_ERR_VALUE() helper macroMike Frysinger
There is a common macro now for testing mixed pointer/errno values, so use that rather than handling the casts ourself. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: David McCullough <david_mccullough@securecomputing.com> Acked-by: Greg Ungerer <gerg@uclinux.org> Cc: David Howells <dhowells@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24fdpic: ignore the loader's PT_GNU_STACK when calculating the stack sizeDavid Howells
Ignore the loader's PT_GNU_STACK when calculating the stack size, and only consider the executable's PT_GNU_STACK, assuming the executable has one. Currently the behaviour is to take the largest stack size and use that, but that means you can't reduce the stack size in the executable. The loader's stack size should probably only be used when executing the loader directly. WARNING: This patch is slightly dangerous - it may render a system inoperable if the loader's stack size is larger than that of important executables, and the system relies unknowingly on this increasing the size of the stack. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24elf: clean up fill_note_info()Amerigo Wang
Introduce a helper function elf_note_info_init() to help fill_note_info() to do initializations, also fix the potential memory leaks. [akpm@linux-foundation.org: remove NUM_NOTES] Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24fcntl: add F_[SG]ETOWN_EXPeter Zijlstra
In order to direct the SIGIO signal to a particular thread of a multi-threaded application we cannot, like suggested by the manpage, put a TID into the regular fcntl(F_SETOWN) call. It will still be send to the whole process of which that thread is part. Since people do want to properly direct SIGIO we introduce F_SETOWN_EX. The need to direct SIGIO comes from self-monitoring profiling such as with perf-counters. Perf-counters uses SIGIO to notify that new sample data is available. If the signal is delivered to the same task that generated the new sample it can augment that data by inspecting the task's user-space state right after it returns from the kernel. This is esp. convenient for interpreted or virtual machine driven environments. Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex as argument: struct f_owner_ex { int type; pid_t pid; }; Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Tested-by: stephane eranian <eranian@googlemail.com> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Roland McGrath <roland@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission()Oleg Nesterov
group_send_sig_info()->check_kill_permission() assumes that current is the sender and uses current_cred(). This is not true in send_sigio_to_task() case. From the security pov the sender is not current, but the task which did fcntl(F_SETOWN), that is why we have sigio_perm() which uses the right creds to check. Fortunately, send_sigio() always sends either SEND_SIG_PRIV or SI_FROMKERNEL() signal, so check_kill_permission() does nothing. But still it would be tidier to avoid this bogus security check and save a couple of cycles. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stephane eranian <eranian@googlemail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24exec: fix set_binfmt() vs sys_delete_module() raceOleg Nesterov
sys_delete_module() can set MODULE_STATE_GOING after search_binary_handler() does try_module_get(). In this case set_binfmt()->try_module_get() fails but since none of the callers check the returned error, the task will run with the wrong old ->binfmt. The proper fix should change all ->load_binary() methods, but we can rely on fact that the caller must hold a reference to binfmt->module and use __module_get() which never fails. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24exec: allow do_coredump() to wait for user space pipe readers to completeNeil Horman
Allow core_pattern pipes to wait for user space to complete One of the things that user space processes like to do is look at metadata for a crashing process in their /proc/<pid> directory. this is racy however, since do_coredump in the kernel doesn't wait for the user space process to complete before it reaps the crashing process. This patch corrects that. Allowing the kernel to wait for the user space process to complete before cleaning up the crashing process. This is a bit tricky to do for a few reasons: 1) The user space process isn't our child, so we can't sys_wait4 on it 2) We need to close the pipe before waiting for the user process to complete, since the user process may rely on an EOF condition I've discussed several solutions with Oleg Nesterov off-list about this, and this is the one we've come up with. We add ourselves as a pipe reader (to prevent premature cleanup of the pipe_inode_info), and remove ourselves as a writer (to provide an EOF condition to the writer in user space), then we iterate until the user space process exits (which we detect by pipe->readers == 1, hence the > 1 check in the loop). When we exit the loop, we restore the proper reader/writer values, then we return and let filp_close in do_coredump clean up the pipe data properly. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Earl Chew <earl_chew@agilent.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24exec: let do_coredump() limit the number of concurrent dumps to pipesNeil Horman
Introduce core pipe limiting sysctl. Since we can dump cores to pipe, rather than directly to the filesystem, we create a condition in which a user can create a very high load on the system simply by running bad applications. If the pipe reader specified in core_pattern is poorly written, we can have lots of ourstandig resources and processes in the system. This sysctl introduces an ability to limit that resource consumption. core_pipe_limit defines how many in-flight dumps may be run in parallel, dumps beyond this value are skipped and a note is made in the kernel log. A special value of 0 in core_pipe_limit denotes unlimited core dumps may be handled (this is the default value). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Earl Chew <earl_chew@agilent.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24exec: make do_coredump() more resilient to recursive crashesNeil Horman
Change how we detect recursive dumps. Currently we have a mechanism by which we try to compare pathnames of the crashing process to the core_pattern path. This is broken for a dozen reasons, and just doesn't work in any sort of robust way. I'm replacing it with the use of a 0 RLIMIT_CORE value. Since helper apps set RLIMIT_CORE to zero, we don't write out core files for any process with that particular limit set. It the core_pattern is a pipe, any non-zero limit is translated to RLIM_INFINITY. This allows complete dumps to be captured, but prevents infinite recursion in the event that the core_pattern process itself crashes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Earl Chew <earl_chew@agilent.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24hugetlbfs: do not call user_shm_lock() for MAP_HUGETLB fixFrom: Mel Gorman
Commit 6bfde05bf5c ("hugetlbfs: allow the creation of files suitable for MAP_PRIVATE on the vfs internal mount") altered can_do_hugetlb_shm() to check if a file is being created for shared memory or mmap(). If this returns false, we then unconditionally call user_shm_lock() triggering a warning. This block should never be entered for MAP_HUGETLB. This patch partially reverts the problem and fixes the check. Signed-off-by: Eric B Munson <ebmunson@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24Merge branch 'master' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable into for-linus Conflicts: fs/btrfs/super.c
2009-09-24Btrfs: hash the btree inode during fill_superYan Zheng
The snapshot deletion patches dropped this line, but the inode needs to be hashed. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-09-24Btrfs: relocate file extents in clustersYan, Zheng
The extent relocation code copy file extents one by one when relocating data block group. This is inefficient if file extents are small. This patch makes the relocation code copy file extents in clusters. So we can can make better use of read-ahead. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-09-24Btrfs: don't rename file into dummy directoryYan, Zheng
A recent change enforces only one access point to each subvolume. The first directory entry (the one added when the subvolume/snapshot was created) is treated as valid access point, all other subvolume links are linked to dummy empty directories. The dummy directories are temporary inodes that only in memory, so we can not rename file into them. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-09-24Btrfs: check size of inode backref before adding hardlinkYan, Zheng
For every hardlink in btrfs, there is a corresponding inode back reference. All inode back references for hardlinks in a given directory are stored in single b-tree item. The size of b-tree item is limited by the size of b-tree leaf, so we can only create limited number of hardlinks to a given file in a directory. The original code lacks of the check, it oops if the number of hardlinks goes over the limit. This patch fixes the issue by adding check to btrfs_link and btrfs_rename. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-09-24truncate: use new helpersnpiggin@suse.de
Update some fs code to make use of new helper functions introduced in the previous patch. Should be no significant change in behaviour (except CIFS now calls send_sig under i_lock, via inode_newsize_ok). Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: linux-nfs@vger.kernel.org Cc: Trond.Myklebust@netapp.com Cc: linux-cifs-client@lists.samba.org Cc: sfrench@samba.org Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24truncate: new helpersnpiggin@suse.de
Introduce new truncate helpers truncate_pagecache and inode_newsize_ok. vmtruncate is also consolidated from mm/memory.c and mm/nommu.c and into mm/truncate.c. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24fs: fix overflow in sys_mount() for in-kernel callsVegard Nossum
sys_mount() reads/copies a whole page for its "type" parameter. When do_mount_root() passes a kernel address that points to an object which is smaller than a whole page, copy_mount_options() will happily go past this memory object, possibly dereferencing "wild" pointers that could be in any state (hence the kmemcheck warning, which shows that parts of the next page are not even allocated). (The likelihood of something going wrong here is pretty low -- first of all this only applies to kernel calls to sys_mount(), which are mostly found in the boot code. Secondly, I guess if the page was not mapped, exact_copy_from_user() _would_ in fact handle it correctly because of its access_ok(), etc. checks.) But it is much nicer to avoid the dubious reads altogether, by stopping as soon as we find a NUL byte. Is there a good reason why we can't do something like this, using the already existing strndup_from_user()? [akpm@linux-foundation.org: make copy_mount_string() static] [AV: fix compat mount breakage, which involves undoing akpm's change above] Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: al <al@dizzy.pdmi.ras.ru>
2009-09-24fs: Make unload_nls() NULL pointer safeThomas Gleixner
Most call sites of unload_nls() do: if (nls) unload_nls(nls); Check the pointer inside unload_nls() like we do in kfree() and simplify the call sites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steve French <sfrench@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Petr Vandrovec <vandrove@vc.cvut.cz> Cc: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24freeze_bdev: grab active reference to frozen superblocksChristoph Hellwig
Currently we held s_umount while a filesystem is frozen, despite that we might return to userspace and unlock it from a different process. Instead grab an active reference to keep the file system busy and add an explicit check for frozen filesystems in remount and reject the remount instead of blocking on s_umount. Add a new get_active_super helper to super.c for use by freeze_bdev that grabs an active reference to a superblock from a given block device. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24freeze_bdev: kill bd_mount_semChristoph Hellwig
Now that we have the freeze count there is not much reason for bd_mount_sem anymore. The actual freeze/thaw operations are serialized using the bd_fsfreeze_mutex, and the only other place we take bd_mount_sem is get_sb_bdev which tries to prevent mounting a filesystem while the block device is frozen. Instead of add a check for bd_fsfreeze_count and return -EBUSY if a filesystem is frozen. While that is a change in user visible behaviour a failing mount is much better for this case rather than having the mount process stuck uninterruptible for a long time. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24exofs: remove BKL from super operationsBoaz Harrosh
the two places inside exofs that where taking the BKL were: exofs_put_super() - .put_super and exofs_sync_fs() - which is .sync_fs and is also called from .write_super. Now exofs_sync_fs() is protected from itself by also taking the sb_lock. exofs_put_super() directly calls exofs_sync_fs() so there is no danger between these two either. In anyway there is absolutely nothing dangerous been done inside exofs_sync_fs(). Unless there is some subtle race with the actual lifetime of the super_block in regard to .put_super and some other parts of the VFS. Which is highly unlikely. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24fs/romfs: correct error-handling codeJulia Lawall
romfs_fill_super() assumes that romfs_iget() returns NULL when it fails. romfs_iget() actually returns ERR_PTR(-ve) in that case... Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: seq_file: add helpers for data fillingMiklos Szeredi
Add two helpers that allow access to the seq_file's own buffer, but hide the internal details of seq_files. This allows easier implementation of special purpose filling functions. It also cleans up some existing functions which duplicated the seq_file logic. Make these inline functions in seq_file.h, as suggested by Al. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: remove redundant position check in do_sendfileJeff Layton
As Johannes Weiner pointed out, one of the range checks in do_sendfile is redundant and is already checked in rw_verify_area. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Robert Love <rlove@google.com> Cc: Mandeep Singh Baines <msb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: change sb->s_maxbytes to a loff_tJeff Layton
sb->s_maxbytes is supposed to indicate the maximum size of a file that can exist on the filesystem. It's declared as an unsigned long long. Even if a filesystem has no inherent limit that prevents it from using every bit in that unsigned long long, it's still problematic to set it to anything larger than MAX_LFS_FILESIZE. There are places in the kernel that cast s_maxbytes to a signed value. If it's set too large then this cast makes it a negative number and generally breaks the comparison. Change s_maxbytes to be loff_t instead. That should help eliminate the temptation to set it too large by making it a signed value. Also, add a warning for couple of releases to help catch filesystems that set s_maxbytes too large. Eventually we can either convert this to a BUG() or just remove it and in the hope that no one will get it wrong now that it's a signed value. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Robert Love <rlove@google.com> Cc: Mandeep Singh Baines <msb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: explicitly cast s_maxbytes in fiemap_check_rangesJeff Layton
If fiemap_check_ranges is passed a large enough value, then it's possible that the value would be cast to a signed value for comparison against s_maxbytes when we change it to loff_t. Make sure that doesn't happen by explicitly casting s_maxbytes to an unsigned value for the purposes of comparison. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robert Love <rlove@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mandeep Singh Baines <msb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24libfs: return error code on failed attr setWu Fengguang
Currently all simple_attr.set handlers return 0 on success and negative codes on error. Fix simple_attr_write() to return these error codes. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24seq_file: return a negative error code when seq_path_root() fails.Tetsuo Handa
seq_path_root() is returning a return value of successful __d_path() instead of returning a negative value when mangle_path() failed. This is not a bug so far because nobody is using return value of seq_path_root(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: optimize touch_time() tooAndi Kleen
Do a similar optimization as earlier for touch_atime. Getting the lock in mnt_get_write is relatively costly, so try all avenues to avoid it first. This patch is careful to still only update inode fields inside the lock region. This didn't show up in benchmarks, but it's easy enough to do. [akpm@linux-foundation.org: fix typo in comment] [hugh.dickins@tiscali.co.uk: fix inverted test of mnt_want_write_file()] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Valerie Aurora <vaurora@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: optimization for touch_atime()Andi Kleen
Some benchmark testing shows touch_atime to be high up in profile logs for IO intensive workloads. Most likely that's due to the lock in mnt_want_write(). Unfortunately touch_atime first takes the lock, and then does all the other tests that could avoid atime updates (like noatime or relatime). Do it the other way round -- first try to avoid the update and only then if that didn't succeed take the lock. That works because none of the atime avoidance tests rely on locking. This also eliminates a goto. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Reviewed-by: Valerie Aurora <vaurora@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: split generic_forget_inode() so that hugetlbfs does not have to copy itJan Kara
Hugetlbfs needs to do special things instead of truncate_inode_pages(). Currently, it copied generic_forget_inode() except for truncate_inode_pages() call which is asking for trouble (the code there isn't trivial). So create a separate function generic_detach_inode() which does all the list magic done in generic_forget_inode() and call it from hugetlbfs_forget_inode(). Signed-off-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24fs/inode.c: add dev-id and inode number for debugging in init_special_inode()Manish Katiyar
Add device-id and inode number for better debugging. This was suggested by Andreas in one of the threads http://article.gmane.org/gmane.comp.file-systems.ext4/12062 . "If anyone has a chance, fixing this error message to be not-useless would be good... Including the device name and the inode number would help track down the source of the problem." Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24libfs: make simple_read_from_buffer conventionalSteven Rostedt
Impact: have simple_read_from_buffer conform to standards It was brought to my attention by Andrew Morton, Theodore Tso, and H. Peter Anvin that a read from userspace should only return -EFAULT if nothing was actually read. Looking at the simple_read_from_buffer I noticed that this function does not conform to that rule. This patch fixes that function. [akpm@linux-foundation.org: simplification suggested by hpa] [hpa@zytor.com: fix count==0 handling] Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24[PATCH] Fix idle time field in /proc/uptimeMichael Abbott
Git commit 79741dd changes idle cputime accounting, but unfortunately the /proc/uptime file hasn't caught up. Here the idle time calculation from /proc/stat is copied over. Signed-off-by: Michael Abbott <michael.abbott@diamond.ac.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-23headers: utsname.h reduxAlexey Dobriyan
* remove asm/atomic.h inclusion from linux/utsname.h -- not needed after kref conversion * remove linux/utsname.h inclusion from files which do not need it NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however due to some personality stuff it _is_ needed -- cowardly leave ELF-related headers and files alone. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23Btrfs: fix releasepage to avoid unlocking extents we haven't lockedChris Mason
During releasepage, we try to drop any extent_state structs for the bye offsets of the page we're releaseing. But the code was incorrectly telling clear_extent_bit to delete the state struct unconditionallly. Normally this would be fine because we have the page locked, but other parts of btrfs will lock down an entire extent, the most common place being IO completion. releasepage was deleting the extent state without first locking the extent, which may result in removing a state struct that another process had locked down. The fix here is to leave the NODATASUM and EXTENT_LOCKED bits alone in releasepage. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-09-23Btrfs: Fix test_range_bit for whole file extentsChris Mason
If test_range_bit finds an extent that goes all the way to (u64)-1, it can incorrectly wrap the u64 instead of treaing it like the end of the address space. This just adds a check for the highest possible offset so we don't wrap. Signed-off-by: Chris Mason <chris.mason@oracle.com>