summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2011-03-11NFSv4.1: Fix the handling of the SEQUENCE status bitsTrond Myklebust
We want SEQUENCE status bits to be handled by the state manager in order to avoid threading issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11NFSv4/4.1: Fix nfs4_schedule_state_recovery abusesTrond Myklebust
nfs4_schedule_state_recovery() should only be used when we need to force the state manager to check the lease. If we just want to start the state manager in order to handle a state recovery situation, we should be using nfs4_schedule_state_manager(). This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing its use with a set of helper functions that do the right thing. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11jffs2: remove a trailing white space in commentariesTracey Dent
Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-03-11GFS2: introduce AIL lockDave Chinner
The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward. This reduces the impact of log pushing on sequential write throughput. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-11GFS2: fix block allocation check for fallocateBenjamin Marzinski
GFS2 fallocate wasn't properly checking if a blocks were already allocated. In write_empty_blocks(), if a page didn't have buffer_heads attached, GFS2 was always treating it as if there were no blocks allocated for that page. GFS2 now calls gfs2_block_map() to check if the blocks are allocated before writing them out. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-11GFS2: Optimize glock multiple-dequeue codeBob Peterson
This is a small patch that optimizes multiple glock dequeue operations. It changes the unlock order to be more efficient and makes it easier for lock debugging tools to unravel. It also eliminates the need for the temp variable x, although that would likely be optimized out. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-11UBIFS: do not check data crc by defaultArtem Bityutskiy
Change the default UBIFS behavior WRT data CRC checking. Currently, UBIFS checks data CRC when reading, which slows it down quite a bit, and this is the default option. However, it looks like in average user does not need this feature and would prefer faster read speed over extra reliability. And this seems to be de-facto standard that file-systems do not check data CRC every time they read from the media. Thus, make UBIFS default behavior so that it does not check data CRC. This corresponds to the no_chk_data_crc mount option. Those users who need extra protection can always enable it using the chk_data_crc option. Please, read more information about this feature here: http://www.linux-mtd.infradead.org/doc/ubifs.html#L_checksumming Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-03-11UBIFS: simplify UBIFS Kconfig menuArtem Bityutskiy
Remove debug message level and debug checks Kconfig options as they proved to be useless anyway. We have sysfs interface which we can use for fine-grained debugging messages and checks selection, see Documentation/filesystems/ubifs.txt for mode details. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-03-11UBIFS: print max. index node sizeArtem Bityutskiy
Improve debugging messages by printing the maximum index node size on mount. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-03-11UBIFS: handle allocation failures in UBIFS write pathMatthew L. Creech
Running kernel 2.6.37, my PPC-based device occasionally gets an order-2 allocation failure in UBIFS, which causes the root FS to become unwritable: kswapd0: page allocation failure. order:2, mode:0x4050 Call Trace: [c787dc30] [c00085b8] show_stack+0x7c/0x194 (unreliable) [c787dc70] [c0061aec] __alloc_pages_nodemask+0x4f0/0x57c [c787dd00] [c0061b98] __get_free_pages+0x20/0x50 [c787dd10] [c00e4f88] ubifs_jnl_write_data+0x54/0x200 [c787dd50] [c00e82d4] do_writepage+0x94/0x198 [c787dd90] [c00675e4] shrink_page_list+0x40c/0x77c [c787de40] [c0067de0] shrink_inactive_list+0x1e0/0x370 [c787de90] [c0068224] shrink_zone+0x2b4/0x2b8 [c787df00] [c0068854] kswapd+0x408/0x5d4 [c787dfb0] [c0037bcc] kthread+0x80/0x84 [c787dff0] [c000ef44] kernel_thread+0x4c/0x68 Similar problems were encountered last April by Tomasz Stanislawski: http://patchwork.ozlabs.org/patch/50965/ This patch implements Artem's suggested fix: fall back to a mutex-protected static buffer, allocated at mount time. I tested it by forcing execution down the failure path, and didn't see any ill effects. Artem: massaged the patch a little, improved it so that we'd not allocate the write reserve buffer when we are in R/O mode. Signed-off-by: Matthew L. Creech <mlcreech@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-03-10NFSv4.1 reclaim complete must wait for completionAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix whitespace errors] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10NFSv4: remove duplicate clientid in struct nfs_clientAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAYRicardo Labiaga
Fix bug where we currently retry the EXCHANGEID call again, eventhough we already have a valid clientid. Instead, delay and retry the CREATE_SESSION call. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit ↵Frank Filz
31 or 63 are set in fileid The problem was use of an int32, which when converted to a uint64 is sign extended resulting in a fileid that doesn't fit in 32 bits even though the intent of the function is to fit the fileid into 32 bits. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> [Trond: Added an include for compat.h] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10nfs: fix compilation warningJovi Zhang
this commit fix compilation warning as following: linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10nfs: add kmalloc return value check in decode_and_add_dsStanislav Fomichev
add kmalloc return value check in decode_and_add_ds Signed-off-by: Stanislav Fomichev <kernel@fomichev.me> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10nfs: close NFSv4 COMMIT vs. CLOSE raceJeff Layton
I've been adding in more artificial delays in the NFSv4 commit and close codepaths to uncover races. The kernel I'm testing has the patch to close the race in __rpc_wait_for_completion_task that's in Trond's cthon2011 branch. The reproducer I've been using does this in a loop: mkdir("DIR"); fd = open("DIR/FILE", O_WRONLY|O_CREAT|O_EXCL, 0644); write(fd, "abcdefg", 7); close(fd); unlink("DIR/FILE"); rmdir("DIR"); The above reproducer shouldn't result in any silly-renaming. However, when I add a "msleep(100)" just after the nfs_commit_clear_lock call in nfs_commit_release, I can almost always force one to occur. If I can force it to occur with that, then it can happen without that delay given the right timing. nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait for the task to complete before putting its reference to it, so the last reference get put in rpc_release task and gets queued to a workqueue. In this situation, the last open context reference may be put by the COMMIT release instead of the close() syscall. The close() syscall returns too quickly and the unlink runs while the d_count is still high since the COMMIT release hasn't put its dentry reference yet. Fix this by having rpc_commit_rpcsetup wait for the RPC call to complete before putting the task reference when FLUSH_SYNC is set. With this, the last reference is put by the process that's initiating the FLUSH_SYNC commit and the race is closed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10SUNRPC: Close a race in __rpc_wait_for_completion_task()Trond Myklebust
Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10dlm: use alloc_workqueue functionDavid Teigland
Replaces deprecated create_singlethread_workqueue(). Signed-off-by: David Teigland <teigland@redhat.com>
2011-03-10dlm: increase default hash table sizesDavid Teigland
Make all three hash tables a consistent size of 1024 rather than 1024, 512, 256. All three tables, for resources, locks, and lock dir entries, will generally be filled to the same order of magnitude. Signed-off-by: David Teigland <teigland@redhat.com>
2011-03-10dlm: record full callback stateDavid Teigland
Change how callbacks are recorded for locks. Previously, information about multiple callbacks was combined into a couple of variables that indicated what the end result should be. In some situations, we could not tell from this combined state what the exact sequence of callbacks were, and would end up either delivering the callbacks in the wrong order, or suppress redundant callbacks incorrectly. This new approach records all the data for each callback, leaving no uncertainty about what needs to be delivered. Signed-off-by: David Teigland <teigland@redhat.com>
2011-03-10btrfs: fix not enough reserved spaceMiao Xie
btrfs_link() will insert 3 items(inode ref, dir name item and dir index item) into the b+ tree and update 2 items(its inode, and parent's inode) in the b+ tree. So we should reserve space for these 5 items, not 3 items. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-10btrfs: fix dip leakDaniel J Blueman
The btrfs DIO code leaks dip structs when dip->csums allocation fails; bio->bi_end_io isn't set at the point where the free_ordered branch is consequently taken, thus bio_endio doesn't call the function which would free it in the normal case. Fix. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Acked-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-10fs/dcache: allow d_obtain_alias() to return unhashed dentriesJ. Bruce Fields
Without this patch, inodes are not promptly freed on last close of an unlinked file by an nfs client: client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle.  The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. - close: closes the existing filp, which is destroyed immediately by dput() since it's DCACHE_UNHASHED. - end of the compound: release the reference to the current filehandle, and dput() the new DCACHE_DISCONECTED dentry, which gets put on the unused list instead of being destroyed immediately. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Leave __d_find_alias() alone to avoid changing behavior of other callers. Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, hashed or unhashed, disconnected or not, should work. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10Check for immutable/append flag in fallocate pathMarco Stornelli
In the fallocate path the kernel doesn't check for the immutable/append flag. It's possible to have a race condition in this scenario: an application open a file in read/write and it does something, meanwhile root set the immutable flag on the file, the application at that point can call fallocate with success. In addition, we don't allow to do any unreserve operation on an append only file but only the reserve one. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10fat: fix d_revalidate oopsen on NFS exportsAl Viro
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10jfs: fix d_revalidate oopsen on NFS exportsAl Viro
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10ocfs2: fix d_revalidate oopsen on NFS exportsAl Viro
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10gfs2: fix d_revalidate oopsen on NFS exportsAl Viro
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10fuse: fix d_revalidate oopsen on NFS exportsAl Viro
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10ceph: fix d_revalidate oopsen on NFS exportsAl Viro
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10reiserfs xattr ->d_revalidate() shouldn't care about RCUAl Viro
... it returns an error unconditionally Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10/proc/self is never going to be invalidated...Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/coreJens Axboe
Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10block: kill off REQ_UNPLUGJens Axboe
With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10aio: remove request submission batchingJens Axboe
This should be useless now that we have on-stack plugging. So lets just kill it. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10fs: make aio plugShaohua Li
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10fs: make mpage read/write_pages() plugJens Axboe
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10block: remove per-queue pluggingJens Axboe
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-09Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: nfsd: wrong index used in inner loop nfsd4: fix bad pointer on failure to find delegation NFSD: fix decode_cb_sequence4resok
2011-03-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: nd->inode is not set on the second attempt in path_walk() unfuck proc_sysctl ->d_compare() minimal fix for do_filp_open() race
2011-03-09block: Don't check events while open is in progressTejun Heo
Not all block drivers clear events immediately after reporting. Some do so in ->revalidate_disk() or other steps during ->open(). There is a slim chance event poll may happen between the clearing event check from check_disk_change() and the actual clearing of the events which would result in spurious events. Block event checks while block device open is in progress. There is no need to kick explicit event check afterwards as events are always checked during open. -v2: The original patch could have called disk_unblock_events() with an already released or %NULL @disk causing oops. Fixed by making sure references are put after disk_unblock_events() is called. It also makes the error path of __blkdev_get() a bit simpler. This problem was reported by Jens. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09block: Don't check events on close unless it was blockedTejun Heo
The block event mechanism currently always checks events when the device is being closed regardless of the open mode. The intention was to allow detection of EJECT_REQUEST when a device is closed whether disk event polling is enabled or not. This is unnecessary as, for devices of interest, events are checked from either userland or kernel and in the former case ->check_events() is performed on open of each poll attempt anyway. Furthermore, this unconditional event check on close makes the code susceptible to event loop if the block driver doesn't clear reported events correctly - an event triggers userland to open and close the device which in turn causes another event, rinse and repeat. Check events on close only if it was blocked by excl write open. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09block: Don't implicitly trigger event check on disk_unblock_events()Tejun Heo
Currently, disk_unblock_events() implicitly kick event check if the block count reaches zero. This behavior is not described in the comment and hinders with future changes. Make the unblocker explicitly check events by calling disk_check_events() as necessary. This patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09xfs: factor agf counter updates into a helperChristoph Hellwig
Updating the AGF and transactions counters is duplicated between allocating and freeing extents. Factor the code into a common helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-03-09xfs: clean up the xfs_alloc_compute_aligned calling conventionChristoph Hellwig
Pass a xfs_alloc_arg structure to xfs_alloc_compute_aligned and derive the alignment and minlen paramters from it. This cleans up the existing callers, and we'll need even more information from the xfs_alloc_arg in subsequent patches. Based on a patch from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-03-09GFS2: Remove potential race in flock codeSteven Whitehouse
This patch ensures that we always wait for glock demotion when dropping flocks on a file in order to prevent any race conditions associated with further flock calls or closing the file. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-09GFS2: Fix glock deallocation raceSteven Whitehouse
This patch fixes a race in deallocating glocks which was introduced in the RCU glock patch. We need to ensure that the glock count is kept correct even in the case that there is a race to add a new glock into the hash table. Also, to avoid having to wait for an RCU grace period, the glock counter can be decremented before call_rcu() is called. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-09GFS2: quota allows exceeding hard limitAbhijith Das
Immediately after being synced to disk, cached quotas are zeroed out and a subsequent access of the cached quotas results in incorrect zero values. This meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage values it found in the cached quotas and comparison against warn/limits never triggered a quota violation. This patch adds a new flag QDF_REFRESH that is set after a sync so that the cached quotas are forcefully refreshed from disk on a subsequent access on seeing this flag set. Resolves: rhbz#675944 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-09nilfs2: get rid of nilfs_sb_info structureRyusuke Konishi
This directly uses sb->s_fs_info to keep a nilfs filesystem object and fully removes the intermediate nilfs_sb_info structure. With this change, the hierarchy of on-memory structures of nilfs will be simplified as follows: Before: super_block -> nilfs_sb_info -> the_nilfs -> cptree --+-> nilfs_root (current file system) +-> nilfs_root (snapshot A) +-> nilfs_root (snapshot B) : -> nilfs_sc_info (log writer structure) After: super_block -> the_nilfs -> cptree --+-> nilfs_root (current file system) +-> nilfs_root (snapshot A) +-> nilfs_root (snapshot B) : -> nilfs_sc_info (log writer structure) The reason why we didn't design so from the beginning is because the initial shape also differed from the above. The early hierachy was composed of "per-mount-point" super_block -> nilfs_sb_info pairs and a shared nilfs object. On the kernel 2.6.37, it was changed to the current shape in order to unify super block instances into one per device, and this cleanup became applicable as the result. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>