summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2007-10-10[GFS2] fixed a NULL pointer assignment BUGDenis Cheng
Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Force unstuff of hidden quota inodeAbhijith Das
This patch forcibly unstuffs (if stuffed) the hidden quota inode at the first availble opportunity. In any practical scenario the quota inode won't be stuffed, so this is ok to do. Unstuffing the quota inode allows us to ignore the case of a stuffed quota inode in gfs2_adjust_quota(). Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] better code for translating charactersDenis Cheng
the original code could work, but I think this code could work better. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] unneeded typecastDenis Cheng
sb->s_fs_info is a void pointer, thus the type cast is not needed. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] use list_for_each_entry insteadDenis Cheng
Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Ensure journal file cache is flushed after recoveryBob Peterson
This is for bugzilla bug #248176: GFS2: invalid metadata block Patches 1 thru 3 were accepted upstream, but there were problems with 4 and 5. Those issues have been resolved and now the recovery tests are passing without errors. This code has gone through 41 * 3 successful gfs2 recovery tests before it hit an unrelated (openais) problem. I'm continuing to test it. This is a complete rewrite of patch 5 for bug #248176, written by Steve Whitehouse. This is referred to in the bugzilla record as "new 6" and "a different solution". The problem was that the journal inodes, although protected by a glock, were not synched with the other nodes because they don't use the inode glock synch operations (i.e. no "glops" were defined). Therefore, journal recovery on a journal-recovering node were causing the blocks to get out of sync with the node that was actually trying to use that journal as it comes back up from a reboot. There are two possible solutions: (1) To make the journals use the normal inode glock sync operations, or (2) To make the journal operations take effect immediately (i.e. no caching). Although option 1 works, it turns out to be a lot more code. Steve opted for option 2, which is much simpler and therefore less prone to regression errors. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> --
2007-10-10[GFS2] invalid metadata block - REVISEDBob Peterson
This is for bugzilla bug #248176: GFS2: invalid metadata block Patches 1 thru 3 were accepted upstream, but there were problems with 4 and 5. Those issues have been resolved and now the recovery tests are passing without errors. This code has gone through 41 * 3 successful gfs2 recovery tests before it hit an unrelated (openais) problem. This is a complete rewrite of patch 4 for bug #248176. Part of the problem was that inodes were being recycled before their buffers were flushed to the journal logs. Another problem was that the clone bitmaps were being searched for deleted inodes to recycle, but only the "real" bitmaps should be searched for that purpose. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Reduce number of gfs2_scand processes to oneSteven Whitehouse
We only need a single gfs2_scand process rather than the one per filesystem which we had previously. As a result the parameter determining the frequency of gfs2_scand runs becomes a module parameter rather than a mount parameter as it was before. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] use the declaration of gfs2_dops in the header file insteadDenis Cheng
Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] mark struct *_operations constDenis Cheng
these struct *_operations are all method tables, thus should be const. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Detach buf data during in-place writebackBob Peterson
This is patch 5 of 5 for bug #248176 Metadata corruption was occurring because page references weren't being removed in all cases. I previously added a function called detach_bufdata, but I discovered there already WAS a function out there to do the job. It's called gfs2_meta_cache_flush. So I added a call to that to remove the page references. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] use an temp variable to reduce a spin_unlockDenis Cheng
this is more clear. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Prevent infinite loop in try_rgrp_unlink()Bob Peterson
This is patch three of five for bug #248176. The try_rgrp_unlink code in rgrp.c had an infinite loop. This was caused because the bitmap function rgblk_search can return a block less than the "goal" block, in which case it was looping. The fix is to make it always march forward as needed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Revert part of earlier log.c changesBob Peterson
This is patch 2 of 5 for bug #248176. The list_move code previously concocted in log.c for bug #238162 (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238162#c23) never runs as bh can now never be NULL at this point. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Move some code inside the log lockBob Peterson
This is the first of five patches for bug #248176: There were still some critical variables being manipulated outside the log_lock spinlock. That usually resulted in a hang. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Fix an oops in glock dumpingSteven Whitehouse
This fixes an oops which was occurring during glock dumping due to the seq file code not taking a reference to the glock. Also this fixes a memory leak which occurred in certain cases, in turn preventing the filesystem from unmounting. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] GFS2 not checking pointer on create when running under nfsdSteve French
When looking at an unrelated problem, I noticed that nfsd does not set nameidata pointer on create (ie nd is NULL). This should cause an oops in some cases in which when NFSd is mounted over GFS2. Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Clean up duplicate includes in fs/gfs2/Jesper Juhl
This patch cleans up duplicate includes in fs/gfs2/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Fix calculation of demote stateJosef Whiter
If a glock is in the exclusive state and a request for demote to deferred has been received, then further requests for demote to shared are being ignored. This patch fixes that by ensuring that we demote to unlocked in that case. Signed-off-by: Josef Whiter <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Fix two races relating to glock callbacksSteven Whitehouse
One of the races relates to referencing a variable while not holding its protecting spinlock. The patch simply moves the test inside the spin lock. The other races occurs when a demote to unlocked request occurs during the time a demote to shared request is already running. This of course only happens in the case that the lock was in the exclusive mode to start with. The patch adds a check to see if another demote request has occurred in the mean time and if it has, then it performs a second demote. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10compat_ioctl: move floppy handlers to block/compat_ioctl.cArnd Bergmann
The floppy ioctls are used by multiple drivers, so they should be handled in a shared location. Also, add minor cleanups. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10compat_ioctl: move cdrom handlers to block/compat_ioctl.cArnd Bergmann
These are shared by all cd-rom drivers and should have common handlers. Do slight cosmetic cleanups in the process. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10compat_ioctl: move BLKPG handling to block/compat_ioctl.cArnd Bergmann
BLKPG is common to all block devices, so it should be handled by common code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10compat_ioctl: move hdio calls to block/compat_ioctl.cArnd Bergmann
These are common to multiple block drivers, so they should be handled by the block layer. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10compat_ioctl: handle blk_trace ioctlsArnd Bergmann
blk_trace_setup is broken on x86_64 compat systems, this makes the code work correctly on all 64 bit architectures in compat mode. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10compat_ioctl: add compat_blkdev_driver_ioctl()Arnd Bergmann
Handle those blockdev ioctl calls that are compatible directly from the compat_blkdev_ioctl() function, instead of having to go through the compat_ioctl hash lookup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10compat_ioctl: move common block ioctls to compat_blkdev_ioctlArnd Bergmann
Make compat_blkdev_ioctl and blkdev_ioctl reflect the respective native versions. This is somewhat more efficient and makes it easier to keep the two in sync. Also get rid of the bogus handling for broken_blkgetsize and the duplicate entry for BLKRASET. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10Drop 'size' argument from bio_endio and bi_end_ioNeilBrown
As bi_end_io is only called once when the reqeust is complete, the 'size' argument is now redundant. Remove it. Now there is no need for bio_endio to subtract the size completed from bi_size. So don't do that either. While we are at it, change bi_end_io to return void. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10Don't decrement bi_size in bio_endioNeilBrown
The only caller of bio_endio that does not pass the full bi_size is end_that_request_first. Also, no ->bi_end_io method is really interested in bi_size being decremented. So move the decrement and related code into ll_rw_blk and merge it with order_bio_endio to form req_bio_endio which does endio functionality specific to request completion. As some ->bi_end_io methods do check bi_size of 0, we set it thus for now, but that will go in the next patch. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./block/ll_rw_blk.c | 42 +++++++++++++++++++++++++++--------------- ./fs/bio.c | 23 +++++++++++------------ 2 files changed, 38 insertions(+), 27 deletions(-) diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10Only call bi_end_io once for any bioNeilBrown
Currently bi_end_io can be called multiple times as sub-requests complete. However no ->bi_end_io function wants to know about that. So only call when the bio is complete. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./fs/bio.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff .prev/fs/bio.c ./fs/bio.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10Fix warnings with !CONFIG_BLOCKJens Axboe
Hide everything in blkdev.h with CONFIG_BLOCK isn't set, and fixup the (few) files that fail to build because they were relying on blkdev.h pulling in extra includes for them. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-09nfsd: remove IS_ISMNDLCK macroJ. Bruce Fields
This macro is only used in one place; in this place it seems simpler to put open-code it and move the comment to where it's used. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09Rework /proc/locks via seq_files and seq_list helpersPavel Emelyanov
Currently /proc/locks is shown with a proc_read function, but its behavior is rather complex as it has to manually handle current offset and buffer length. On the other hand, files that show objects from lists can be easily reimplemented using the sequential files and the seq_list_XXX() helpers. This saves (as usually) 16 lines of code and more than 200 from the .text section. [akpm@linux-foundation.org: no externs in C] [akpm@linux-foundation.org: warning fixes] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09fs/locks.c: use list_for_each_entry() instead of list_for_each()Matthias Kaehlcke
fs/locks.c: use list_for_each_entry() instead of list_for_each() in posix_locks_deadlock() and get_locks_status() Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09NFS: clean up explicit check for mandatory locksPavel Emelyanov
The __mandatory_lock(inode) macro makes the same check, but makes the code more readable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09AFS: clean up explicit check for mandatory locksPavel Emelyanov
The __mandatory_lock(inode) macro makes the same check, but makes the code more readable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-099PFS: clean up explicit check for mandatory locksPavel Emelyanov
The __mandatory_lock(inode) macro makes the same check, but makes the code more readable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09GFS2: clean up explicit check for mandatory locksPavel Emelyanov
The __mandatory_lock(inode) function makes the same check, but makes the code more readable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09Cleanup macros for distinguishing mandatory locksPavel Emelyanov
The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the inode as "mandatory lockable" and there's a macro for this check called MANDATORY_LOCK(inode). However, fs/locks.c and some filesystems still perform the explicit i_mode checking. Besides, Andrew pointed out, that this macro is buggy itself, as it dereferences the inode arg twice. Convert this macro into static inline function and switch its users to it, making the code shorter and more readable. The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK() for superblock is already known to be true. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: David Howells <dhowells@redhat.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09locks: Fix potential OOPS in generic_setlease()Pavel Emelyanov
This code is run under lock_kernel(), which is dropped during sleeping operations, so the following race is possible: CPU1: CPU2: vfs_setlease(); vfs_setlease(); lock_kernel(); lock_kernel(); /* spin */ generic_setlease(): ... for (before = ...) /* here we found some lease after * which we will insert the new one */ fl = locks_alloc_lock(); /* go to sleep in this allocation and * drop the BKL */ generic_setlease(): ... for (before = ...) /* here we find the "before" pointing * at the one we found on CPU1 */ ->fl_change(my_before, arg); lease_modify(); locks_free_lock(); /* and we freed it */ ... unlock_kernel(); locks_insert_lock(before, fl); /* OOPS! We have just tried to add the lease * at the tail of already removed one */ The similar races are already handled in other code - all the allocations are performed before any checks/updates. Thanks to Kamalesh Babulal for testing and for a bug report on an earlier version. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
2007-10-09Use list_first_entry in locks_wake_up_blocksPavel Emelyanov
This routine deletes all the elements from the list with the "while (!list_empty())" loop, and we already have a list_first_entry() macro to help it look nicer :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
2007-10-09locks: fix flock_lock_file() commentJ. Bruce Fields
This comment wasn't updated when lease support was added, and it makes essentially the same mistake that the code made before a recent bugfix. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09Memory shortage can result in inconsistent flocks statePavel Emelyanov
When the flock_lock_file() is called to change the flock from F_RDLCK to F_WRLCK or vice versa the existing flock can be removed without appropriate warning. Look: for_each_lock(inode, before) { struct file_lock *fl = *before; if (IS_POSIX(fl)) break; if (IS_LEASE(fl)) continue; if (filp != fl->fl_file) continue; if (request->fl_type == fl->fl_type) goto out; found = 1; locks_delete_lock(before); <<<<<< ! break; } if after this point the subsequent locks_alloc_lock() will fail the return code will be -ENOMEM, but the existing lock is already removed. This is a known feature that such "re-locking" is not atomic, but in the racy case the file should stay locked (although by some other process), but in this case the file will be unlocked. The proposal is to prepare the lock in advance keeping no chance to fail in the future code. Found during making the flocks pid-namespaces aware. (Note: Thanks to Reuben Farrelly for finding a bug in an earlier version of this patch.) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Reuben Farrelly <reuben-linuxkernel@reub.net>
2007-10-09locks: kill redundant local variableJ. Bruce Fields
There's no need for another variable local to this loop; we can use the variable (of the same name!) already declared at the top of the function, and not used till later (at which point it's initialized, so this is safe). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09locks: reverse order of posix_locks_conflict() argumentsJ. Bruce Fields
The first argument to posix_locks_conflict() is meant to be a lock request, and the second a lock from an inode's lock request. It doesn't really make a difference which order you call them in, since the only asymmetric test in posix_lock_conflict() is the check whether the second argument is a posix lock--and every caller already does that check for some reason. But may as well fix posix_test_lock() to call posix_locks_conflict() with the arguments in the same order as everywhere else. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-10-09knfsd: query filesystem for NFSv4 getattr of FATTR4_MAXNAMEJ. Bruce Fields
Without this we always return 2^32-1 as the the maximum namelength. Thanks to Andreas Gruenbacher for bug report and testing. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Andreas Gruenbacher <agruen@suse.de>
2007-10-09knfsd: nfsv4 delegation recall should take reference on clientJ. Bruce Fields
It's not enough to take a reference on the delegation object itself; we need to ensure that the rpc_client won't go away just as we're about to make an rpc call. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09knfsd: don't shutdown callbacks until nfsv4 client is freedJ. Bruce Fields
If a callback still holds a reference on the client, then it may be about to perform an rpc call, so it isn't safe to call rpc_shutdown(). (Though rpc_shutdown() does wait for any outstanding rpc's, it can't know if a new rpc is about to be issued with that client.) So, wait to shutdown the rpc_client until the reference count on the client has gone to zero. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09knfsd: let nfsd manage timing out its own leasesJ. Bruce Fields
Currently there's a race that can cause an oops in generic_setlease. (In detail: nfsd, when it removes a lease, does so by calling vfs_setlease() with F_UNLCK and a pointer to the fl_flock field, which in turn points to nfsd's existing lease; but the first thing the setlease code does is call time_out_leases(). If the lease happens to already be beyond the lease break time, that will free the lease and (in nfsd's release_private callback) set fl_flock to NULL, leading to a NULL deference soon after in vfs_setlease().) There are probably other things to fix here too, but it seems inherently racy to allow either locks.c or nfsd to time out this lease. Instead just set the fl_break_time to 0 (preventing locks.c from ever timing out this lock) and leave it up to nfsd's laundromat thread to deal with it. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09knfsd: 64 bit ino support for NFS serverPeter Staubach
Modify the NFS server code to support 64 bit ino's, as appropriate for the system and the NFS protocol version. The gist of the changes is to query the underlying file system for attributes and not just to use the cached attributes in the inode. For this specific purpose, the inode only contains an ino field which unsigned long, which is large enough on 64 bit platforms, but is not large enough on 32 bit platforms. I haven't been able to find any reason why ->getattr can't be called while i_mutex. The specification indicates that i_mutex is not required to be held in order to invoke ->getattr, but it doesn't say that i_mutex can't be held while invoking ->getattr. I also haven't come to any conclusions regarding the value of lease_get_mtime() and whether it should or should not be invoked by fill_post_wcc() too. I chose not to change this because I thought that it was safer to leave well enough alone. If we decide to make a change, it can be done separately. Signed-off-by: Peter Staubach <staubach@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de>