Age | Commit message (Collapse) | Author |
|
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This patch forcibly unstuffs (if stuffed) the hidden quota inode at the
first availble opportunity. In any practical scenario the quota inode
won't be stuffed, so this is ok to do. Unstuffing the quota inode allows
us to ignore the case of a stuffed quota inode in gfs2_adjust_quota().
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
the original code could work, but I think this code could work better.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
sb->s_fs_info is a void pointer, thus the type cast is not needed.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This is for bugzilla bug #248176: GFS2: invalid metadata block
Patches 1 thru 3 were accepted upstream, but there were problems
with 4 and 5. Those issues have been resolved and now the recovery
tests are passing without errors. This code has gone through
41 * 3 successful gfs2 recovery tests before it hit an
unrelated (openais) problem. I'm continuing to test it.
This is a complete rewrite of patch 5 for bug #248176, written by
Steve Whitehouse. This is referred to in the bugzilla record as
"new 6" and "a different solution".
The problem was that the journal inodes, although protected by
a glock, were not synched with the other nodes because they don't
use the inode glock synch operations (i.e. no "glops" were defined).
Therefore, journal recovery on a journal-recovering node were causing
the blocks to get out of sync with the node that was actually trying
to use that journal as it comes back up from a reboot.
There are two possible solutions: (1) To make the journals use the
normal inode glock sync operations, or (2) To make the journal
operations take effect immediately (i.e. no caching). Although
option 1 works, it turns out to be a lot more code. Steve opted
for option 2, which is much simpler and therefore less prone to
regression errors.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
--
|
|
This is for bugzilla bug #248176: GFS2: invalid metadata block
Patches 1 thru 3 were accepted upstream, but there were problems
with 4 and 5. Those issues have been resolved and now the recovery
tests are passing without errors. This code has gone through
41 * 3 successful gfs2 recovery tests before it hit an
unrelated (openais) problem.
This is a complete rewrite of patch 4 for bug #248176.
Part of the problem was that inodes were being recycled
before their buffers were flushed to the journal logs.
Another problem was that the clone bitmaps were being
searched for deleted inodes to recycle, but only the
"real" bitmaps should be searched for that purpose.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
We only need a single gfs2_scand process rather than the one
per filesystem which we had previously. As a result the parameter
determining the frequency of gfs2_scand runs becomes a module
parameter rather than a mount parameter as it was before.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
these struct *_operations are all method tables, thus should be const.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This is patch 5 of 5 for bug #248176
Metadata corruption was occurring because page references weren't
being removed in all cases. I previously added a function called
detach_bufdata, but I discovered there already WAS a function out
there to do the job. It's called gfs2_meta_cache_flush. So I added
a call to that to remove the page references.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
this is more clear.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This is patch three of five for bug #248176.
The try_rgrp_unlink code in rgrp.c had an infinite loop. This was
caused because the bitmap function rgblk_search can return a block
less than the "goal" block, in which case it was looping. The fix is
to make it always march forward as needed.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This is patch 2 of 5 for bug #248176.
The list_move code previously concocted in log.c for bug #238162
(see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238162#c23)
never runs as bh can now never be NULL at this point.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This is the first of five patches for bug #248176:
There were still some critical variables being manipulated outside
the log_lock spinlock. That usually resulted in a hang.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This fixes an oops which was occurring during glock dumping due to the
seq file code not taking a reference to the glock. Also this fixes a
memory leak which occurred in certain cases, in turn preventing the
filesystem from unmounting.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
When looking at an unrelated problem, I noticed that nfsd does not
set nameidata pointer on create (ie nd is NULL). This should
cause an oops in some cases in which when NFSd is mounted over GFS2.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This patch cleans up duplicate includes in
fs/gfs2/
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
If a glock is in the exclusive state and a request for demote to
deferred has been received, then further requests for demote to
shared are being ignored. This patch fixes that by ensuring that
we demote to unlocked in that case.
Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
One of the races relates to referencing a variable while not holding
its protecting spinlock. The patch simply moves the test inside the
spin lock. The other races occurs when a demote to unlocked request
occurs during the time a demote to shared request is already running.
This of course only happens in the case that the lock was in the
exclusive mode to start with. The patch adds a check to see if another
demote request has occurred in the mean time and if it has, then it
performs a second demote.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
The floppy ioctls are used by multiple drivers, so they should be
handled in a shared location. Also, add minor cleanups.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
These are shared by all cd-rom drivers and should have common
handlers. Do slight cosmetic cleanups in the process.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
BLKPG is common to all block devices, so it should be handled
by common code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
These are common to multiple block drivers, so they should
be handled by the block layer.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
blk_trace_setup is broken on x86_64 compat systems,
this makes the code work correctly on all 64 bit architectures
in compat mode.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Handle those blockdev ioctl calls that are compatible
directly from the compat_blkdev_ioctl() function, instead
of having to go through the compat_ioctl hash lookup.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Make compat_blkdev_ioctl and blkdev_ioctl reflect the respective
native versions. This is somewhat more efficient and makes it easier
to keep the two in sync.
Also get rid of the bogus handling for broken_blkgetsize and the
duplicate entry for BLKRASET.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
As bi_end_io is only called once when the reqeust is complete,
the 'size' argument is now redundant. Remove it.
Now there is no need for bio_endio to subtract the size completed
from bi_size. So don't do that either.
While we are at it, change bi_end_io to return void.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
The only caller of bio_endio that does not pass the full bi_size
is end_that_request_first. Also, no ->bi_end_io method is really
interested in bi_size being decremented.
So move the decrement and related code into ll_rw_blk and merge it
with order_bio_endio to form req_bio_endio which does endio functionality
specific to request completion.
As some ->bi_end_io methods do check bi_size of 0, we set it thus for
now, but that will go in the next patch.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./block/ll_rw_blk.c | 42 +++++++++++++++++++++++++++---------------
./fs/bio.c | 23 +++++++++++------------
2 files changed, 38 insertions(+), 27 deletions(-)
diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Currently bi_end_io can be called multiple times as sub-requests
complete. However no ->bi_end_io function wants to know about that.
So only call when the bio is complete.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/bio.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff .prev/fs/bio.c ./fs/bio.c
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Hide everything in blkdev.h with CONFIG_BLOCK isn't set, and fixup
the (few) files that fail to build because they were relying on blkdev.h
pulling in extra includes for them.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
This macro is only used in one place; in this place it seems simpler to
put open-code it and move the comment to where it's used.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Currently /proc/locks is shown with a proc_read function, but its behavior
is rather complex as it has to manually handle current offset and buffer
length. On the other hand, files that show objects from lists can be
easily reimplemented using the sequential files and the seq_list_XXX()
helpers.
This saves (as usually) 16 lines of code and more than 200 from
the .text section.
[akpm@linux-foundation.org: no externs in C]
[akpm@linux-foundation.org: warning fixes]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
fs/locks.c: use list_for_each_entry() instead of list_for_each() in
posix_locks_deadlock() and get_locks_status()
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The __mandatory_lock(inode) macro makes the same check, but makes the code
more readable.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The __mandatory_lock(inode) macro makes the same check, but makes the code
more readable.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The __mandatory_lock(inode) macro makes the same check, but makes the code
more readable.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The __mandatory_lock(inode) function makes the same check, but makes the code
more readable.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
inode as "mandatory lockable" and there's a macro for this check called
MANDATORY_LOCK(inode). However, fs/locks.c and some filesystems still perform
the explicit i_mode checking. Besides, Andrew pointed out, that this macro is
buggy itself, as it dereferences the inode arg twice.
Convert this macro into static inline function and switch its users to it,
making the code shorter and more readable.
The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
for superblock is already known to be true.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This code is run under lock_kernel(), which is dropped during
sleeping operations, so the following race is possible:
CPU1: CPU2:
vfs_setlease(); vfs_setlease();
lock_kernel();
lock_kernel(); /* spin */
generic_setlease():
...
for (before = ...)
/* here we found some lease after
* which we will insert the new one
*/
fl = locks_alloc_lock();
/* go to sleep in this allocation and
* drop the BKL
*/
generic_setlease():
...
for (before = ...)
/* here we find the "before" pointing
* at the one we found on CPU1
*/
->fl_change(my_before, arg);
lease_modify();
locks_free_lock();
/* and we freed it */
...
unlock_kernel();
locks_insert_lock(before, fl);
/* OOPS! We have just tried to add the lease
* at the tail of already removed one
*/
The similar races are already handled in other code - all the
allocations are performed before any checks/updates.
Thanks to Kamalesh Babulal for testing and for a bug report on an
earlier version.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
|
|
This routine deletes all the elements from the list
with the "while (!list_empty())" loop, and we already
have a list_first_entry() macro to help it look nicer :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
|
|
This comment wasn't updated when lease support was added, and it makes
essentially the same mistake that the code made before a recent bugfix.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
When the flock_lock_file() is called to change the flock
from F_RDLCK to F_WRLCK or vice versa the existing flock
can be removed without appropriate warning.
Look:
for_each_lock(inode, before) {
struct file_lock *fl = *before;
if (IS_POSIX(fl))
break;
if (IS_LEASE(fl))
continue;
if (filp != fl->fl_file)
continue;
if (request->fl_type == fl->fl_type)
goto out;
found = 1;
locks_delete_lock(before); <<<<<< !
break;
}
if after this point the subsequent locks_alloc_lock() will
fail the return code will be -ENOMEM, but the existing lock
is already removed.
This is a known feature that such "re-locking" is not atomic,
but in the racy case the file should stay locked (although by
some other process), but in this case the file will be unlocked.
The proposal is to prepare the lock in advance keeping no chance
to fail in the future code.
Found during making the flocks pid-namespaces aware.
(Note: Thanks to Reuben Farrelly for finding a bug in an earlier version
of this patch.)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Reuben Farrelly <reuben-linuxkernel@reub.net>
|
|
There's no need for another variable local to this loop; we can use the
variable (of the same name!) already declared at the top of the function,
and not used till later (at which point it's initialized, so this is safe).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
The first argument to posix_locks_conflict() is meant to be a lock request,
and the second a lock from an inode's lock request. It doesn't really
make a difference which order you call them in, since the only
asymmetric test in posix_lock_conflict() is the check whether the second
argument is a posix lock--and every caller already does that check for
some reason.
But may as well fix posix_test_lock() to call posix_locks_conflict()
with the arguments in the same order as everywhere else.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
|
|
Without this we always return 2^32-1 as the the maximum namelength.
Thanks to Andreas Gruenbacher for bug report and testing.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Andreas Gruenbacher <agruen@suse.de>
|
|
It's not enough to take a reference on the delegation object itself; we
need to ensure that the rpc_client won't go away just as we're about to
make an rpc call.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
If a callback still holds a reference on the client, then it may be
about to perform an rpc call, so it isn't safe to call rpc_shutdown().
(Though rpc_shutdown() does wait for any outstanding rpc's, it can't
know if a new rpc is about to be issued with that client.)
So, wait to shutdown the rpc_client until the reference count on the
client has gone to zero.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Currently there's a race that can cause an oops in generic_setlease.
(In detail: nfsd, when it removes a lease, does so by calling
vfs_setlease() with F_UNLCK and a pointer to the fl_flock field, which
in turn points to nfsd's existing lease; but the first thing the
setlease code does is call time_out_leases(). If the lease happens to
already be beyond the lease break time, that will free the lease and (in
nfsd's release_private callback) set fl_flock to NULL, leading to a NULL
deference soon after in vfs_setlease().)
There are probably other things to fix here too, but it seems inherently
racy to allow either locks.c or nfsd to time out this lease. Instead
just set the fl_break_time to 0 (preventing locks.c from ever timing out
this lock) and leave it up to nfsd's laundromat thread to deal with it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Modify the NFS server code to support 64 bit ino's, as
appropriate for the system and the NFS protocol version.
The gist of the changes is to query the underlying file system
for attributes and not just to use the cached attributes in the
inode. For this specific purpose, the inode only contains an
ino field which unsigned long, which is large enough on 64 bit
platforms, but is not large enough on 32 bit platforms.
I haven't been able to find any reason why ->getattr can't be called
while i_mutex. The specification indicates that i_mutex is not
required to be held in order to invoke ->getattr, but it doesn't say
that i_mutex can't be held while invoking ->getattr.
I also haven't come to any conclusions regarding the value of
lease_get_mtime() and whether it should or should not be invoked
by fill_post_wcc() too. I chose not to change this because I
thought that it was safer to leave well enough alone. If we
decide to make a change, it can be done separately.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
|