summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2010-01-28Btrfs: check total number of devices when removing missingJosef Bacik
If you have a disk failure in RAID1 and then add a new disk to the array, and then try to remove the missing volume, it will fail. The reason is the sanity check only looks at the total number of rw devices, which is just 2 because we have 2 good disks and 1 bad one. Instead check the total number of devices in the array to make sure we can actually remove the device. Tested this with a failed disk setup and with this test we can now run btrfs-vol -r missing /mount/point and it works fine. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: check return value of open_bdev_exclusive properlyJosef Bacik
Hit this problem while testing RAID1 failure stuff. open_bdev_exclusive returns ERR_PTR(), not NULL. So change the return value properly. This is important if you accidently specify a device that doesn't exist when trying to add a new device to an array, you will panic the box dereferencing bdev. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: do not mark the chunk as readonly if in degraded modeJosef Bacik
If a RAID setup has chunks that span multiple disks, and one of those disks has failed, btrfs_chunk_readonly will return 1 since one of the disks in that chunk's stripes is dead and therefore not writeable. So instead if we are in degraded mode, return 0 so we can go ahead and allocate stuff. Without this patch all of the block groups in a RAID1 setup will end up read-only, which will mean we can't add new disks to the array since we won't be able to make allocations. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: run orphan cleanup on default fs rootJosef Bacik
This patch revert's commit 6c090a11e1c403b727a6a8eff0b97d5fb9e95cb5 Since it introduces this problem where we can run orphan cleanup on a volume that can have orphan entries re-added. Instead of my original fix, Yan Zheng pointed out that we can just revert my original fix and then run the orphan cleanup in open_ctree after we look up the fs_root. I have tested this with all the tests that gave me problems and this patch fixes both problems. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: fix a memory leak in btrfs_init_aclYang Hongyang
In btrfs_init_acl() cloned acl is not released Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: Use correct values when updating inode i_size on fallocateAneesh Kumar K.V
commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825 Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Date: Wed Jan 20 12:57:53 2010 +0530 Btrfs: Use correct values when updating inode i_size on fallocate Even though we allocate more, we should be updating inode i_size as per the arguments passed Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: remove tree_search() in extent_map.cMiao Xie
This patch removes tree_search() in extent_map.c because it is not called by anything. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: Add mount -o compress-forceChris Mason
The default btrfs mount -o compress mode will quickly back off compressing a file if it notices that compression does not reduce the size of the data being written. This can save considerable CPU because all future writes to the file go through uncompressed. But some files are both very large and have mixed data stored in them. In that case, we want to add the ability to always try compressing data before writing it. This commit adds mount -o compress-force. A later commit will add a new inode flag that does the same thing. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28block: fix bio_add_page for non trivial merge_bvec_fn caseDmitry Monakhov
We have to properly decrease bi_size in order to merge_bvec_fn return right result. Otherwise this result in false merge rejects for two absolutely valid bio_vecs. This may cause significant performance penalty for example fs_block_size == 1k and block device is raid0 with small chunk_size = 8k. Then it is impossible to merge 7-th fs-block in to bio which already has 6 fs-blocks. Cc: <stable@kernel.org> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-01-28reiserfs: Fix vmalloc call under reiserfs lockFrederic Weisbecker
Vmalloc is called to allocate journal->j_cnode_free_list but we hold the reiserfs lock at this time, which raises a {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} lock inversion. Just drop the reiserfs lock at this time, as it's not even needed but kept for paranoid reasons. This fixes: [ INFO: inconsistent lock state ] 2.6.33-rc5 #1 --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/313 [HC0[0]:SC0[0]:HE1:SE1] takes: (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c11118c8>] reiserfs_write_lock_once+0x28/0x50 {RECLAIM_FS-ON-W} state was registered at: [<c104ee32>] mark_held_locks+0x62/0x90 [<c104eefa>] lockdep_trace_alloc+0x9a/0xc0 [<c108f7b6>] kmem_cache_alloc+0x26/0xf0 [<c108621c>] __get_vm_area_node+0x6c/0xf0 [<c108690e>] __vmalloc_node+0x7e/0xa0 [<c1086aab>] vmalloc+0x2b/0x30 [<c110e1fb>] journal_init+0x6cb/0xa10 [<c10f90a2>] reiserfs_fill_super+0x342/0xb80 [<c1095665>] get_sb_bdev+0x145/0x180 [<c10f68e1>] get_super_block+0x21/0x30 [<c1094520>] vfs_kern_mount+0x40/0xd0 [<c1094609>] do_kern_mount+0x39/0xd0 [<c10aaa97>] do_mount+0x2c7/0x6d0 [<c10aaf06>] sys_mount+0x66/0xa0 [<c16198a7>] mount_block_root+0xc4/0x245 [<c1619a81>] mount_root+0x59/0x5f [<c1619b98>] prepare_namespace+0x111/0x14b [<c1619269>] kernel_init+0xcf/0xdb [<c100303a>] kernel_thread_helper+0x6/0x1c irq event stamp: 63236801 hardirqs last enabled at (63236801): [<c134e7fa>] __mutex_unlock_slowpath+0x9a/0x120 hardirqs last disabled at (63236800): [<c134e799>] __mutex_unlock_slowpath+0x39/0x120 softirqs last enabled at (63218800): [<c102f451>] __do_softirq+0xc1/0x110 softirqs last disabled at (63218789): [<c102f4ed>] do_softirq+0x4d/0x60 other info that might help us debug this: 2 locks held by kswapd0/313: #0: (shrinker_rwsem){++++..}, at: [<c1074bb4>] shrink_slab+0x24/0x170 #1: (&type->s_umount_key#19){++++..}, at: [<c10a2edd>] shrink_dcache_memory+0xfd/0x1a0 stack backtrace: Pid: 313, comm: kswapd0 Not tainted 2.6.33-rc5 #1 Call Trace: [<c134db2c>] ? printk+0x18/0x1c [<c104e7ef>] print_usage_bug+0x15f/0x1a0 [<c104ebcf>] mark_lock+0x39f/0x5a0 [<c104d66b>] ? trace_hardirqs_off+0xb/0x10 [<c1052c50>] ? check_usage_forwards+0x0/0xf0 [<c1050c24>] __lock_acquire+0x214/0xa70 [<c10438c5>] ? sched_clock_cpu+0x95/0x110 [<c10514fa>] lock_acquire+0x7a/0xa0 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50 [<c134f03f>] mutex_lock_nested+0x5f/0x2b0 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50 [<c11118c8>] reiserfs_write_lock_once+0x28/0x50 [<c10f05b0>] reiserfs_delete_inode+0x50/0x140 [<c10a653f>] ? generic_delete_inode+0x5f/0x150 [<c10f0560>] ? reiserfs_delete_inode+0x0/0x140 [<c10a657c>] generic_delete_inode+0x9c/0x150 [<c10a666d>] generic_drop_inode+0x3d/0x60 [<c10a5597>] iput+0x47/0x50 [<c10a2a4f>] dentry_iput+0x6f/0xf0 [<c10a2af4>] d_kill+0x24/0x50 [<c10a2d3d>] __shrink_dcache_sb+0x21d/0x2b0 [<c10a2f0f>] shrink_dcache_memory+0x12f/0x1a0 [<c1074c9e>] shrink_slab+0x10e/0x170 [<c1075177>] kswapd+0x477/0x6a0 [<c1072d10>] ? isolate_pages_global+0x0/0x1b0 [<c103e160>] ? autoremove_wake_function+0x0/0x40 [<c1074d00>] ? kswapd+0x0/0x6a0 [<c103de6c>] kthread+0x6c/0x80 [<c103de00>] ? kthread+0x0/0x80 [<c100303a>] kernel_thread_helper+0x6/0x1c Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Chris Mason <chris.mason@oracle.com>
2010-01-27NFSD: Create PF_INET6 listener in write_portsChuck Lever
Try to create a PF_INET6 listener for NFSD, if IPv6 is enabled in the kernel. Make sure nfsd_serv's reference count is decreased if __write_ports_addxprt() failed to create a listener. See __write_ports_addfd(). Our current plan is to rely on rpc.nfsd to create appropriate IPv6 listeners when server-side NFS/IPv6 support is desired. Legacy behavior, via the write_threads or write_svc kernel APIs, will remain the same -- only IPv4 listeners are created. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [bfields@citi.umich.edu: Move error-handling code to end] Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-26fix oops in fs/9p late mount failureAl Viro
if 9P ->get_sb() fails late (at root inode or root dentry allocation), we'll hit its ->kill_sb() with NULL ->s_root Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26fix leak in romfs_fill_super()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26get rid of pointless checks after simple_pin_fs()Al Viro
if we'd just got success from it, vfsmount won't be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26Fix failure exits in bfs_fill_super()Al Viro
double iput(), leaks... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26fix affs parse_options()Al Viro
Error handling in that sucker got broken back in 2003. If function returns 0 on failure, it's not nice to add return -EINVAL into it. Adding return 1 on other failure exits is also not a good thing (and yes, original success exits with 1 and some of failure exits with 0 are still there; so's the original logics in callers). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26Fix remount races with symlink handling in affsAl Viro
A couple of fields in affs_sb_info is used in follow_link() and symlink() for handling AFFS "absolute" symlinks. Need locking against affs_remount() updates. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26Fix a leak in affs_fill_super()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26fnctl: f_modown should call write_lock_irqsave/restoreGreg Kroah-Hartman
Commit 703625118069f9f8960d356676662d3db5a9d116 exposed that f_modown() should call write_lock_irqsave instead of just write_lock_irq so that because a caller could have a spinlock held and it would not be good to renable interrupts. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Tavis Ormandy <taviso@google.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-26SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"Chuck Lever
write_ports() converts svc_create_xprt()'s ENOENT error return to EPROTONOSUPPORT so that rpc.nfsd (in user space) can report an error message that makes sense. It turns out that several of the other kernel APIs rpc.nfsd use can also return ENOENT from svc_create_xprt(), by way of lockd_up(). On the client side, an NFSv2 or NFSv3 mount request can also return the result of lockd_up(). This error may also be returned during an NFSv4 mount request, since the NFSv4 callback service uses svc_create_xprt() to create the callback listener. An ENOENT error return results in a confusing error message from the mount command. Let's have svc_create_xprt() return EPROTONOSUPPORT instead of ENOENT. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-26SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()Chuck Lever
Clean up: Bruce observed we have more or less common logic in each of svc_create_xprt()'s callers: the check to create an IPv6 RPC listener socket only if CONFIG_IPV6 is set. I'm about to add another case that does just the same. If we move the ifdefs into __svc_xpo_create(), then svc_create_xprt() call sites can get rid of the "#ifdef" ugliness, and can use the same logic with or without IPv6 support available in the kernel. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-26NFS: Ensure that we handle NFS4ERR_STALE_STATEID correctlyTrond Myklebust
Even if the server is crazy, we should be able to mark the stateid as being bad, to ensure it gets recovered. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFSv4.1: Don't call nfs4_schedule_state_recovery() unnecessarilyTrond Myklebust
Currently, nfs4_handle_exception() will call it twice if called with an error of -NFS4ERR_STALE_CLIENTID, -NFS4ERR_STALE_STATEID or -NFS4ERR_EXPIRED. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFSv4: Don't allow posix locking against servers that don't support itTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFSv4: Ensure that the NFSv4 locking can recover from stateid errorsTrond Myklebust
In most cases, we just want to mark the lock_stateid sequence id as being uninitialised. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFS: Avoid warnings when CONFIG_NFS_V4=nDavid Howells
Avoid the following warnings when CONFIG_NFS_V4=n: fs/nfs/sysctl.c:19: warning: unused variable `nfs_set_port_max' fs/nfs/sysctl.c:18: warning: unused variable `nfs_set_port_min' by making those variables contingent on NFSv4 being configured. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFS: Make nfs_commitdata_release staticH Hartley Sweeten
The symbol nfs_commitdata_release is only used locally in this file. Make it static to prevent the following sparse warning: warning: symbol 'nfs_commitdata_release' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFS: Try to commit unstable writes in nfs_release_page()Trond Myklebust
If someone calls nfs_release_page(), we presumably already know that the page is clean, however it may be holding an unstable write. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFS: Fix a reference leak in nfs_wb_cancel_page()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26xfs: Use delay write promotion for dquot flushingDave Chinner
xfs_qm_dqflock_pushbuf_wait() does a very similar trick to item pushing used to do to flush out delayed write dquot buffers. Change it to use the new promotion method rather than an async flush. Also, xfs_qm_dqflock_pushbuf_wait() can return without the flush lock held, yet the callers make the assumption that after this call the flush lock is held. Always return with the flush lock held. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-01-26xfs: Sort delayed write buffers before dispatchDave Chinner
Currently when the xfsbufd writes delayed write buffers, it pushes them to disk in the order they come off the delayed write list. If there are lots of buffers ѕpread widely over the disk, this results in overwhelming the elevator sort queues in the block layer and we end up losing the posibility of merging adjacent buffers to minimise the number of IOs. Use the new generic list_sort function to sort the delwri dispatch queue before issue to ensure that the buffers are pushed in the most friendly order possible to the lower layers. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-02-02xfs: Don't issue buffer IO direct from AIL push V2Dave Chinner
All buffers logged into the AIL are marked as delayed write. When the AIL needs to push the buffer out, it issues an async write of the buffer. This means that IO patterns are dependent on the order of buffers in the AIL. Instead of flushing the buffer, promote the buffer in the delayed write list so that the next time the xfsbufd is run the buffer will be flushed by the xfsbufd. Return the state to the xfsaild that the buffer was promoted so that the xfsaild knows that it needs to cause the xfsbufd to run to flush the buffers that were promoted. Using the xfsbufd for issuing the IO allows us to dispatch all buffer IO from the one queue. This means that we can make much more enlightened decisions on what order to flush buffers to disk as we don't have multiple places issuing IO. Optimisations to xfsbufd will be in a future patch. Version 2 - kill XFS_ITEM_FLUSHING as it is now unused. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-02-06xfs: Use delayed write for inodes rather than async V2Dave Chinner
We currently do background inode flush asynchronously, resulting in inodes being written in whatever order the background writeback issues them. Not only that, there are also blocking and non-blocking asynchronous inode flushes, depending on where the flush comes from. This patch completely removes asynchronous inode writeback. It removes all the strange writeback modes and replaces them with either a synchronous flush or a non-blocking delayed write flush. That is, inode flushes will only issue IO directly if they are synchronous, and background flushing may do nothing if the operation would block (e.g. on a pinned inode or buffer lock). Delayed write flushes will now result in the inode buffer sitting in the delwri queue of the buffer cache to be flushed by either an AIL push or by the xfsbufd timing out the buffer. This will allow accumulation of dirty inode buffers in memory and allow optimisation of inode cluster writeback at the xfsbufd level where we have much greater queue depths than the block layer elevators. We will also get adjacent inode cluster buffer IO merging for free when a later patch in the series allows sorting of the delayed write buffers before dispatch. This effectively means that any inode that is written back by background writeback will be seen as flush locked during AIL pushing, and will result in the buffers being pushed from there. This writeback path is currently non-optimal, but the next patch in the series will fix that problem. A side effect of this delayed write mechanism is that background inode reclaim will no longer directly flush inodes, nor can it wait on the flush lock. The result is that inode reclaim must leave the inode in the reclaimable state until it is clean. Hence attempts to reclaim a dirty inode in the background will simply skip the inode until it is clean and this allows other mechanisms (i.e. xfsbufd) to do more optimal writeback of the dirty buffers. As a result, the inode reclaim code has been rewritten so that it no longer relies on the ambiguous return values of xfs_iflush() to determine whether it is safe to reclaim an inode. Portions of this patch are derived from patches by Christoph Hellwig. Version 2: - cleanup reclaim code as suggested by Christoph - log background reclaim inode flush errors - just pass sync flags to xfs_iflush Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-02-06xfs: Make inode reclaim states explicitDave Chinner
A.K.A.: don't rely on xfs_iflush() return value in reclaim We have gradually been moving checks out of the reclaim code because they are duplicated in xfs_iflush(). We've had a history of problems in this area, and many of them stem from the overloading of the return values from xfs_iflush() and interaction with inode flush locking to determine if the inode is safe to reclaim. With the desire to move to delayed write flushing of inodes and non-blocking inode tree reclaim walks, the overloading of the return value of xfs_iflush makes it very difficult to determine the correct thing to do next. This patch explicitly re-adds the checks to the inode reclaim code, removing the reliance on the return value of xfs_iflush() to determine what to do next. It also means that we can clearly document all the inode states that reclaim must handle and hence we can easily see that we handled all the necessary cases. This also removes the need for the xfs_inode_clean() check in xfs_iflush() as all callers now check this first (safely). Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-02-08xfs: more reserved blocks fixupsEric Sandeen
This mangles the reserved blocks counts a little more. 1) add a helper function for the default reserved count 2) add helper functions to save/restore counts on ro/rw 3) save/restore reserved blocks on freeze/thaw 4) disallow changing reserved count while readonly V2: changed field name to match Dave's changes Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-01-26xfs: turn off sign warningsDave Chinner
Because they cause warnings in static inline functions conditionally compiled into XFS from the VFS (e.g. fsnotify). Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-01-26xfs: don't hold onto reserved blocks on remount,roDave Chinner
If we hold onto reserved blocks when doing a remount,ro we end up writing the blocks used count to disk that includes the reserved blocks. Reserved blocks are not actually used, so this results in the values in the superblock being incorrect. Hence if we run xfs_check or xfs_repair -n while the filesystem is mounted remount,ro we end up with an inconsistent filesystem being reported. Also, running xfs_copy on the remount,ro filesystem will result in an inconsistent image being generated. To fix this, unreserve the blocks when doing the remount,ro, and reserved them again on remount,rw. This way a remount,ro filesystem will appear consistent on disk to all utilities. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-01-25ocfs2/dlm: Print more messages during lock migrationSunil Mushran
When a lock resource is migrated, the dlm compares the migrated locks with that that was already existing on the new node. If the comparison fails, it BUGs. This patch prints more messages when the comparison fails inorder to help with the root cause analyis. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1206 This does not fix bz1206. However, if we run into it again, we will have more information to chew on. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25ocfs2/dlm: Ignore LVBs of locks in the Blocked listSunil Mushran
During lock resource migration, o2dlm fills the packet with a LVB from the first valid lock. For sanity, it ensures that the other valid locks have the same LVB. If not, it BUGs. The valid locks are ones that have granted EX or PR lock levels and are either on the Granted or Converting lists. Locks in the Blocked list cannot have a valid LVB. This patch ensures that we skip the locks in the Blocked list. Fixes oss bugzilla#1202 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1202 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25ocfs2/trivial: Remove trailing whitespacesSunil Mushran
Patch removes trailing whitespaces. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25ocfs2: fix a misleading variable nameWengang Wang
a local variable "dlm_version" is used as a fs locking version. rename it fs_version. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25ocfs2: Sync max_inline_data_with_xattr from tools.Tao Ma
In ocfs2-tools, we have added ocfs2_max_inline_data_with_xattr, so add it in the kernel's ocfs2_fs.h. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag ext4: Fix quota accounting error with fallocate ext4: Handle -EDQUOT error on write
2010-01-25ceph: precede encoded ceph_pg_pool struct with versionSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-01-25ceph: keep reserved replies on the request structureYehuda Sadeh
This includes treating all the data preallocation and revokation at the same place, not having to have a special case for the reserved pages. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-01-25ceph: alloc message data pages and check if tid existsYehuda Sadeh
Now doing it in the same callback that is also responsible for allocating the 'front' part of the message. If we get a message that we haven't got a corresponding tid for, mark it for skipping. Moving the mutex unlock/lock from the osd alloc_msg callback to the calling function in the messenger. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-01-25ceph: refactor messages data section allocationYehuda Sadeh
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-01-25ceph: allocate middle of message before stating to readYehuda Sadeh
Both front and middle parts of the message are now being allocated at the ceph_alloc_msg(). Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-01-25ceph: properly handle aborted mds requestsSage Weil
Previously, if the MDS request was interrupted, we would unregister the request and ignore any reply. This could cause the caps or other cache state to become out of sync. (For instance, aborting dbench and doing rm -r on clients would complain about a non-empty directory because the client didn't realize it's aborted file create request completed.) Even we don't unregister, we still can't process the reply normally because we are no longer holding the caller's locks (like the dir i_mutex). So, mark aborted operations with r_aborted, and in the reply handler, be sure to process all the caps. Do not process the namespace changes, though, since we no longer will hold the dir i_mutex. The dentry lease state can also be ignored as it's more forgiving. Signed-off-by: Sage Weil <sage@newdream.net>
2010-01-25ceph: mark MDS CREATE as a write opSage Weil
CEPH_MDS_OP_CREATE was not correctly marked as a write operation. Signed-off-by: Sage Weil <sage@newdream.net>