path: root/fs
Age | Commit message | Author
2009-07-27 | Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify | Linus Torvalds
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  inotify: use GFP_NOFS under potential memory pressure
  fsnotify: fix inotify tail drop check with path entries
  inotify: check filename before dropping repeat events
  fsnotify: use def_bool in kconfig instead of letting the user choose
  inotify: fix error paths in inotify_update_watch
  inotify: do not leak inode marks in inotify_add_watch
  inotify: drop user watch count when a watch is removed
2009-07-27 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  jfs: Fix early release of acl in jfs_get_acl
2009-07-27 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 | Linus Torvalds
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  jbd: fix race between write_metadata_buffer and get_write_access
  ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle()
  jbd: Fix a race between checkpointing code and journal_get_write_access()
  ext3: Fix truncation of symlinks after failed write
  jbd: Fail to load a journal if it is too short
2009-07-27 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 | Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] fix sparse warning
  cifs: fix sb->s_maxbytes so that it casts properly to a signed value
  cifs: disable serverino if server doesn't support it
2009-07-27 | Btrfs: change how we unpin extents | Josef Bacik
We are racy with async block caching and unpinning extents. This patch makes things much less complicated by only unpinning the extent if the block group is cached. We check the block_group->cached var under the block_group->lock spin lock. If it is set to BTRFS_CACHE_FINISHED then we update the pinned counters, and unpin the extent and add the free space back. If it is not set to this, we start the caching of the block group so the next time we unpin extents we can unpin the extent. This keeps us from racing with the async caching threads, lets us kill the fs wide async thread counter, and keeps us from having to set DELALLOC bits for every extent we hit if there are caching kthreads going. One thing that needed to be changed was btrfs_free_super_mirror_extents. Now instead of just looking for LOCKED extents, we also look for DIRTY extents, since we could have left some extents pinned in the previous transaction that will never get freed now that we are unmounting, which would cause us to leak memory. So btrfs_free_super_mirror_extents has been changed to btrfs_free_pinned_extents, and it will clear the extents locked for the super mirror, and any remaining pinned extents that may be present. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-27 | Btrfs: Correct redundant test in add_inode_ref | Julia Lawall
dir has already been tested. It seems that this test should be on the recently returned value inode. A simplified version of the semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-24 | Btrfs: find smallest available device extent during chunk allocation | Chris Mason
Allocating a new block group is easy when the disk has plenty of space. But things get difficult as the disk fills up, especially if the FS has been run through btrfs-vol -b. The balance operation is likely to make the total bytes available on the device greater than the largest extent we'll actually be able to allocate. But the device extent allocation code incorrectly assumes that a device with 5G free will be able to allocate a 5G extent. It isn't normally a problem because device extents don't get freed unless btrfs-vol -b is run. This fixes the device extent allocator to remember the largest free extent it can find, and then uses that value as a fallback. Signed-off-by: Chris Mason <chris.mason@oracle.com>
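The fallback described in this commit message can be illustrated with a small userspace sketch. It is not the btrfs code: `struct hole` and `find_free_hole()` are hypothetical names, and the free holes are assumed to be available as a plain array. The point is only that a failed search still reports the largest hole it saw, so the caller can retry with a smaller request.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one free hole on a device. */
struct hole {
    uint64_t start;
    uint64_t len;
};

/*
 * Return 0 and set *start if some hole can hold num_bytes.
 * Otherwise return -1 and store the largest hole length seen in
 * *max_avail, so the caller can fall back to a smaller request.
 */
static int find_free_hole(const struct hole *holes, size_t n,
                          uint64_t num_bytes, uint64_t *start,
                          uint64_t *max_avail)
{
    uint64_t largest = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (holes[i].len >= num_bytes) {
            *start = holes[i].start;
            return 0;
        }
        if (holes[i].len > largest)
            largest = holes[i].len;
    }
    *max_avail = largest;
    return -1;
}
```

With holes of length 2, 3 and 1, a request for 5 units fails even though 6 units are free in total, and the caller learns that 3 is the most it can get in one piece.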
2009-07-24 | Btrfs: clear all space_info->full after removing a block group | Chris Mason
Btrfs allocates individual extents from block groups, and each block group has a specific type. It may hold metadata, data mirrored or striped etc. When we balance space (btrfs-vol -b) or remove a drive (btrfs-vol -r) we free block groups. Once a block group is freed, the space it was using on the device may be available for use by new block groups. btrfs_remove_block_group was clearing the flag that said 'our devices are full, don't even try to allocate new block groups', but it was only clearing that flag for a specific type of block group. This commit clears the full flag for all of the types of block groups, making it much more likely that we'll be able to balance space when the drive is close to full. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-24 | Btrfs: make flushoncommit mount option correctly wait on ordered_extents | Sage Weil
The commit_transaction call to wait_ordered_extents when snap_pending passes nocow_only=1 to process only NOCOW or PREALLOC extents. This isn't correct for the 'flushoncommit' mode, as it skips extents we just started IO on in start_delalloc_inodes. So, in the flushoncommit case, wait on all ordered extents. Otherwise, only pass the nocow_only flag to wait_ordered_extents if snap_pending. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-24 | Btrfs: Avoid delayed reference update looping | Yan Zheng
btrfs_split_leaf and btrfs_del_items can end up in a loop where one is constantly splitting a given leaf and the other is constantly merging it back with the adjacent nodes. There is a better fix for this, but in the interest of something small, this patch just changes btrfs_del_items back to balancing less often. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-24 | Btrfs: Fix ordering of key field checks in btrfs_previous_item | Yan Zheng
Check objectid of item before checking the item type, otherwise we may return zero for a key that is actually too low. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
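A minimal sketch of why the order of the two checks matters, assuming a simplified key with only the fields involved. `struct toy_key` and `previous_item()` are hypothetical stand-ins, and unlike the kernel function this sketch returns a slot index (or -1) rather than 0/1.

```c
#include <stdint.h>

/* Simplified key: only the fields the check cares about. */
struct toy_key {
    uint64_t objectid;
    uint8_t type;
};

/*
 * Walk backwards from slot `from` looking for an item of `type`.
 * Returns the matching slot, or -1 once the objectid drops below
 * min_objectid or the start of the array is reached.
 */
static int previous_item(const struct toy_key *keys, int from,
                         uint64_t min_objectid, uint8_t type)
{
    int slot;

    for (slot = from; slot >= 0; slot--) {
        /* Objectid first: a matching type on a too-low objectid
         * must not be reported as a hit. */
        if (keys[slot].objectid < min_objectid)
            break;
        if (keys[slot].type == type)
            return slot;
    }
    return -1;
}
```

If the type test ran first, a key with the wanted type but an objectid below the minimum would be returned as a match, which is the case the commit message describes.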
2009-07-24 | Btrfs: find_free_dev_extent doesn't handle holes at the start of the device | Yan Zheng
find_free_dev_extent does not properly handle the case where the device is not completely free, and there is a free extent at the beginning of the device. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-24 | Btrfs: Remove code duplication in comp_keys | Diego Calleja
comp_keys is duplicating what is done in btrfs_comp_cpu_keys, so just call it. Signed-off-by: Diego Calleja <diegocg@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
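The shape of this cleanup can be sketched in userspace as follows. The struct layouts and the names `toy_disk_key`, `toy_cpu_key`, `toy_comp_cpu_keys()` and `toy_comp_keys()` are assumptions for illustration, not the kernel definitions: the on-disk key is decoded from little-endian bytes once, and the comparison itself lives in a single function.

```c
#include <stdint.h>

/* In-memory (CPU byte order) key. */
struct toy_cpu_key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* On-disk key sketch: multi-byte fields stored little-endian. */
struct toy_disk_key {
    uint8_t objectid[8];
    uint8_t type;
    uint8_t offset[8];
};

/* Decode 8 little-endian bytes into a host integer. */
static uint64_t toy_get_le64(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* The one place where the ordering rule lives. */
static int toy_comp_cpu_keys(const struct toy_cpu_key *a,
                             const struct toy_cpu_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Convert the disk key, then reuse the CPU-key comparison. */
static int toy_comp_keys(const struct toy_disk_key *disk,
                         const struct toy_cpu_key *k2)
{
    struct toy_cpu_key k1;

    k1.objectid = toy_get_le64(disk->objectid);
    k1.type = disk->type;
    k1.offset = toy_get_le64(disk->offset);
    return toy_comp_cpu_keys(&k1, k2);
}
```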
2009-07-24 | Btrfs: async block group caching | Josef Bacik
This patch moves the caching of the block group off to a kthread in order to allow people to allocate sooner. Instead of blocking up behind the caching mutex, we instead kick off the caching kthread, and then attempt to make an allocation. If we cannot, we wait on the block group's caching waitqueue; the caching kthread wakes the waiting threads every time it finds 2 meg worth of space, and then again when it has finished caching.
This is how I tested the speedup from this:
  mkfs the disk
  mount the disk
  fill the disk up with fs_mark
  unmount the disk
  mount the disk
  time touch /mnt/foo
Without my changes this took 11 seconds on my box, with these changes it now takes 1 second.
Another change that's been put in place is we lock the super mirrors in the pinned extent map in order to keep us from adding that stuff as free space when caching the block group. This doesn't really change anything else as far as the pinned extent map is concerned, since for actual pinned extents we use EXTENT_DIRTY, but it does mean that when we unmount we have to go in and unlock those extents to keep from leaking memory.
I've also added a check where when we are reading block groups from disk, if the amount of space used == the size of the block group, we go ahead and mark the block group as cached. This drastically reduces the amount of time it takes to cache the block groups. Using the same test as above, except doing a dd to a file and then unmounting, it used to take 33 seconds to umount, now it takes 3 seconds.
This version uses the commit_root in the caching kthread, and then keeps track of how many async caching threads are running at any given time so if one of the async threads is still running as we cross transactions we can wait until it's finished before handling the pinned extents. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-24 | Btrfs: use hybrid extents+bitmap rb tree for free space | Josef Bacik
Currently btrfs has a problem where it can use a ridiculous amount of RAM simply tracking free space. As free space gets fragmented, we end up with thousands of entries on an rb-tree per block group, which usually spans 1 gig of area. Since we currently don't ever flush free space cache back to disk this gets to be a bit unwieldy on large fs's with lots of fragmentation. This patch solves this problem by using PAGE_SIZE bitmaps for parts of the free space cache. Initially we calculate a threshold of extent entries we can handle, which is however many extent entries we can cram into 16k of RAM. The maximum amount of RAM that should ever be used to track 1 gigabyte of diskspace will be 32k of RAM, which scales much better than we did before. Once we pass the extent threshold, we start adding bitmaps and using those instead for tracking the free space. This patch also makes it so that any free space that's less than 4 * sectorsize we go ahead and put into a bitmap. This is nice since we try and allocate out of the front of a block group, so if the front of a block group is heavily fragmented and then has a huge chunk of free space at the end, we go ahead and add the fragmented areas to bitmaps and use a normal extent entry to track the big chunk at the back of the block group. I've also taken the opportunity to revamp how we search for free space. Previously we indexed free space via an offset indexed rb tree and a bytes indexed rb tree. I've dropped the bytes indexed rb tree and use only the offset indexed rb tree. This cuts the number of tree operations we were doing previously down by half, and gives us a little bit of a better allocation pattern since we will always start from a specific offset and search forward from there, instead of searching for the size we need and try and get it as close as possible to the offset we want. I've given this a healthy amount of testing pre-new format stuff, as well as post-new format stuff.
I've booted up my fedora box which is installed on btrfs with this patch and ran with it for a few days without issues. I've not seen any performance regressions in any of my tests. Since the last patch Yan Zheng fixed a problem where we could have overlapping entries, so updating their offset inline would cause problems. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
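To make the bitmap half of this scheme concrete, here is a toy sketch of searching a free-space bitmap forward from a given offset. It assumes one bit per allocation unit and a 64-bit map for brevity (the commit uses PAGE_SIZE bitmaps); `TOY_BITS_PER_MAP` and `find_free_run()` are hypothetical names, not btrfs functions.

```c
#include <stdint.h>

#define TOY_BITS_PER_MAP 64 /* toy size; the commit uses PAGE_SIZE bitmaps */

/* One bit per allocation unit; a set bit means the unit is free. */
static int bit_is_free(uint64_t map, unsigned int bit)
{
    return (int)((map >> bit) & 1);
}

/*
 * Find the first run of `want` consecutive free units at or after
 * bit `from`. Returns the first bit of the run, or -1 if none fits.
 */
static int find_free_run(uint64_t map, unsigned int from, unsigned int want)
{
    unsigned int run = 0;
    unsigned int bit;

    if (want == 0)
        return -1;
    for (bit = from; bit < TOY_BITS_PER_MAP; bit++) {
        if (bit_is_free(map, bit)) {
            if (++run == want)
                return (int)(bit - want + 1);
        } else {
            run = 0; /* run broken by an allocated unit */
        }
    }
    return -1;
}
```

A fixed-size bitmap costs the same memory however fragmented the region is, which is the property the commit message relies on; an extent entry is cheaper only when the free space is in a few large pieces.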
2009-07-24 | UBIFS: remove unneeded call from ubifs_sync_fs | Artem Bityutskiy
Nowadays VFS always synchronizes all dirty inodes and pages before calling '->sync_fs()', so remove unneeded 'generic_sync_sb_inodes()' from 'ubifs_sync_fs()'. It used to be needed, but not any longer. Pointed-out-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-07-24 | UBIFS: kill BKL | Artem Bityutskiy
The BKL was pushed down from VFS to the file-systems. It used to serialize mount/unmount/remount and prevented more than one instance of the same file-system from doing mount/umount/remount at the same time. But it is OK for UBIFS and it does not need any additional locking for these cases. Thus, kick the BKL out of UBIFS. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-07-24 | UBIFS: remove unused functions | Subrata Modak
Remove 'xent_key_init_hash()' and 'data_key_init_flash()' functions, as they are not used anywhere. Signed-off-by: Subrata Modak <subrata@linux.vnet.ibm.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-07-24 | UBIFS: suppress compilation warning | Subrata Modak
Fix "using uninitialized variable" compilation warning by using the "uninitialized_var()" helper. Signed-off-by: Subrata Modak <subrata@linux.vnet.ibm.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-07-23 | ocfs2: Define credit counts for quota operations | Jan Kara
Numbers of needed credits for some quota operations were written as raw numbers. Create appropriate defines instead. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 | ocfs2: Remove syncjiff field from quota info | Jan Kara
syncjiff is just a converted value of syncms. Some places which are updating syncms forgot to update syncjiff as well. Since the conversion is just a simple division / multiplication and it does not happen frequently, just remove the syncjiff field to avoid forgotten conversions. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
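The idea of deriving the jiffies value on demand instead of caching it can be sketched like this. `TOY_HZ`, `struct toy_quota_info` and `sync_interval_jiffies()` are hypothetical; the tick rate of 250 is an assumption, and the rounding up is a choice made for this sketch rather than something the commit message specifies.

```c
#define TOY_HZ 250 /* assumed tick rate, for illustration only */

/* Only the millisecond value is stored; there is no cached copy to forget. */
struct toy_quota_info {
    unsigned int syncms;
};

/* Convert at the point of use, rounding up to a whole tick. */
static unsigned long long sync_interval_jiffies(const struct toy_quota_info *qi)
{
    return ((unsigned long long)qi->syncms * TOY_HZ + 999) / 1000;
}
```

Because the result is always computed from `syncms`, a code path that updates `syncms` cannot leave a stale converted value behind.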
2009-07-23 | ocfs2: Fix initialization of blockcheck stats | Jan Kara
We just set blockcheck stats to zeros but we should also properly initialize the spinlock there. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 | ocfs2: Zero out padding of on disk dquot structure | Jan Kara
Padding fields of on-disk dquot structure were not zeroed. Zero them so that it's easier to use them later. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 | ocfs2: Initialize blocks allocated to local quota file | Jan Kara
When we extend local quota file, we should initialize data in newly allocated block. Firstly because on recovery we could parse bogus data, secondly so that block checksums are properly computed. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 | ocfs2: Mark buffer uptodate before calling ocfs2_journal_access_dq() | Jan Kara
In a code path extending local quota files we marked new header buffer uptodate only after calling ocfs2_journal_access_dq() which triggers a bug. Fix it and also call ocfs2 variant of the function marking buffer uptodate. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 | ocfs2: Make global quota files blocksize aligned | Jan Kara
Change i_size of global quota files so that it always remains aligned to block size. This is mainly because the end of quota block may contain checksum (if checksumming is enabled) and it's a bit awkward for it to be "outside" of quota file (and it makes life harder for ocfs2-tools). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
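Keeping a file size aligned to the block size amounts to rounding it up to the next multiple. A minimal sketch, where `align_to_block()` is a hypothetical helper and not an ocfs2 function:

```c
#include <stdint.h>

/* Round size up to the next multiple of blocksize (blocksize > 0). */
static uint64_t align_to_block(uint64_t size, uint64_t blocksize)
{
    return (size + blocksize - 1) / blocksize * blocksize;
}
```

With a 4096-byte block, a size of 4097 becomes 8192, so a checksum stored at the end of the last block always falls inside the file.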
2009-07-23 | ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records. | Tao Ma
In ocfs2_adjust_adjacent_records, we will adjust adjacent records according to the extent_list in the lower level. But actually the lower level tree will either be a leaf or a branch. If we only use ocfs2_is_empty_extent we will meet with some problem if the lower tree is a branch (tree_depth > 1). So use !ocfs2_rec_clusters instead. And actually only the leaf record can have holes. So add a BUG_ON for non-leaf branch. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 | jfs: Fix early release of acl in jfs_get_acl | Stefan Bader
BugLink: http://bugs.launchpad.net/ubuntu/+bug/396780 Commit 073aaa1b142461d91f83da66db1184d7c1b1edea "helpers for acl caching + switch to those" introduced new helper functions for acl handling but seems to have introduced a regression for jfs as the acl is released before returning it to the caller, instead of leaving this for the caller to do. This causes the acl object to be used after freeing it, leading to kernel panics in completely different places. Thanks to Christophe Dumez for reporting and bisecting into this. Reported-by: Christophe Dumez <dchris@gmail.com> Tested-by: Christophe Dumez <dchris@gmail.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2009-07-22 | Merge branch 'lockdep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep | Linus Torvalds
* 'lockdep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep:
  lockdep: Fix lockdep annotation for pipe_double_lock()
2009-07-22 | [CIFS] fix sparse warning | Steve French
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-22 | cifs: fix sb->s_maxbytes so that it casts properly to a signed value | Jeff Layton
This off-by-one bug causes sendfile() to not work properly. When a task calls sendfile() on a file on a CIFS filesystem, the syscall returns -1 and sets errno to EOVERFLOW. do_sendfile uses s_maxbytes to verify the returned offset of the file. The problem there is that this value is cast to a signed value (loff_t). When this is done on the s_maxbytes value that cifs uses, it becomes negative and the comparisons against it fail. Even though s_maxbytes is an unsigned value, it seems that it's not OK to set it in such a way that it'll end up negative when it's cast to a signed value. These casts happen in other codepaths besides sendfile too, but the VFS is a little hard to follow in this area and I can't be sure if there are other bugs that this will fix. It's not clear to me why s_maxbytes isn't just declared as loff_t in the first place, but either way we still need to fix these values to make sendfile work properly. This is also an opportunity to replace the magic bit-shift values here with the standard #defines for this. This fixes the reproducer program I have that does a sendfile and will probably also fix the situation where apache is serving from a CIFS share. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
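The signed-cast problem described here is easy to reproduce in userspace. In this sketch `toy_loff_t` stands in for the kernel's `loff_t`, and `pos_within_limit()` is a hypothetical model of a check that compares a file position against the limit after casting it to a signed type. The conversion of an out-of-range unsigned value to a signed type is implementation-defined in C; the sketch assumes the usual two's-complement wraparound seen with mainstream compilers.

```c
#include <stdint.h>

typedef int64_t toy_loff_t; /* stand-in for the kernel's loff_t */

/*
 * Model of a check that compares a signed file position against
 * the filesystem limit after casting the limit to a signed type.
 */
static int pos_within_limit(toy_loff_t pos, uint64_t s_maxbytes)
{
    return pos <= (toy_loff_t)s_maxbytes;
}
```

With a limit of all ones (`UINT64_MAX`) the cast yields a negative value, so even position 0 fails the check; with a limit of `INT64_MAX` the same position passes. That is the off-by-one the commit message refers to.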
2009-07-22 | cifs: disable serverino if server doesn't support it | Jeff Layton
A recent regression when dealing with older servers. This bug was introduced when we made serverino the default... When the server can't provide inode numbers, disable it for the mount. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-22 | Btrfs: Fix crash on read failures at mount | David Woodhouse
If the tree roots hit read errors during mount, btrfs is not properly erroring out. We need to check the uptodate bits after reading in the tree root node. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 | Btrfs: remove of redundant btrfs_header_level | Daniel Cadete
This removes the repeated calls of btrfs_header_level. One call of btrfs_header_level(c) is enough. Signed-off-by: Daniel Cadete <danielncadete10@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 | Btrfs: adjust NULL test | Julia Lawall
Move the call to BUG_ON to before the dereference of the tested value. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 | Btrfs: Remove broken sanity check from btrfs_rmap_block() | David Woodhouse
It was never actually doing anything anyway (see the loop condition), and it would be difficult to make it work for RAID[56]. Even if it was actually working, it's checking for the wrong thing anyway. Instead of checking whether we list a block which _doesn't_ land at the relevant physical location, it should be checking that we _have_ listed all the logical blocks which refer to the required physical location on all devices. This function is only called from remove_sb_from_cache() to ensure that we reserve the logical blocks which would reside at the same physical location as the superblock copies. So listing more blocks than we need is actually OK. With RAID[56] we're going to throw away an entire stripe for each block we have to ignore, so we _are_ going to list blocks other than the ones which actually contain the superblock. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 | Btrfs: convert nested spin_lock_irqsave to spin_lock | Julia Lawall
If spin_lock_irqsave is called twice in a row with the same second argument, the interrupt state at the point of the second call overwrites the value saved by the first call. Indeed, the second call does not need to save the interrupt state, so it is changed to a simple spin_lock. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
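The overwrite described above can be modelled without any kernel code. In this toy simulation `irqs_enabled` plays the role of the CPU interrupt-enable state, and `toy_irq_save()` / `toy_irq_restore()` are hypothetical stand-ins for the save and restore halves of `spin_lock_irqsave` / `spin_unlock_irqrestore`; no real locking is involved.

```c
static int irqs_enabled = 1; /* toy model of the interrupt-enable state */

/* Save the current state into *flags and disable interrupts. */
static void toy_irq_save(int *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = 0;
}

/* Put back whatever state was saved. */
static void toy_irq_restore(int flags)
{
    irqs_enabled = flags;
}

/* Problem pattern: both saves share one flags variable. */
static int nested_save_same_flags(void)
{
    int flags;

    irqs_enabled = 1;
    toy_irq_save(&flags);   /* flags = 1 (enabled) */
    toy_irq_save(&flags);   /* flags = 0: the first value is lost */
    toy_irq_restore(flags);
    toy_irq_restore(flags);
    return irqs_enabled;    /* interrupts stay disabled */
}

/* Fixed pattern: only the outer lock saves and restores the state. */
static int outer_save_only(void)
{
    int flags;

    irqs_enabled = 1;
    toy_irq_save(&flags);   /* outer lock saves the state */
    /* inner lock taken without touching flags */
    toy_irq_restore(flags);
    return irqs_enabled;    /* original state is back */
}
```

The inner critical section already runs with interrupts disabled by the outer save, which is why a plain lock is sufficient there.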
2009-07-22 | lockdep: Fix lockdep annotation for pipe_double_lock() | Peter Zijlstra
The presumed use of the pipe_double_lock() routine is to lock 2 locks in a deadlock free way by ordering the locks by their address. However it fails to keep the specified lock classes in order and explicitly annotates a deadlock. Rectify this. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Miklos Szeredi <mszeredi@suse.cz> LKML-Reference: <1248163763.15751.11098.camel@twins>
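A sketch of the intended pattern, assuming a toy lock that only records which nesting subclass it was taken with. `struct toy_pipe`, `toy_lock_nested()` and `toy_double_lock()` are hypothetical names; the real function takes mutexes with lockdep subclasses, which this userspace model does not attempt to reproduce. The point is that the lock taken first (lower address) always gets the parent subclass, whichever order the arguments arrive in.

```c
#include <stdint.h>

enum toy_subclass { TOY_PARENT = 0, TOY_CHILD = 1 };

/* Toy lock: records that it is held and with which subclass. */
struct toy_pipe {
    int locked;
    enum toy_subclass subclass;
};

static void toy_lock_nested(struct toy_pipe *p, enum toy_subclass sc)
{
    p->locked = 1;
    p->subclass = sc;
}

/*
 * Lock two distinct pipes in address order. The first lock taken is
 * always annotated as the parent and the second as the child, so the
 * annotation order matches the acquisition order.
 */
static void toy_double_lock(struct toy_pipe *a, struct toy_pipe *b)
{
    if ((uintptr_t)a < (uintptr_t)b) {
        toy_lock_nested(a, TOY_PARENT);
        toy_lock_nested(b, TOY_CHILD);
    } else {
        toy_lock_nested(b, TOY_PARENT);
        toy_lock_nested(a, TOY_CHILD);
    }
}
```

An annotation that takes the child subclass before the parent in one branch would tell the lock validator that both orders are legal, which is the explicit deadlock annotation the commit message objects to.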
2009-07-22 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  fs/Kconfig: move nilfs2 out
2009-07-22 | Btrfs: make sure all dirty blocks are written at commit time | Yan Zheng
Writing dirty block groups may allocate new blocks, and so may add new delayed back refs. btrfs_run_delayed_refs may make some block groups dirty. commit_cowonly_roots does not handle the recursion properly, and some dirty blocks can be left unwritten at commit time. This patch moves btrfs_run_delayed_refs into the loop that writes dirty block groups, and makes the code not break out of the loop until there are no dirty block groups or delayed back refs. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 | Btrfs: fix locking issue in btrfs_find_next_key | Yan Zheng
When walking up the tree, btrfs_find_next_key assumes the upper level tree block is properly locked. This isn't always true even if path->keep_locks is 1. This is because btrfs_find_next_key may advance path->slots[] several times instead of only once. When 'path->slots[level] >= btrfs_header_nritems(path->nodes[level])' is found, we can't guarantee the original value of 'path->slots[level]' is 'btrfs_header_nritems(path->nodes[level]) - 1'. If it's not, the tree block at 'level + 1' isn't locked. This patch fixes the issue by explicitly checking the locking state, re-searching the tree if it's not locked. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 | Btrfs: fix double increment of path->slots[0] in btrfs_next_leaf | Yan Zheng
If 1 is returned by btrfs_search_slot, the path already points to the first item with 'key > searching key'. So increasing path->slots[0] by one is superfluous in that case. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
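The return-value convention this fix depends on can be shown with a sorted array of integer keys. `toy_search_slot()` and `next_slot_after()` are hypothetical, and a flat array stands in for a leaf; the sketch only illustrates that the slot needs advancing after an exact match and must be left alone after a miss.

```c
/*
 * Binary search over sorted keys. Returns 0 if key is found at *slot,
 * or 1 if it is absent, in which case *slot is the position of the
 * first item greater than key (possibly n).
 */
static int toy_search_slot(const int *keys, int n, int key, int *slot)
{
    int lo = 0, hi = n;

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;

        if (keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    *slot = lo;
    return (lo < n && keys[lo] == key) ? 0 : 1;
}

/* Slot of the first item strictly after key, or n if there is none. */
static int next_slot_after(const int *keys, int n, int key)
{
    int slot;
    int ret = toy_search_slot(keys, n, key, &slot);

    if (ret == 0)
        slot++; /* exact match: step past it */
    /* ret == 1: slot already points past key, do not step again */
    return slot;
}
```

Incrementing unconditionally would skip one item whenever the searched key is missing.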
2009-07-22 | Btrfs: properly update space information after shrinking device. | Yan Zheng
Change 'goto done' to 'break' for the case where all device extents have been freed, so that the code that updates space information will be executed. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 | Btrfs: fix definition of struct btrfs_extent_inline_ref | Yan Zheng
Use __le64 instead of u64 in the on-disk structure definition. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-21 | NFSv4: Fix a problem whereby a buggy server can oops the kernel | Trond Myklebust
We just had a case in which a buggy server occasionally returns the wrong attributes during an OPEN call. While the client does catch this sort of condition in nfs4_open_done(), and causes the nfs4_atomic_open() to return -EISDIR, the logic in nfs_atomic_lookup() is broken, since it causes a fallback to an ordinary lookup instead of just returning the error. When the buggy server then returns a regular file for the fallback lookup, the VFS allows the open, and bad things start to happen, since the open file doesn't have any associated NFSv4 state. The fix is firstly to return the EISDIR/ENOTDIR errors immediately, and secondly to ensure that we are always careful when dereferencing the nfs_open_context state pointer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-07-21 | ocfs2: Fix deadlock on umount | Jan Kara
In commit ea455f8ab68338ba69f5d3362b342c115bea8e13, we moved the dentry lock put process into ocfs2_wq. This causes problems during umount because ocfs2_wq can drop references to inodes while they are being invalidated by invalidate_inodes() causing all sorts of nasty things (invalidate_inodes() ending in an infinite loop, "Busy inodes after umount" messages etc.). We fix the problem by stopping ocfs2_wq from doing any further releasing of inode references on the superblock being unmounted, wait until it finishes the current round of releasing and finally cleaning up all the references in dentry_lock_list from ocfs2_put_super(). The issue was tracked down by Tao Ma <tao.ma@oracle.com>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-21 | ocfs2: Add extra credits and access the modified bh in update_edge_lengths. | Tao Ma
In the normal tree rotate-left process, we never touch the tree branch above subtree_index, and ocfs2_extend_rotate_transaction doesn't reserve credits for it either. But when we want to delete the rightmost extent block, we have to update the rightmost records for all of the rightmost branch (see ocfs2_update_edge_lengths), so we have to allocate extra credits for them. What's more, we have to access them also. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-21 | NFSv4: Fix an NFSv4 mount regression | Trond Myklebust
Commit 008f55d0e019943323c20a03493a2ba5672a4cc8 (nfs41: recover lease in _nfs4_lookup_root) forces the state manager to always run on mount. This is a bug in the case of NFSv4.0, which doesn't require us to send a setclientid until we want to grab file state. In any case, this is completely the wrong place to be doing state management. Moving that code into nfs4_init_session... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-07-21 | NFSv4: Fix an Oops in nfs4_free_lock_state | Trond Myklebust
The oops http://www.kerneloops.org/raw.php?rawid=537858&msgid= appears to be due to the nfs4_lock_state->ls_state field being uninitialised. This happens if the call to nfs4_free_lock_state() is triggered at the end of nfs4_get_lock_state(). The fix is to move the initialisation of ls_state into the allocator. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-07-21 | inotify: use GFP_NOFS under potential memory pressure | Eric Paris
inotify can have watches removed under filesystem reclaim.
=================================
[ INFO: inconsistent lock state ]
2.6.31-rc2 #16
---------------------------------
inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
khubd/217 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (iprune_mutex){+.+.?.}, at: [<c10ba899>] invalidate_inodes+0x20/0xe3
{IN-RECLAIM_FS-W} state was registered at:
  [<c10536ab>] __lock_acquire+0x2c9/0xac4
  [<c1053f45>] lock_acquire+0x9f/0xc2
  [<c1308872>] __mutex_lock_common+0x2d/0x323
  [<c1308c00>] mutex_lock_nested+0x2e/0x36
  [<c10ba6ff>] shrink_icache_memory+0x38/0x1b2
  [<c108bfb6>] shrink_slab+0xe2/0x13c
  [<c108c3e1>] kswapd+0x3d1/0x55d
  [<c10449b5>] kthread+0x66/0x6b
  [<c1003fdf>] kernel_thread_helper+0x7/0x10
  [<ffffffff>] 0xffffffff
Two things are needed to fix this. First we need a method to tell fsnotify_create_event() to use GFP_NOFS and second we need to stop using one global IN_IGNORED event and allocate them one at a time. This solves current issues with multiple IN_IGNORED on a queue having tail drop problems and simplifies the allocations since we don't have to worry about two tasks operating on the IGNORED event concurrently. Signed-off-by: Eric Paris <eparis@redhat.com>