summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2012-05-26Btrfs: dummy extent buffers for tree mod logJan Schmidt
The tree modification log needs two ways to create dummy extent buffers, once by allocating a fresh one (to rebuild an old root) and once by cloning an existing one (to make private rewind modifications) to it. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26Btrfs: move struct seq_list to ctree.hJan Schmidt
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26Btrfs: don't set for_cow parameter for tree block functionsJan Schmidt
Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed parameter for_cow = 1. In fact, these two functions should never mark their tree modification operations as for_cow, because they can change the number of blocks referenced by a tree. Hence, we remove the extra for_cow parameter from these functions and make them pass a zero down. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26Btrfs: look into the extent during find_all_leafsJan Schmidt
Before this patch we called find_all_leafs for a data extent, then called find_all_roots and then looked into the extent to grab the information we were seeking. This was done without holding the leaves locked to avoid deadlocks. However, this can obviouly race with concurrent tree modifications. Instead, we now look into the extent while we're holding the lock during find_all_leafs and store this information together with the leaf list. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26Btrfs: bugfix: ignore the wrong key for indirect tree block backrefsJan Schmidt
The key we store with a tree block backref is only a hint. It is set when the ref is created and can remain correct for a long time. As the tree is rebalanced, however, eventually the key no longer points to the correct destination. With this patch, we change find_parent_nodes to no longer add keys unless it knows for sure they're correct (e.g. because they're for an extent data backref). Then when we later encounter a backref ref with no parent and no key set, we grab the block and take the first key from the block itself. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26Btrfs: bugfix in btrfs_find_parent_nodesJan Schmidt
That one has been around since the addition of backref.c. Due to the way we calculate our slot numbers, after adding inline refs we're missing one keyed ref unless it's located at the beginning of a new leaf. Reported-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26Btrfs: ulist realloc bugfixJan Schmidt
ulist_next gets the pointer to the previously returned element to find the next element from there. However, when we call ulist_add while iteration with ulist_next is in progress (ulist explicitly supports this), we can realloc the ulist internal memory, which makes the pointer to the previous element useless. Instead, we now use an iterator parameter that's independent from the internal pointers. Reported-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-06Btrfs: avoid sleeping in verify_parent_transid while atomicChris Mason
verify_parent_transid needs to lock the extent range to make sure no IO is underway, and so it can safely clear the uptodate bits if our checks fail. But, a few callers are using it with spinlocks held. Most of the time, the generation numbers are going to match, and we don't want to switch to a blocking lock just for the error case. This adds an atomic flag to verify_parent_transid, and changes it to return EAGAIN if it needs to block to properly verifiy things. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04Btrfs: fix crash in scrub repair code when device is missingStefan Behrens
Fix that when scrub tries to repair an I/O or checksum error and one of the devices containing the mirror is missing, it crashes in bio_add_page because the bdev is a NULL pointer for missing devices. Reported-by: Marco L. Crociani <marco.crociani@gmail.com> Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04btrfs: Fix mismatching struct members in ioctl.hAlexander Block
Fix the size members of btrfs_ioctl_ino_path_args and btrfs_ioctl_logical_ino_args. The user space btrfs-progs utilities used __u64 and the kernel headers used __u32 before. Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04Btrfs: fix page leak when allocing extent buffersJosef Bacik
If we happen to alloc a extent buffer and then alloc a page and notice that page is already attached to an extent buffer, we will only unlock it and free our existing eb. Any pages currently attached to that eb will be properly freed, but we don't do the page_cache_release() on the page where we noticed the other extent buffer which can cause us to leak pages and I hope cause the weird issues we've been seeing in this area. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04Btrfs: Add properly locking around add_root_to_dirty_listChris Mason
add_root_to_dirty_list happens once at the very beginning of the transaction, but it is still racey. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: reduce lock contention during extent insertionChris Mason
We're spending huge amounts of time on lock contention during end_io processing because we unconditionally assume we are overwriting an existing extent in the file for each IO. This checks to see if we are outside i_size, and if so, it uses a less expensive readonly search of the btree to look for existing extents. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdirChris Mason
Btrfs has an optimization where it will preallocate dentries during readdir to fill in enough information to open the inode without an extra lookup. But, we're calling d_alloc, which is doing GFP_KERNEL allocations, and that leads to deadlocks because our readdir code has tree locks held. For now, disable this optimization. We'll fix the gfp mask in the next merge window. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: Fix space checking during fs resizeDaniel J Blueman
Fix out-of-space checking, addressing a warning and potential resource leak when resizing the filesystem down while allocating blocks. Signed-off-by: Daniel J Blueman <daniel@quora.org> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: fix block_rsv and space_info lock orderingStefan Behrens
may_commit_transaction() calls spin_lock(&space_info->lock); spin_lock(&delayed_rsv->lock); and update_global_block_rsv() calls spin_lock(&block_rsv->lock); spin_lock(&sinfo->lock); Lockdep complains about this at run time. Everywhere except in update_global_block_rsv(), the space_info lock is the outer lock, therefore the locking order in update_global_block_rsv() is changed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: Prevent root_list corruptionDaniel J Blueman
I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add correct locking to address this. Signed-off-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: fix repair code for RAID10Jan Schmidt
btrfs_map_block sets mirror_num, so that the repair code knows eventually which device gave us the read error. For RAID10, mirror_num must be 1 or 2. Before this fix mirror_num was incorrectly related to our stripe index. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: do not start delalloc inodes during syncJosef Bacik
btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and start writing them out, but it doesn't splice the list or anything so as long as somebody is doing work on the box you could end up in this section _forever_. So just remove it, it's not needed anyway since sync will start writeback on all inodes anyway, all we need to do is wait for ordered extents and then we can commit the transaction. In my horrible torture test sync goes from taking 4 minutes to about 1.5 minutes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-18Btrfs: fix that check_int_data mount option was ignoredStefan Behrens
The bitfield member mount_opt was too small by one bit to hold the mount option that enabled to include data extents in the integrity checker. Since the same issue happened when the BTRFS_MOUNT_PANIC_ON_FATAL_ERROR option was added (git rebase silently merges so that the increase of the size of the bitfield member is lost), the bit limit was removed entirely. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-04-18Btrfs: don't count CRC or header errors twice while scrubbingStefan Behrens
Each CRC or header error was counted twice, this is now fixed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-04-18Btrfs: fix btrfs_ioctl_dev_info() crash on missing deviceStefan Behrens
When a filesystem is mounted with the degraded option, it is possible that some of the devices are not there. btrfs_ioctl_dev_info() crashs in this case because the device name is a NULL pointer. This ioctl was only used for scrub. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-04-18btrfs: don't return EINTRArne Jansen
It is basically a good thing if we are interruptible when waiting for free space, but the generality in which it is implemented currently leads to system calls being interruptible that are not documented this way. For example git can't handle interrupted unlink(), leading to corrupt repos under space pressure. Instead we raise the bar to only be interruptible by SIGKILL. Thanks to David Sterba for suggesting this. Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-04-18Btrfs: double unlock bug in error handlingDan Carpenter
The caller expects this function to return with the lock held and releases it immediately on error. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-04-18Btrfs: always store the mirror we read the eb fromJosef Bacik
A user reported a panic where we were trying to fix a bad mirror but the mirror number we were giving was 0, which is invalid. This is because we don't do the transid verification until after the read, so as far as the read code is concerned the read was a success. So instead store the mirror we read from so that if there is some failure post read we know which mirror to try next and which mirror needs to be fixed if we find a good copy of the block. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2012-04-18fs/btrfs/volumes.c: add missing free_fs_devicesJulia Lawall
Free fs_devices as done in the error-handling code just below. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
2012-04-18btrfs: fix early abort in 'remount'Sergei Trofimovich
Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Josef Bacik <josef@redhat.com> Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2012-04-18Btrfs: fix max chunk size check in chunk allocatorIlya Dryomov
Fix a bug, where in case we need to adjust stripe_size so that the length of the resulting chunk is less than or equal to max_chunk_size, DUP chunks turn out to be only half as big as they could be. Cc: Arne Jansen <sensille@gmx.net> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-04-18Btrfs: add missing read locks in backref.cJan Schmidt
iref_to_path and iterate_irefs both increment the eb's refcount to use it after releasing the path. Both depend on consistent data remaining in the extent buffer and need a read lock to protect it. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-04-18Btrfs: don't call free_extent_buffer twice in iterate_irefsJan Schmidt
Avoid calling free_extent_buffer more than once when the iterator function returns non-zero. The only code that uses this is scrub repair for corrupted nodatasum blocks. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-04-18Btrfs: Make free_ipath() deal gracefully with NULL pointersJesper Juhl
Make free_ipath() behave like most other freeing functions in the kernel and gracefully do nothing when passed a NULL pointer. Besides this making the bahaviour consistent with functions such as kfree(), vfree(), btrfs_free_path() etc etc, it also fixes a real NULL deref issue in fs/btrfs/ioctl.c::btrfs_ioctl_ino_to_path(). In that function we have this code: ... ipath = init_ipath(size, root, path); if (IS_ERR(ipath)) { ret = PTR_ERR(ipath); ipath = NULL; goto out; } ... out: btrfs_free_path(path); free_ipath(ipath); ... If we ever take the true branch of that 'if' statement we'll end up passing a NULL pointer to free_ipath() which will subsequently dereference it and we'll go "Boom" :-( This patch will avoid that. Signed-off-by: Jesper Juhl <jj@chaosbits.net>
2012-04-18Btrfs: avoid possible use-after-free in clear_extent_bit()Li Zefan
clear_extent_bit() { next_node = rb_next(&state->rb_node); ... clear_state_bit(state); <-- this may free next_node if (next_node) { state = rb_entry(next_node); ... } } clear_state_bit() calls merge_state() which may free the next node of the passing extent_state, so clear_extent_bit() may end up referencing freed memory. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-18Btrfs: retrurn void from clear_state_bitLi Zefan
Currently it returns a set of bits that were cleared, but this return value is not used at all. Moreover it doesn't seem to be useful, because we may clear the bits of a few extent_states, but only the cleared bits of last one is returned. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-18btrfs: add missing unlocks to transaction abort pathsDavid Sterba
Added in commit 49b25e0540904be0bf558b84475c69d72e4de66e ("btrfs: enhance transaction abort infrastructure") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2012-04-18Btrfs: do not mount when we have a sectorsize unequal to PAGE_SIZELiu Bo
Our code is not ready to cope with a sectorsize that's not equal to PAGE_SIZE. It will lead to hanging-on while writing something. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-04-18btrfs: don't add both copies of DUP to reada extent treeArne Jansen
Normally when there are 2 copies of a block, we add both to the reada extent tree and prefetch only the one that is easier to reach. This way we can better utilize multiple devices. In case of DUP this makes no sense as both copies reside on the same device. Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-04-18btrfs: fix race in readaArne Jansen
When inserting into the radix tree returns EEXIST, get the existing entry without giving up the spinlock in between. There was a race for both the zones trees and the extent tree. Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-04-18Btrfs: avoid setting ->d_op twiceLi Zefan
Follow those instructions, and you'll trigger a warning in the beginning of d_set_d_op(): # mkfs.btrfs /dev/loop3 # mount /dev/loop3 /mnt # btrfs sub create /mnt/sub # btrfs sub snap /mnt /mnt/snap # touch /mnt/snap/sub touch: cannot touch `tmp': Permission denied __d_alloc() set d_op to sb->s_d_op (btrfs_dentry_operations), and then simple_lookup() reset it to simple_dentry_operations, which triggered the warning. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-12Btrfs: use commit root when loading free space cacheJosef Bacik
A user reported that booting his box up with btrfs root on 3.4 was way slower than on 3.3 because I removed the ideal caching code. It turns out that we don't load the free space cache if we're in a commit for deadlock reasons, but since we're reading the cache and it hasn't changed yet we are safe reading the inode and free space item from the commit root, so do that and remove all of the deadlock checks so we don't unnecessarily skip loading the free space cache. The user reported this fixed the slowness. Thanks, Tested-by: Calvin Walton <calvin.walton@kepstin.ca> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12Btrfs: fix use-after-free in __btrfs_end_transactionDave Jones
49b25e0540904be0bf558b84475c69d72e4de66e introduced a use-after-free bug that caused spurious -EIO's to be returned. Do the check before we free the transaction. Cc: David Sterba <dsterba@suse.cz> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12Btrfs: check return value of bio_alloc() properlyTsutomu Itoh
bio_alloc() has the possibility of returning NULL. So, it is necessary to check the return value. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12Btrfs: remove lock assert from get_restripe_target()Ilya Dryomov
This fixes a regression introduced by fc67c450. spin_is_locked() always returns 0 on UP kernels, which caused assert in get_restripe_target() to be fired on every call from btrfs_reduce_alloc_profile() on UP systems. Remove it completely for now, it's not clear if it's going to be needed in future. Reported-by: Bobby Powers <bobbypowers@gmail.com> Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org> Tested-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12Btrfs: fix eof while discarding extentsLiu Bo
We miscalculate the length of extents we're discarding, and it leads to an eof of device. Reported-by: Daniel Blueman <daniel@quora.org> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12Btrfs: fix uninit variable in repair_eb_io_failureChris Mason
We'd have to be passing bogus extent buffers for this uninit variable to actually be used, but set it to zero just in case. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12Revert "Btrfs: increase the global block reserve estimates"Chris Mason
This reverts commit 5500cdbe14d7435e04f66ff3cfb8ecd8b8e44ebf. We've had a number of complaints of early enospc that bisect down to this patch. We'll hae to fix the reservations differently. CC: stable@kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29Btrfs: update the checks for mixed block groups with big metadata blocksChris Mason
Dave Sterba had put in patches to look for mixed data/metadata groups with metadata bigger than 4KB. But these ended up in the wrong place and it wasn't testing the feature flag correctly. This updates the tests to make sure our sizes are matching Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29Btrfs: update to the right index of defragmentLiu Bo
When we use autodefrag, we forget to update the index which indicates the last page we've dirty. And we'll set dirty flags on a same set of pages again and again. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29Btrfs: do not bother to defrag an extent if it is a big real extentLiu Bo
$ mkfs.btrfs /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag $ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null $ filefrag -v /mnt/btrfs/foobar Filesystem type is: 9123683e File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3072 10 eof /mnt/btrfs/foobar: 1 extent found Now we have a big real extent [0, 40960), but autodefrag will still defrag it. $ sync $ filefrag -v /mnt/btrfs/foobar Filesystem type is: 9123683e File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3082 10 eof /mnt/btrfs/foobar: 1 extent found So if we already find a big real extent, we're ok about that, just skip it. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29Btrfs: add a check to decide if we should defrag the rangeLiu Bo
If our file's layout is as follows: | hole | data1 | hole | data2 | we do not need to defrag this file, because this file has holes and cannot be merged into one extent. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29Btrfs: fix recursive defragment with autodefrag optionLiu Bo
$ mkfs.btrfs disk $ mount disk /mnt -o autodefrag $ dd if=/dev/zero of=/mnt/foobar bs=4k count=10 2>/dev/null && sync $ for i in `seq 9 -2 0`; do dd if=/dev/zero of=/mnt/foobar bs=4k count=1 \ seek=$i conv=notrunc 2> /dev/null; done && sync then we'll get to defrag "foobar" again and again. So does option "-o autodefrag,compress". Reasons: When the cleaner kthread gets to fetch inodes from the defrag tree and defrag them, it will dirty pages and submit them, this will comes to another DATA COW where the processing inode will be inserted to the defrag tree again. This patch sets a rule for COW code, i.e. insert an inode when we're really going to make some defragments. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>