summaryrefslogtreecommitdiffstats
path: root/fs/btrfs/volumes.c
AgeCommit message (Collapse)Author
2013-07-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs update from Chris Mason: "These are the usual mixture of bugs, cleanups and performance fixes. Miao has some really nice tuning of our crc code as well as our transaction commits. Josef is peeling off more and more problems related to early enospc, and has a number of important bug fixes in here too" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (81 commits) Btrfs: wait ordered range before doing direct io Btrfs: only do the tree_mod_log_free_eb if this is our last ref Btrfs: hold the tree mod lock in __tree_mod_log_rewind Btrfs: make backref walking code handle skinny metadata Btrfs: fix crash regarding to ulist_add_merge Btrfs: fix several potential problems in copy_nocow_pages_for_inode Btrfs: cleanup the code of copy_nocow_pages_for_inode() Btrfs: fix oops when recovering the file data by scrub function Btrfs: make the chunk allocator completely tree lockless Btrfs: cleanup orphaned root orphan item Btrfs: fix wrong mirror number tuning Btrfs: cleanup redundant code in btrfs_submit_direct() Btrfs: remove btrfs_sector_sum structure Btrfs: check if we can nocow if we don't have data space Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc Btrfs: use a percpu to keep track of possibly pinned bytes Btrfs: check for actual acls rather than just xattrs when caching no acl Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate Btrfs: optimize reada_for_balance Btrfs: optimize read_block_for_search ...
2013-07-02Btrfs: make the chunk allocator completely tree locklessJosef Bacik
When adjusting the enospc rules for relocation I ran into a deadlock because we were relocating the only system chunk and that forced us to try and allocate a new system chunk while holding locks in the chunk tree, which caused us to deadlock. To fix this I've moved all of the dev extent addition and chunk addition out to the delayed chunk completion stuff. We still keep the in-memory stuff which makes sure everything is consistent. One change I had to make was to search the commit root of the device tree to find a free dev extent, and hold onto any chunk em's that we allocated in that transaction so we do not allocate the same dev extent twice. This has the side effect of fixing a bug with balance that has been there ever since balance existed. Basically you can free a block group and it's dev extent and then immediately allocate that dev extent for a new block group and write stuff to that dev extent, all within the same transaction. So if you happen to crash during a balance you could come back to a completely broken file system. This patch should keep these sort of things from happening in the future since we won't be able to allocate free'd dev extents until after the transaction commits. This has passed all of the xfstests and my super annoying stress test followed by a balance. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-07-02Btrfs: fix wrong mirror number tuningMiao Xie
Now reading the data from the target device of the replace operation is allowed, so the mirror number that is greater than the stripes number of a chunk is valid, we will tune it when we find there is no target device later. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14Btrfs: Cocci spatch "ptr_ret.spatch"Thomas Meyer
Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14btrfs: device delete to get errors from the kernelAnand Jain
when user runs command btrfs dev del the raid requisite error if any goes to the /var/log/messages, its not good idea to clutter messages with these user (knowledge) errors, further user don't have to review the system messages to know problem with the cli it should be dropped to the user as part of the cli return. to bring this feature created a set of the ERROR defined BTRFS_ERROR_DEV* error codes and created their error string. I expect this enum to be added with other error which we might want to communicate to the user land v3: moved the code with in the file no logical change v1->v2: introduce error codes for the device mgmt usage v1: adds a parameter in the ioctl arg struct to carry the error string Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14Btrfs: cleanup the similar code of the fs root readMiao Xie
There are several functions whose code is similar, such as btrfs_find_last_root() btrfs_read_fs_root_no_radix() Besides that, some functions are invoked twice, it is unnecessary, for example, we are sure that all roots which is found in btrfs_find_orphan_roots() have their orphan items, so it is unnecessary to check the orphan item again. So cleanup it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Miao Xie has been very busy, fixing races and enospc problems and many other small but important pieces. Alexandre Oliva discovered some problems with how our error handling was interacting with the block layer and for now has disabled our partial handling of sub-page writes. The real sub-page work is in a series of patches from IBM that we still need to integrate and test. The code Alexandre has turned off was really incomplete. Josef has more error handling fixes and an important fix for the new skinny extent format. This also has my fix for the tracepoint crash from late in 3.9. It's the first stage in a larger clean up to get rid of btrfs_bio and make a proper bioset for all the items we need to tack into the bio. For now the bioset only holds our mirror_num and stripe_index, but for the next merge window I'll shuffle more in." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits) Btrfs: use a btrfs bioset instead of abusing bio internals Btrfs: make sure roots are assigned before freeing their nodes Btrfs: explicitly use global_block_rsv for quota_tree btrfs: do away with non-whole_page extent I/O Btrfs: don't invoke btrfs_invalidate_inodes() in the spin lock context Btrfs: remove BUG_ON() in btrfs_read_fs_tree_no_radix() Btrfs: pause the space balance when remounting to R/O Btrfs: fix unprotected root node of the subvolume's inode rb-tree Btrfs: fix accessing a freed tree root Btrfs: return errno if possible when we fail to allocate memory Btrfs: update the global reserve if it is empty Btrfs: don't steal the reserved space from the global reserve if their space type is different Btrfs: optimize the error handle of use_block_rsv() Btrfs: don't use global block reservation for inode cache truncation Btrfs: don't abort the current transaction if there is no enough space for inode cache Correct allowed raid levels on balance. Btrfs: fix possible memory leak in replace_path() Btrfs: fix possible memory leak in the find_parent_nodes() Btrfs: don't allow device replace on RAID5/RAID6 Btrfs: handle running extent ops with skinny metadata ...
2013-05-17Merge branch 'for-chris' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next
2013-05-17Btrfs: use a btrfs bioset instead of abusing bio internalsChris Mason
Btrfs has been pointer tagging bi_private and using bi_bdev to store the stripe index and mirror number of failed IOs. As bios bubble back up through the call chain, we use these to decide if and how to retry our IOs. They are also used to count IO failures on a per device basis. Recently a bio tracepoint was added lead to crashes because we were abusing bi_bdev. This commit adds a btrfs bioset, and creates explicit fields for the mirror number and stripe index. The plan is to extend this structure for all of the fields currently in struct btrfs_bio, which will mean one less kmalloc in our IO path. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Reported-by: Tejun Heo <tj@kernel.org>
2013-05-17Correct allowed raid levels on balance.Andreas Philipp
Raid5 with 3 devices is well defined while the old logic allowed raid5 only with a minimum of 4 devices when converting the block group profile via btrfs balance. Creating a raid5 with just three devices using mkfs.btrfs worked always as expected. This is now fixed and the whole logic is rewritten. Signed-off-by: Andreas Philipp <philipp.andreas@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs update from Chris Mason: "These are mostly fixes. The biggest exceptions are Josef's skinny extents and Jan Schmidt's code to rebuild our quota indexes if they get out of sync (or you enable quotas on an existing filesystem). The skinny extents are off by default because they are a new variation on the extent allocation tree format. btrfstune -x enables them, and the new format makes the extent allocation tree about 30% smaller. I rebased this a few days ago to rework Dave Sterba's crc checks on the super block, but almost all of these go back to rc6, since I though 3.9 was due any minute. The biggest missing fix is the tracepoint bug that was hit late in 3.9. I ran into problems with that in overnight testing and I'm still tracking it down. I'll definitely have that fixed for rc2." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (101 commits) Btrfs: allow superblock mismatch from older mkfs btrfs: enhance superblock checks btrfs: fix misleading variable name for flags btrfs: use unsigned long type for extent state bits Btrfs: improve the loop of scrub_stripe btrfs: read entire device info under lock btrfs: remove unused gfp mask parameter from release_extent_buffer callchain btrfs: handle errors returned from get_tree_block_key btrfs: make static code static & remove dead code Btrfs: deal with errors in write_dev_supers Btrfs: remove almost all of the BUG()'s from tree-log.c Btrfs: deal with free space cache errors while replaying log Btrfs: automatic rescan after "quota enable" command Btrfs: rescan for qgroups Btrfs: split btrfs_qgroup_account_ref into four functions Btrfs: allocate new chunks if the space is not enough for global rsv Btrfs: separate sequence numbers for delayed ref tracking and tree mod log btrfs: move leak debug code to functions Btrfs: return free space in cow error path Btrfs: set UUID in root_item for created trees ...
2013-05-06btrfs: make static code static & remove dead codeEric Sandeen
Big patch, but all it does is add statics to functions which are in fact static, then remove the associated dead-code fallout. removed functions: btrfs_iref_to_path() __btrfs_lookup_delayed_deletion_item() __btrfs_search_delayed_insertion_item() __btrfs_search_delayed_deletion_item() find_eb_for_page() btrfs_find_block_group() range_straddles_pages() extent_range_uptodate() btrfs_file_extent_length() btrfs_scrub_cancel_devid() btrfs_start_transaction_lflush() btrfs_print_tree() is left because it is used for debugging. btrfs_start_transaction_lflush() and btrfs_reada_detach() are left for symmetry. ulist.c functions are left, another patch will take care of those. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: don't BUG_ON() in btrfs_num_copiesJosef Bacik
A user sent me a btrfs-image that was panicing because of some corruption. This is because we pass in a bogus value to btrfs_num_copies, and it panics. Instead just return 1. We only call btrfs_num_copies to see if there are other copies to try and read for things, so if we just return 1 it will make the callers exit out with an appropriate error value. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: deal with bad mappings in btrfs_map_blockJosef Bacik
Martin Steigerwald reported a BUG_ON() in btrfs_map_block where we didn't find a chunk for a particular block we were trying to map. This happened because the block was bogus. We shouldn't be BUG_ON()'ing in this case, just print a message and return an error. This came from reada_add_block and it appears to deal with an error fine so we should be good there. Thanks, Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: use a lock to protect incompat/compat flag of the super blockMiao Xie
The following case will make the incompat/compat flag of the super block be recovered. Task1 |Task2 flags = btrfs_super_incompat_flags(); | |flags = btrfs_super_incompat_flags(); flags |= new_flag1; | |flags |= new_flag2; btrfs_set_super_incompat_flags(flags); | |btrfs_set_super_incompat_flags(flags); the new_flag1 is recovered. In order to avoid this problem, we introduce a lock named super_lock into the btrfs_fs_info structure. If we want to update incompat/compat flags of the super block, we must hold it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06btrfs: ignore device open failures in __btrfs_open_devicesEric Sandeen
This: # mkfs.btrfs /dev/sdb{1,2} ; wipefs -a /dev/sdb1; mount /dev/sdb2 /mnt/test would lead to a blkdev open/close mismatch when the mount fails, and a permanently busy (opened O_EXCL) sdb2: # wipefs -a /dev/sdb2 wipefs: error: /dev/sdb2: probing initialization failed: Device or resource busy It's because btrfs_open_devices() may open some devices, fail on the last one, and return that failure stored in "ret." The mount then fails, but the caller then does not clean up the open devices. Chris assures me that: "btrfs_open_devices just means: go off and open every bdev you can from this uuid. It should return success if we opened any of them at all." So change the logic to ignore any open failures; just skip processing of that device. Later on it's decided whether we have enough devices to continue. Reported-by: Jan Safranek <jsafrane@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: fix bad extent loggingJosef Bacik
A user sent me a btrfs-image of a file system that was panicing on mount during the log recovery. I had originally thought these problems were from a bug in the free space cache code, but that was just a symptom of the problem. The problem is if your application does something like this [prealloc][prealloc][prealloc] the internal extent maps will merge those all together into one extent map, even though on disk they are 3 separate extents. So if you go to write into one of these ranges the extent map will be right since we use the physical extent when doing the write, but when we log the extents they will use the wrong sizes for the remainder prealloc space. If this doesn't happen to trip up the free space cache (which it won't in a lot of cases) then you will get bogus entries in your extent tree which will screw stuff up later. The data and such will still work, but everything else is broken. This patch fixes this by not allowing extents that are on the modified list to be merged. This has the side effect that we are no longer adding everything to the modified list all the time, which means we now have to call btrfs_drop_extents every time we log an extent into the tree. So this allows me to drop all this speciality code I was using to get around calling btrfs_drop_extents. With this patch the testcase I've created no longer creates a bogus file system after replaying the log. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: Include the device in most error printk()sSimon Kirby
With more than one btrfs volume mounted, it can be very difficult to find out which volume is hitting an error. btrfs_error() will print this, but it is currently rigged as more of a fatal error handler, while many of the printk()s are currently for debugging and yet-unhandled cases. This patch just changes the functions where the device information is already available. Some cases remain where the root or fs_info is not passed to the function emitting the error. This may introduce some confusion with volumes backed by multiple devices emitting errors referring to the primary device in the set instead of the one on which the error occurred. Use btrfs_printk(fs_info, format, ...) rather than writing the device string every time, and introduce macro wrappers ala XFS for brevity. Since the function already cannot be used for continuations, print a newline as part of the btrfs_printk() message rather than at each caller. Signed-off-by: Simon Kirby <sim@hostway.ca> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-04-02Merge branch 'writeback-workqueue' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq into for-3.10/core Tejun writes: ----- This is the pull request for the earlier patchset[1] with the same name. It's only three patches (the first one was committed to workqueue tree) but the merge strategy is a bit involved due to the dependencies. * Because the conversion needs features from wq/for-3.10, block/for-3.10/core is based on rc3, and wq/for-3.10 has conflicts with rc3, I pulled mainline (rc5) into wq/for-3.10 to prevent those workqueue conflicts from flaring up in block tree. * Resolving the issue that Jan and Dave raised about debugging requires arch-wide changes. The patchset is being worked on[2] but it'll have to go through -mm after these changes show up in -next, and not included in this pull request. The three commits are located in the following git branch. git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git writeback-workqueue Pulling it into block/for-3.10/core produces a conflict in drivers/md/raid5.c between the following two commits. e3620a3ad5 ("MD RAID5: Avoid accessing gendisk or queue structs when not available") 2f6db2a707 ("raid5: use bio_reset()") The conflict is trivial - one removes an "if ()" conditional while the other removes "rbi->bi_next = NULL" right above it. We just need to remove both. The merged branch is available at git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git block-test-merge so that you can use it for verification. The test merge commit has proper merge description. While these changes are a bit of pain to route, they make code simpler and even have, while minute, measureable performance gain[3] even on a workload which isn't particularly favorable to showing the benefits of this conversion. ---- Fixed up the conflict. Conflicts: drivers/md/raid5.c Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-23block: Use bio_sectors() more consistentlyKent Overstreet
Bunch of places in the code weren't using it where they could be - this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx into a struct bvec_iter. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: "Ed L. Cashin" <ecashin@coraid.com> CC: Nick Piggin <npiggin@kernel.dk> CC: Jiri Kosina <jkosina@suse.cz> CC: Jim Paris <jim@jtan.com> CC: Geoff Levand <geoff@infradead.org> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Neil Brown <neilb@suse.de> CC: Steven Rostedt <rostedt@goodmis.org> Acked-by: Ed Cashin <ecashin@coraid.com>
2013-03-21Btrfs: handle a bogus chunk tree nicelyJosef Bacik
If you restore a btrfs-image file system and try to mount that file system we'll panic. That's because btrfs-image restores and just makes one big chunk to envelope the whole disk, since they are really only meant to be messed with by our btrfs-progs. So fix up btrfs_rmap_block and the callers of it for mount so that we no longer panic but instead just return an error and fail to mount. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14btrfs: use rcu_barrier() to wait for bdev puts at unmountEric Sandeen
Doing this would reliably fail with -EBUSY for me: # mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2 ... unable to open /dev/sdb2: Device or resource busy because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it. Using systemtap to track bdev gets & puts shows a kworker thread doing a blkdev put after mkfs attempts a get; this is left over from the unmount path: btrfs_close_devices __btrfs_close_devices call_rcu(&device->rcu, free_device); free_device INIT_WORK(&device->rcu_work, __free_device); schedule_work(&device->rcu_work); so unmount might complete before __free_device fires & does its blkdev_put. Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait until all blkdev_put()s are done, and the device is truly free once unmount completes. Cc: stable@vger.kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-06Btrfs: fix a mismerge in btrfs_balance()Ilya Dryomov
Raid56 merge (merge commit e942f88) had mistakenly removed a call to __cancel_balance(), which resulted in balance not cleaning up after itself after a successful finish. (Cleanup includes switching the state, removing the balance item and releasing mut_ex_op testnset lock.) Bring it back. Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-04Btrfs: do not BUG_ON on aborted situationLiu Bo
Btrfs balance can easily hit BUG_ON in these places, but we want to it bail out gracefully after we force the whole filesystem to readonly. So we use btrfs_std_error hook in place of BUG_ON. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-28btrfs: remove a printk from scan_one_deviceDavid Sterba
Dave pointed out that he saw messages from btrfs although there was no such filesystem on his computers. The automatic device scan is called on every new blockdevice if the usual distro udev rule set is used. The printk introduced in 6f60cbd3ae442c was a remainder from copying portions of code from btrfs_get_bdev_and_sb which is used under different conditions and the warning makes sense there. Reported-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26btrfs: cleanup for open-coded alignmentQu Wenruo
Though most of the btrfs codes are using ALIGN macro for page alignment, there are still some codes using open-coded alignment like the following: ------ u64 mask = ((u64)root->stripesize - 1); u64 ret = (val + mask) & ~mask; ------ Or even hidden one: ------ num_bytes = (end - start + blocksize) & ~(blocksize - 1); ------ Sometimes these open-coded alignment is not so easy to understand for newbie like me. This commit changes the open-coded alignment to the ALIGN macro for a better readability. Also there is a previous patch from David Sterba with similar changes, but the patch is for 3.2 kernel and seems not merged. http://www.spinics.net/lists/linux-btrfs/msg12747.html Cc: David Sterba <dave@jikos.cz> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix max chunk size on raid5/6Chris Mason
We try to limit the size of a chunk to 10GB, which keeps the unit of work reasonable during balance and resize operations. The limit checks were taking into account the number of copies of the data we had but what they really should be doing is comparing against the logical size of the chunk we're creating. This moves the code around a little to use the count of data stripes from raid5/6. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-20btrfs: Init io_lock after cloning btrfs device structThomas Gleixner
__btrfs_close_devices() clones btrfs device structs with memcpy(). Some of the fields in the clone are reinitialized, but it's missing to init io_lock. In mainline this goes unnoticed, but on RT it leaves the plist pointing to the original about to be freed lock struct. Initialize io_lock after cloning, so no references to the original struct are left. Reported-and-tested-by: Mike Galbraith <efault@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-20Merge branch 'raid56-experimental' into for-linus-3.9Chris Mason
Signed-off-by: Chris Mason <chris.mason@fusionio.com> Conflicts: fs/btrfs/ctree.h fs/btrfs/extent-tree.c fs/btrfs/inode.c fs/btrfs/volumes.c
2013-02-20btrfs: define BTRFS_MAGIC as a u64 valueZach Brown
super.magic is an le64 but it's treated as an unterminated string when compared against BTRFS_MAGIC which is defined as a string. Instead define BTRFS_MAGIC as a normal hex value and use endian helpers to compare it to the super's magic. I tested this by mounting an fs made before the change and made sure that it didn't introduce sparse errors. This matches a similar cleanup that is pending in btrfs-progs. David Sterba pointed out that we should fix the kernel side as well :). Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: allow for selecting only completely empty chunksIlya Dryomov
Enhance balance usage filter by making it possible to balance out only completely empty chunks. Today, usage filter properly acts on values from 1 to 99 inclusive, usage=100 selects all chunks, and usage=0 selects no chunks. This commit changes the usage=0 case: the new meaning is to restripe only completely empty chunks and nothing else. Suggested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: eliminate a use-after-free in btrfs_balance()Ilya Dryomov
Commit 5af3e8cc introduced a use-after-free at volumes.c:3139: bctl is freed above in __cancel_balance() in all cases except for balance pause. Fix this by moving the offending check a couple statements above, the meaning of the check is preserved. Reported-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: ensure we don't overrun devices_info[] in __btrfs_alloc_chunkEric Sandeen
WARN_ON isn't enough, we need to stop the loop if for any reason we would overrun the devices_info array. I tried to track down the connection between the length of the alloc_devices list and the rw_devices counter but it wasn't immediately obvious, so be defensive about it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: remove extent mapping if we fail to add chunkJosef Bacik
I got a double free error when unmounting a file system that failed to add a chunk during its operation. This is because we will kfree the mapping that we created but leave the extent_map in the em_tree for chunks. So to fix this just remove the extent_map when we error out so we don't run into this problem. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix chunk allocation error handlingJosef Bacik
If we error out allocating a dev extent we will have already created the block group and such which will cause problems since the allocator may have tried to allocate out of the block group that no longer exists. This will cause BUG_ON()'s in the bio submission path. This also makes a failure to allocate a dev extent a non-abort error, we will just clean up the dev extents we did allocate and exit. Now if we fail to delete the dev extents we will abort since we can't have half of the dev extents hanging around, but this will make us much less likely to abort. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bitsMiao Xie
There is no lock to protect fs_info->avail_{data, metadata, system}_alloc_bits, it may introduce some problem, such as the wrong profile information, so we add a seqlock to protect them. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: make raid attr array more readableMiao Xie
The current code of raid attr arry is hard to understand and it is easy to introduce some problem if we modify the array. So I changed it and made it more readable. Cc: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-15btrfs: access superblock via pagecache in scan_one_deviceDavid Sterba
btrfs_scan_one_device is calling set_blocksize() which can race with a concurrent process making dirty page cache pages. It can end up dropping dirty page cache pages on the floor, which isn't very nice when someone is just running btrfs dev scan to find filesystems on the box. Now that udev is registering btrfs devices as it discovers them, we can actually end up racing with our own mkfs program too. When this happens, we drop some of the important blocks written by mkfs. This commit changes scan_one_device to read the super out of the page cache instead of trying to use bread. This way we don't have to care about the blocksize of the device. This also drops the invalidate_bdev() call. It wasn't very polite to invalidate during the scan either. mkfs is putting the super into the page cache, there's no reason to invalidate at this point. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git ↵Chris Mason
for-chris into for-linus
2013-02-05Merge branch 'for-linus' into raid56-experimentalChris Mason
Conflicts: fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-05Btrfs: remove conflicting check for minimum number of devices in raid56Chris Mason
The device removal code was incorrectly checking against two different limits for raid5 and raid6. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-01Btrfs: RAID5 and RAID6David Woodhouse
This builds on David Woodhouse's original Btrfs raid5/6 implementation. The code has changed quite a bit, blame Chris Mason for any bugs. Read/modify/write is done after the higher levels of the filesystem have prepared a given bio. This means the higher layers are not responsible for building full stripes, and they don't need to query for the topology of the extents that may get allocated during delayed allocation runs. It also means different files can easily share the same stripe. But, it does expose us to incorrect parity if we crash or lose power while doing a read/modify/write cycle. This will be addressed in a later commit. Scrub is unable to repair crc errors on raid5/6 chunks. Discard does not work on raid5/6 (yet) The stripe size is fixed at 64KiB per disk. This will be tunable in a later commit. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-01btrfs: don't try to notify udev about missing devicesEric Sandeen
If we remove a missing device, bdev is null, and if we send that off to btrfs_kobject_uevent we'll panic. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-01-24Btrfs: fix wrong max device number for single profileMiao Xie
The max device number of single profile is 1, not 0 (0 means 'as many as possible'). Fix it. Cc: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-21Btrfs: fix a regression in balance usage filterIlya Dryomov
Commit 3fed40cc ("Btrfs: cleanup duplicated division functions"), which was merged into 3.8-rc1, has introduced a regression by removing logic that was guarding us against bad user input. Bring it back. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-01-20Btrfs: bring back balance pause/resume logicIlya Dryomov
Balance pause/resume logic got broken by 5ac00add (went in into 3.8-rc1 as part of dev-replace merge). Offending commit took a stab at making mutually exclusive volume operations (add_dev, rm_dev, resize, balance, replace_dev) not block behind volume_mutex if another such operation is in progress and instead return an error right away. Balancing front-end relied on the blocking behaviour, so the fix is ugly, but short of a complete rework, it's the best we can do. Reported-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2013-01-14btrfs: get the device in write mode when deleting itLukas Czerner
When we're deleting the device we should get it in write mode since we're going to re-write the super block magic on that device. And it should fail if the device is read-only. Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2012-12-16Btrfs: put raid properties into global tableLiu Bo
Raid properties can be shared among raid calculation code, we can put them into a global table to keep it simple. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16Btrfs: log changed inodes based on the extent map treeJosef Bacik
We don't really need to copy extents from the source tree since we have all of the information already available to us in the extent_map tree. So instead just write the extents straight to the log tree and don't bother to copy the extent items from the source tree. Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16btrfs: Notify udev when removing deviceLukas Czerner
Currently udev does not know about the device being removed from the file system. This may result in the situation where we're unable to mount the file system by UUID or by LABEL because the by-uuid and by-label links may still point to the device which is no longer part of the btrfs file system and hence does not have any btrfs super block. It can be easily reproduced by the following: mkfs.btrfs -L bugfs /dev/loop[0-6] mount /dev/loop0 /mnt/test btrfs device delete /dev/loop0 /mnt/test umount /mnt/test mount LABEL=bugfs /mnt/test <---- this fails then see: ls -l /dev/disk/by-label/bugfs which will still point to the /dev/loop0 We did not noticed this before because libblkid would send the udev event for us when it notice that the link does not fit the reality, however it does not do that anymore and completely relies on udev information. Fix this by sending the KOBJ_CHANGE event to the bdev kobject after successful device removal. Note that this does not affect device addition, because we will open the device prior the addition from userspace and udev will notice that and reread the device afterwards. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>