summaryrefslogtreecommitdiffstats
path: root/fs/btrfs/root-tree.c
AgeCommit message (Collapse)Author
2009-06-10Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE)Yan Zheng
This commit introduces a new kind of back reference for btrfs metadata. Once a filesystem has been mounted with this commit, IT WILL NO LONGER BE MOUNTABLE BY OLDER KERNELS. When a tree block in subvolume tree is cow'd, the reference counts of all extents it points to are increased by one. At transaction commit time, the old root of the subvolume is recorded in a "dead root" data structure, and the btree it points to is later walked, dropping reference counts and freeing any blocks where the reference count goes to 0. The increments done during cow and decrements done after commit cancel out, and the walk is a very expensive way to go about freeing the blocks that are no longer referenced by the new btree root. This commit reduces the transaction overhead by avoiding the need for dead root records. When a non-shared tree block is cow'd, we free the old block at once, and the new block inherits old block's references. When a tree block with reference count > 1 is cow'd, we increase the reference counts of all extents the new block points to by one, and decrease the old block's reference count by one. This dead tree avoidance code removes the need to modify the reference counts of lower level extents when a non-shared tree block is cow'd. But we still need to update back ref for all pointers in the block. This is because the location of the block is recorded in the back ref item. We can solve this by introducing a new type of back ref. The new back ref provides information about pointer's key, level and in which tree the pointer lives. This information allow us to find the pointer by searching the tree. The shortcoming of the new back ref is that it only works for pointers in tree blocks referenced by their owner trees. This is mostly a problem for snapshots, where resolving one of these fuzzy back references would be O(number_of_snapshots) and quite slow. The solution used here is to use the fuzzy back references in the common case where a given tree block is only referenced by one root, and use the full back references when multiple roots have a reference on a given block. This commit adds per subvolume red-black tree to keep trace of cached inodes. The red-black tree helps the balancing code to find cached inodes whose inode numbers within a given range. This commit improves the balancing code by introducing several data structures to keep the state of balancing. The most important one is the back ref cache. It caches how the upper level tree blocks are referenced. This greatly reduce the overhead of checking back ref. The improved balancing code scales significantly better with a large number of snapshots. This is a very large commit and was written in a number of pieces. But, they depend heavily on the disk format change and were squashed together to make sure git bisect didn't end up in a bad state wrt space balancing or the format change. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-05Btrfs: Fix checkpatch.pl warningsChris Mason
There were many, most are fixed now. struct-funcs.c generates some warnings but these are bogus. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-02Btrfs: make things static and include the right headersChristoph Hellwig
Shut up various sparse warnings about symbols that should be either static or have their declarations in scope. Signed-off-by: Christoph Hellwig <hch@lst.de>
2008-11-17Btrfs: prevent loops in the directory tree when creating snapshotsChris Mason
For a directory tree: /mnt/subvolA/subvolB btrfsctl -s /mnt/subvolA/subvolB /mnt Will create a directory loop with subvolA under subvolB. This commit uses the forward refs for each subvol and snapshot to error out before creating the loop. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-11-17Btrfs: Add backrefs and forward refs for subvols and snapshotsChris Mason
Subvols and snapshots can now be referenced from any point in the directory tree. We need to maintain back refs for them so we can find lost subvols. Forward refs are added so that we know all of the subvols and snapshots referenced anywhere in the directory tree of a single subvol. This can be used to do recursive snapshotting (but they aren't yet) and it is also used to detect and prevent directory loops when creating new snapshots. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-29Btrfs: add and improve commentsChris Mason
This improves the comments at the top of many functions. It didn't dive into the guts of functions because I was trying to avoid merging problems with the new allocator and back reference work. extent-tree.c and volumes.c were both skipped, and there is definitely more work todo in cleaning and commenting the code. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-26Btrfs: update space balancing codeZheng Yan
This patch updates the space balancing code to utilize the new backref format. Before, btrfs-vol -b would break any COW links on data blocks or metadata. This was slow and caused the amount of space used to explode if a large number of snapshots were present. The new code can keeps the sharing of all data extents and most of the tree blocks. To maintain the sharing of data extents, the space balance code uses a seperate inode hold data extent pointers, then updates the references to point to the new location. To maintain the sharing of tree blocks, the space balance code uses reloc trees to relocate tree blocks in reference counted roots. There is one reloc tree for each subvol, and all reloc trees share same root key objectid. Reloc trees are snapshots of the latest committed roots of subvols (root->commit_root). To relocate a tree block referenced by a subvol, there are two steps. COW the block through subvol's reloc tree, then update block pointer in the subvol to point to the new block. Since all reloc trees share same root key objectid, doing special handing for tree blocks owned by them is easy. Once a tree block has been COWed in one reloc tree, we can use the resulting new block directly when the same block is required to COW again through other reloc trees. In this way, relocated tree blocks are shared between reloc trees, so they are also shared between subvols. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Add a write ahead tree log to optimize synchronous operationsChris Mason
File syncs and directory syncs are optimized by copying their items into a special (copy-on-write) log tree. There is one log tree per subvolume and the btrfs super block points to a tree of log tree roots. After a crash, items are copied out of the log tree and back into the subvolume. See tree-log.c for all the details. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Various small fixes.Yan Zheng
This trivial patch contains two locking fixes and a off by one fix. --- Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Fix deadlock while searching for dead roots on mountChris Mason
btrfs_find_dead_roots called btrfs_read_fs_root_no_radix, which means we end up calling btrfs_search_slot with a path already held. The fix is to remember the key inside btrfs_find_dead_roots and drop the path. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Properly find the root for snapshotted blocks during chunk relocationChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Change st_blocksize to 4kChris Mason
Some programs (python) do rwm cycles at the granularity returned by stat. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Support for online FS resize (grow and shrink)Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Create extent_buffer interface for large blocksizesChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-09-11Btrfs: Find and remove dead roots the first time a root is loaded.Chris Mason
Dead roots are trees left over after a crash, and they were either in the process of being removed or were waiting to be removed when the box crashed. Before, a search of the entire tree of root pointers was done on mount looking for dead roots. Now, the search is done the first time we load a root. This makes mount faster when there are a large number of snapshots, and it enables the block accounting code to properly update the block counts on the latest root as old versions of the root are reaped after a crash. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-07-11Btrfs: Some code cleanupsAneesh
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-07-11Btrfs: trivial include fixupsZach Brown
Almost none of the files including module.h need to do so, remove them. Include sched.h in extent-tree.c to silence a warning about cond_resched() being undeclared. Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-22Btrfs: Documentation updateChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-22Btrfs: Add the ability to find and remove dead roots after a crash.Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-22Btrfs: Audit callers and return codes to make sure -ENOSPC gets up the stackChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-13btrfs: Code cleanupAneesh
Attaching below is some of the code cleanups that i came across while reading the code. a) alloc_path already calls init_path. b) Mention that btrfs_inode is the in memory copy.Ext4 have ext4_inode_info as the in memory copy ext4_inode as the disk copy Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-12Btrfs: add GPLv2Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-12Btrfs: printk fixesChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-12Btrfs: 64 bit div fixesChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-19Btrfs: early fsync supportChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-10Btrfs: snapshot progressChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-02Btrfs: dynamic allocation of path structChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-30Btrfs: corruption hunt continuesChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-23btrfs_create, btrfs_write_super, btrfs_sync_fsChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-22Mountable btrfs, with readdirChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-21Btrfs: initial move to kernel module landChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-16Btrfs: transaction handles everywhereChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-15Btrfs: Use a chunk of the key flags to record the item type.Chris Mason
Add (untested and simple) directory item code Fix comp_keys to use the new key ordering Add btrfs_insert_empty_item Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-14Btrfs: variable block size supportChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-13Btrfs: Change the super to point to a tree of trees to enable persistent ↵Chris Mason
snapshots Signed-off-by: Chris Mason <chris.mason@oracle.com>