summaryrefslogtreecommitdiffstats
path: root/fs/nilfs2/inode.c
AgeCommit message (Collapse)Author
2011-01-10nilfs2: unfold nilfs_dat_inode functionRyusuke Konishi
nilfs_dat_inode function was a wrapper to switch between normal dat inode and gcdat, a clone of the dat inode for garbage collection. This function got obsolete when the gcdat inode was removed, and now we can access the dat inode directly from a nilfs object. So, we will unfold the wrapper and remove it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10nilfs2: do not pass sbi to functions which can get it from inodeRyusuke Konishi
This removes argument for passing nilfs_sb_info structure from nilfs_set_file_dirty and nilfs_load_inode_block functions. We can get a pointer to the structure from inodes. [Stephen Rothwell <sfr@canb.auug.org.au>: fix conflict with commit b74c79e99389cd79b31fcc08f82c24e492e63c7e] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10nilfs2: fiemap supportRyusuke Konishi
This adds fiemap to nilfs. Two new functions, nilfs_fiemap and nilfs_find_uncommitted_extent are added. nilfs_fiemap() implements the fiemap inode operation, and nilfs_find_uncommitted_extent() helps to get a range of data blocks whose physical location has not been determined. nilfs_fiemap() collects extent information by looping through nilfs_bmap_lookup_contig and nilfs_find_uncommitted_extent routines. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10nilfs2: mark buffer heads as delayed until the data is written to diskRyusuke Konishi
Nilfs does not allocate new blocks on disk until they are actually written to. To implement fiemap, we need to deal with such blocks. To allow successive fiemap patch to distinguish mapped but unallocated regions, this marks buffer heads of those new blocks as delayed and clears the flag after the blocks are written to disk. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10nilfs2: call nilfs_error inside bmap routinesRyusuke Konishi
Some functions using nilfs bmap routines can wrongly return invalid argument error (i.e. -EINVAL) that bmap returns as an internal code for btree corruption. This fixes the issue by catching and converting the internal EINVAL to EIO and calling nilfs_error function inside bmap routines. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-07fs: provide rcu-walk aware permission i_opsNick Piggin
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2010-10-23nilfs2: see state of root dentry for mount check of snapshotsRyusuke Konishi
After applied the patch that unified sb instances, root dentry of snapshots can be left in dcache even after their trees are unmounted. The orphan root dentry/inode keeps a root object, and this causes false positive of nilfs_checkpoint_is_mounted function. This resolves the issue by having nilfs_checkpoint_is_mounted test whether the root dentry is busy or not. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23nilfs2: use iget for all metadata filesRyusuke Konishi
This makes use of iget5_locked to allocate or get inode for metadata files to stop using own inode allocator. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23nilfs2: allow nilfs_clear_inode to clear metadata file inodesRyusuke Konishi
Allows clear inode function (nilfs_clear_inode) to handle metadata files that uses bitmap-based object alloctor. DAT and ifile correspond to this. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23nilfs2: deny write access to inodes in snapshotsRyusuke Konishi
Snapshots of nilfs are read-only. After super block instances (sb) will be unified, nilfs will need to check write access by a way other than implicit test with IS_RDONLY(inode). This is because IS_RDONLY() refers to MS_RDONLY bit of inode->i_sb->s_flags and it will become inaccurate after the unification of sb. To prepare for the issue, this uses i_op->permission to deny write access to inodes in snapshots. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23nilfs2: move inode count and block count into root objectRyusuke Konishi
This moves sbi->s_inodes_count and sbi->s_blocks_count into nilfs_root object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23nilfs2: use root object to get ifileRyusuke Konishi
This rewrites functions using ifile so that they get ifile from nilfs_root object, and will remove sbi->s_ifile. Some functions that don't know the root object are extended to receive it from caller. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23nilfs2: set pointer to root object in inodesRyusuke Konishi
This puts a pointer to nilfs_root object in the private part of on-memory inode, and makes nilfs_iget function pick up the inode with the same root object. Non-root inodes inherit its nilfs_root object from parent inode. That of the root inode is allocated through nilfs_attach_checkpoint() function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23nilfs2: remove own inode hash used for GCRyusuke Konishi
This uses inode hash function that vfs provides instead of the own hash table for caching gc inodes. This finally removes the own inode hash from nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23nilfs2: use iget5_locked to get inodeRyusuke Konishi
This uses iget5_locked instead of iget_locked so that gc cache can look up inodes with an inode number and an optional checkpoint number. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23nilfs2: allow nilfs_dirty_inode to mark metadata file inodes dirtyRyusuke Konishi
This allows sop->dirty_inode callback function (nilfs_dirty_inode) to handle metadata file inodes. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-08-09convert nilfs2 to ->evict_inode()Al Viro
[folded build fix from sfr] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09remove inode_setattrChristoph Hellwig
Replace inode_setattr with opencoded variants of it in all callers. This moves the remaining call to vmtruncate into the filesystem methods where it can be replaced with the proper truncate sequence. In a few cases it was obvious that we would never end up calling vmtruncate so it was left out in the opencoded variant: spufs: explicitly checks for ATTR_SIZE earlier btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above In addition to that ncpfs called inode_setattr with handcrafted iattrs, which allowed to trim down the opencoded variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09get rid of block_write_begin_newtruncChristoph Hellwig
Move the call to vmtruncate to get rid of accessive blocks to the callers in preparation of the new truncate sequence and rename the non-truncating version to block_write_begin. While we're at it also remove several unused arguments to block_write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09sort out blockdev_direct_IO variantsChristoph Hellwig
Move the call to vmtruncate to get rid of accessive blocks to the callers in prepearation of the new truncate calling sequence. This was only done for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant was not needed anyway. Get rid of blockdev_direct_IO_no_locking and its _newtrunc variant while at it as just opencoding the two additional paramters is shorted than the name suffix. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21nilfs2: replace inode uid,gid,mode initialization with helper functionDmitry Monakhov
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-10nilfs2: use huge_encode_dev/huge_decode_devRyusuke Konishi
This replaces uses of new_encode_dev/new_decode_dev with their 64-bit counterparts, huge_encode_dev/huge_decode_dev respectively. This is just for clarification and has no impact on the disk format. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2009-11-27nilfs2: replace mark_inode_dirty as nilfs_mark_inode_dirtyJiro SEKIBA
Replace mark_inode_dirty() as nilfs_mark_inode_dirty() to reduce deep function calls. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27nilfs2: delete mark_inode_dirty in nilfs_new_inodeJiro SEKIBA
It is redundant to call mark_inode_dirty() in nilfs_new_inode() because all caller of nilfs_new_inode() will call mark_inode_dirty() after calling nilfs_new_inode() directly or indirectly in transaction. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20nilfs2: move out mark_inode_dirty calls from bmap routinesRyusuke Konishi
Previously, nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() called mark_inode_dirty() after they changed the number of data blocks. This moves these calls outside bmap outermost functions like nilfs_bmap_insert() or nilfs_bmap_truncate(). This will mitigate overhead for truncate or delete operation since they repeatedly remove set of blocks. Nearly 10 percent improvement was observed for removal of a large file: # dd if=/dev/zero of=/test/aaa bs=1M count=512 # time rm /test/aaa real 2.968s -> 2.705s Further optimization may be possible by eliminating these mark_inode_dirty() uses though I avoid mixing separate changes here. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20nilfs2: remove buffer locking in nilfs_mark_inode_dirtyRyusuke Konishi
This lock is eliminable because inodes on the buffer can be updated independently. Although a log writer also fills in bmap data on the on-disk inodes, this update is exclusively done by a log writer lock. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-15nilfs2: deleted inconsistent comment in nilfs_load_inode_block()Jiro SEKIBA
The comment says, "Caller of this function MUST lock s_inode_lock", however just above the comment, it locks s_inode_lock in the function. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-29nilfs2: fix missing initialization of i_dir_start_lookup memberRyusuke Konishi
The i_dir_start_lookup field in nilfs_inode_info objects should be cleared when the objects are allocated, but the the initialization was missing in case of reading from disk. This adds the initialization. Since the variable just gives a start page on directory lookups, the bug was nonfatal until now. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-22const: mark remaining address_space_operations constAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-14nilfs2: fix ignored error code in __nilfs_read_inode()Ryusuke Konishi
The __nilfs_read_inode function is ignoring the error code returned from nilfs_read_inode_common(), and wrongly delivers a success code (zero) when it escapes from the function in erroneous cases. This adds the missing error handling. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-24switch nilfs2 to inode->i_aclAl Viro
Actually, get rid of private analog, since nothing in there is using ACLs at all so far. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-10nilfs2: support contiguous lookup of blocksRyusuke Konishi
Although get_block() callback function can return extent of contiguous blocks with bh->b_size, nilfs_get_block() function did not support this feature. This adds contiguous lookup feature to the block mapping codes of nilfs, and allows the nilfs_get_blocks() function to return the extent information by applying the feature. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10nilfs2: enable sync_page methodRyusuke Konishi
This adds a missing sync_page method which unplugs bio requests when waiting for page locks. This will improve read performance of nilfs. Here is a measurement result using dd command. Without this patch: # mount -t nilfs2 /dev/sde1 /test # dd if=/test/aaa of=/dev/null bs=512k 1024+0 records in 1024+0 records out 536870912 bytes (537 MB) copied, 6.00688 seconds, 89.4 MB/s With this patch: # mount -t nilfs2 /dev/sde1 /test # dd if=/test/aaa of=/dev/null bs=512k 1024+0 records in 1024+0 records out 536870912 bytes (537 MB) copied, 3.54998 seconds, 151 MB/s Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10NILFS2: Pagecache usage optimization on NILFS2Hisashi Hifumi
Hi, I introduced "is_partially_uptodate" aops for NILFS2. A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. I did a performance test using the sysbench. 1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --fil e-rw-ratio=1 run -2.6.30-rc5 Test execution summary: total time: 151.2907s total number of events: 200000 total time taken by event execution: 2409.8387 per-request statistics: min: 0.0000s avg: 0.0120s max: 0.9306s approx. 95 percentile: 0.0439s Threads fairness: events (avg/stddev): 12500.0000/238.52 execution time (avg/stddev): 150.6149/0.01 -2.6.30-rc5-patched Test execution summary: total time: 140.8828s total number of events: 200000 total time taken by event execution: 2240.8577 per-request statistics: min: 0.0000s avg: 0.0112s max: 0.8750s approx. 95 percentile: 0.0418s Threads fairness: events (avg/stddev): 12500.0000/218.43 execution time (avg/stddev): 140.0536/0.01 arch: ia64 pagesize: 16k Thanks. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-04-07nilfs2: support nanosecond timestampRyusuke Konishi
After a review of user's feedback for finding out other compatibility issues, I found nilfs improperly initializes timestamps in inode; CURRENT_TIME was used there instead of CURRENT_TIME_SEC even though nilfs didn't have nanosecond timestamps on disk. A few users gave us the report that the tar program sometimes failed to expand symbolic links on nilfs, and it turned out to be the cause. Instead of applying the above displacement, I've decided to support nanosecond timestamps on this occation. Fortunetaly, a needless 64-bit field was in the nilfs_inode struct, and I found it's available for this purpose without impact for the users. So, this will do the enhancement and resolve the tar problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07nilfs2: clean up sketch fileRyusuke Konishi
The sketch file is a file to mark checkpoints with user data. It was experimentally introduced in the original implementation, and now obsolete. The file was handled differently with regular files; the file size got truncated when a checkpoint was created. This stops the special treatment and will treat it as a regular file. Most users are not affected because mkfs.nilfs2 no longer makes this file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07nilfs2: replace BUG_ON and BUG calls triggerable from ioctlRyusuke Konishi
Pekka Enberg advised me: > It would be nice if BUG(), BUG_ON(), and panic() calls would be > converted to proper error handling using WARN_ON() calls. The BUG() > call in nilfs_cpfile_delete_checkpoints(), for example, looks to be > triggerable from user-space via the ioctl() system call. This will follow the comment and keep them to a minimum. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07nilfs2: avoid double error caused by nilfs_transaction_endRyusuke Konishi
Pekka Enberg pointed out that double error handlings found after nilfs_transaction_end() can be avoided by separating abort operation: OK, I don't understand this. The only way nilfs_transaction_end() can fail is if we have NILFS_TI_SYNC set and we fail to construct the segment. But why do we want to construct a segment if we don't commit? I guess what I'm asking is why don't we have a separate nilfs_transaction_abort() function that can't fail for the erroneous case to avoid this double error value tracking thing? This does the separation and renames nilfs_transaction_end() to nilfs_transaction_commit() for clarification. Since, some calls of these functions were used just for exclusion control against the segment constructor, they are replaced with semaphore operations. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07nilfs2: fix missed-sync issue for do_sync_mapping_range()Ryusuke Konishi
Chris Mason pointed out that there is a missed sync issue in nilfs_writepages(): On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote: > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by > do_sync_mapping_range(). where WB_SYNC_NONE in do_sync_mapping_range() was replaced with WB_SYNC_ALL by Nick's patch (commit: ee53a891f47444c53318b98dac947ede963db400). This fixes the problem by letting nilfs_writepages() write out the log of file data within the range if sync_mode is WB_SYNC_ALL. This involves removal of nilfs_file_aio_write() which was previously needed to ensure O_SYNC sync writes. Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07nilfs2: inode operationsRyusuke Konishi
This adds inode level operations of the nilfs2 file system. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>