summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2014-01-03GFS2: Remove gfs2_quota_change_host structureSteven Whitehouse
There is only one place this is used, when reading in the quota changes at mount time. It is not really required and much simpler to just convert the fields from the on-disk structure as required. There should be no functional change as a result of this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-03GFS2: Clean up releasepageSteven Whitehouse
For historical reasons, we drop and retake the log lock in ->releasepage() however, since there is no reason why we cannot hold the log lock over the whole function, this allows some simplification. In particular, pinning a buffer is only ever done under the log lock, so it is possible here to remove the test for pinned buffers in the second loop, since it is impossible for that to happen (it is also tested in the first loop). As a result, two tests made later in the second loop become constants and can also be reduced to the only possible branch. So the net result is to remove various bits of unreachable code and make this more readable. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-03GFS2: Implement a "rgrp has no extents longer than X" schemeBob Peterson
With the preceding patch, we started accepting block reservations smaller than the ideal size, which requires a lot more parsing of the bitmaps. To reduce the amount of bitmap searching, this patch implements a scheme whereby each rgrp keeps track of the point at this multi-block reservations will fail. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-03GFS2: Drop inadequate rgrps from the reservation treeBob Peterson
This is just basically a resend of a patch I posted earlier. It didn't change from its original, except in diff offsets, etc: This patch fixes a bug in the GFS2 block allocation code. The problem starts if a process already has a multi-block reservation, but for some reason, another process disqualifies it from further allocations. For example, the other process might set on the GFS2_RDF_ERROR bit. The process holding the reservation jumps to label skip_rgrp, but that label comes after the code that removes the reservation from the tree. Therefore, the no longer usable reservation is not removed from the rgrp's reservations tree; it's lost. Eventually, the lost reservation causes the count of reserved blocks to get off, and eventually that causes a BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free) to trigger. This patch moves the call to after label skip_rgrp so that the disqualified reservation is properly removed from the tree, thus keeping the rgrp rd_reserved count sane. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-03GFS2: If requested is too large, use the largest extent in the rgrpBob Peterson
Here is a second try at a patch I posted earlier, which also implements suggestions Steve made: Before this patch, GFS2 would keep searching through all the rgrps until it found one that had a chunk of free blocks big enough to satisfy the size hint, which is based on the file write size, regardless of whether the chunk was big enough to perform the write. However, when doing big writes there may not be a large enough chunk of free blocks in any rgrp, due to file system fragmentation. The largest chunk may be big enough to satisfy the write request, but it may not meet the ideal reservation size from the "size hint". The writes would slow to a crawl because every write would search every rgrp, then finally give up and default to a single-block write. In my case, performance would drop from 425MB/s to 18KB/s, or 24000 times slower. This patch basically makes it so that if we can't find a contiguous chunk of blocks big enough to satisfy the sizehint, we'll use the largest chunk of blocks we found that will still contain the write. It does so by keeping track of the largest run of blocks within the rgrp. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-02nfsd: get rid of unused function definitionKinglong Mee
commit 557ce2646e775f6bda734dd92b10d4780874b9c7 "nfsd41: replace page based DRC with buffer based DRC" have remove unused nfsd4_set_statp, but miss the function definition. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-02nfsd: calculate the missing length of bitmap in EXCHANGE_IDKinglong Mee
commit 58cd57bfd9db3bc213bf9d6a10920f82095f0114 "nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID" miss calculating the length of bitmap for spo_must_enforce and spo_must_allow. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-02Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge patches from Andrew Morton: "Ten fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c drivers/dma/ioat/dma.c: check DMA mapping error in ioat_dma_self_test() mm/memory-failure.c: transfer page count from head page to tail page after split thp MAINTAINERS: set up proper record for Xilinx Zynq mm: remove bogus warning in copy_huge_pmd() memcg: fix memcg_size() calculation mm: fix use-after-free in sys_remap_file_pages mm: munlock: fix deadlock in __munlock_pagevec() mm: munlock: fix a bug where THP tail page is encountered
2014-01-02epoll: do not take the nested ep->mtx on EPOLL_CTL_DELJason Baron
The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock. That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with epoll_ctl(b, EPOLL_CTL_DEL, a, x). The deadlock was introduced with commmit 67347fe4e632 ("epoll: do not take global 'epmutex' for simple topologies"). The acquistion of the ep->mtx for the destination 'ep' was added such that a concurrent EPOLL_CTL_ADD operation would see the correct state of the ep (Specifically, the check for '!list_empty(&f.file->f_ep_links') However, by simply not acquiring the lock, we do not serialize behind the ep->mtx from the add path, and thus may perform a full path check when if we had waited a little longer it may not have been necessary. However, this is a transient state, and performing the full loop checking in this case is not harmful. The important point is that we wouldn't miss doing the full loop checking when required, since EPOLL_CTL_ADD always locks any 'ep's that its operating upon. The reason we don't need to do lock ordering in the add path, is that we are already are holding the global 'epmutex' whenever we do the double lock. Further, the original posting of this patch, which was tested for the intended performance gains, did not perform this additional locking. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-02Merge tag 'gfs2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes Pull GFS2 fixes from Steven Whitehouse: "Here is a set of small fixes for GFS2. There is a fix to drop s_umount which is copied in from the core vfs, two patches relate to a hard to hit "use after free" and memory leak. Two patches related to using DIO and buffered I/O on the same file to ensure correct operation in relation to glock state changes. The final patch adds an RCU read lock to ensure correct locking on an error path" * tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: Fix unsafe dereference in dump_holder() GFS2: Wait for async DIO in glock state changes GFS2: Fix incorrect invalidation for DIO/buffered I/O GFS2: Fix slab memory leak in gfs2_bufdata GFS2: Fix use-after-free race when calling gfs2_remove_from_ail GFS2: don't hold s_umount over blkdev_put
2014-01-02jfs: fix xattr value size overflow in __jfs_setxattrJie Liu
There is a potential overflow if the specified EA value size is greater than USHRT_MAX because the size of value is limited by the on-disk format (i.e, __le16), this issue could be reflected via the tests below: # touch /jfs/testfile # setfattr -n user.comment -v `perl -e 'print "A"x65536'` /jfs/testfile setfattr: /jfs/testfile: Invalid argument Syslog: ... jfs_xsetattr: xattr_size = 21, new_size = 65557 This patch add pre-checkups of EA value size against USHRT_MAX to avoid this problem, and return -E2BIG which is consistent with the VFS setxattr interface. Moreover, fix the debug code to print the correct function name. With this fix: setfattr: /jfs/testfile: Argument list too long Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2014-01-02GFS2: Fix unsafe dereference in dump_holder()Tetsuo Handa
GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that RCU read lock is held when using task_struct returned from pid_task(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-12-31libceph: all features fields must be u64Ilya Dryomov
In preparation for ceph_features.h update, change all features fields from unsigned int/u32 to u64. (ceph.git has ~40 feature bits at this point.) Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31ceph fscache: Uncaching no data page from fscache in readpage()Li Wang
Currently, if one new page allocated into fscache in readpage(), however, with no data read into due to error encountered during reading from OSDs, the slot in fscache is not uncached. This patch fixes this. Signed-off-by: Li Wang <liwang@ubuntukylin.com> Reviewed-by: Milosz Tanski <milosz@adfin.com>
2013-12-31ceph fscache: Introduce a routine for uncaching single no data page from fscacheLi Wang
Signed-off-by: Li Wang <liwang@ubuntukylin.com> Reviewed-by: Milosz Tanski <milosz@adfin.com>
2013-12-31ceph: add acl for cephfsGuangliang Zhao
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Reviewed-by: Li Wang <li.wang@ubuntykylin.com> Reviewed-by: Zheng Yan <zheng.z.yan@intel.com>
2013-12-31ceph: check caps in filemap_fault and page_mkwriteYan, Zheng
Adds cap check to the page fault handler. The check prevents page fault handler from adding new page to the page cache while Fcb caps are being revoked. This solves Fc revoking hang in multiple clients mmap IO workload. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31Merge tag 'v3.13-rc6' into for-3.14/coreJens Axboe
Needed to bring blk-mq uptodate, since changes have been going in since for-3.14/core was established. Fixup merge issues related to the immutable biovec changes. Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-flush.c fs/btrfs/check-integrity.c fs/btrfs/extent_io.c fs/btrfs/scrub.c fs/logfs/dev_bdev.c
2013-12-27cifs: set FILE_CREATEDShirish Pargaonkar
Set FILE_CREATED on O_CREAT|O_EXCL. cifs code didn't change during commit 116cc0225381415b96551f725455d067f63a76a0 Kernel bugzilla 66251 Signed-off-by: Shirish Pargaonkar <spargaonkar@suse.com> Acked-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2013-12-27cifs: We do not drop reference to tlink in CIFSCheckMFSymlink()Sachin Prabhu
When we obtain tcon from cifs_sb, we use cifs_sb_tlink() to first obtain tlink which also grabs a reference to it. We do not drop this reference to tlink once we are done with the call. The patch fixes this issue by instead passing tcon as a parameter and avoids having to obtain a reference to the tlink. A lookup for the tcon is already made in the calling functions and this way we avoid having to re-run the lookup. This is also consistent with the argument list for other similar calls for M-F symlinks. We should also return an ENOSYS when we do not find a protocol specific function to lookup the MF Symlink data. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2013-12-27Add missing end of line termination to some cifs messagesSteve French
Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Gregor Beck <gbeck@sernet.de> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2013-12-27f2fs: don't need to get f2fs_lock_op for the inline_data testJaegeuk Kim
This patch locates checking the inline_data prior to calling f2fs_lock_op() in truncate_blocks(), since getting the lock is unnecessary. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A collection of bug fixes destined for stable and some printk cleanups and a patch so that instead of BUG'ing we use the ext4_error() framework to mark the file system is corrupted" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: add explicit casts when masking cluster sizes ext4: fix deadlock when writing in ENOSPC conditions jbd2: rename obsoleted msg JBD->JBD2 jbd2: revise KERN_EMERG error messages jbd2: don't BUG but return ENOSPC if a handle runs out of space ext4: Do not reserve clusters when fs doesn't support extents ext4: fix del_timer() misuse for ->s_err_report ext4: check for overlapping extents in ext4_valid_extent_entries() ext4: fix use-after-free in ext4_mb_new_blocks ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails
2013-12-26f2fs: handle inline data operationsHuajun Li
Hook inline data read/write, truncate, fallocate, setattr, etc. Files need meet following 2 requirement to inline: 1) file size is not greater than MAX_INLINE_DATA; 2) file doesn't pre-allocate data blocks by fallocate(). FI_INLINE_DATA will not be set while creating a new regular inode because most of the files are bigger than ~3.4K. Set FI_INLINE_DATA only when data is submitted to block layer, ranther than set it while creating a new inode, this also avoids converting data from inline to normal data block and vice versa. While writting inline data to inode block, the first data block should be released if the file has a block indexed by i_addr[0]. On the other hand, when a file operation is appied to a file with inline data, we need to test if this file can remain inline by doing this operation, otherwise it should be convert into normal file by reserving a new data block, copying inline data to this new block and clear FI_INLINE_DATA flag. Because reserve a new data block here will make use of i_addr[0], if we save inline data in i_addr[0..872], then the first 4 bytes would be overwriten. This problem can be avoided simply by not using i_addr[0] for inline data. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26f2fs: key functions to handle inline dataHuajun Li
Functions to implement inline data read/write, and move inline data to normal data block when file size exceeds inline data limitation. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26f2fs: convert max_orphans to a field of f2fs_sb_infoGu Zheng
Previously, we need to calculate the max orphan num when we try to acquire an orphan inode, but it's a stable value since the super block was inited. So converting it to a field of f2fs_sb_info and use it directly when needed seems a better choose. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26f2fs: check the blocksize before calling generic_direct_IO pathJaegeuk Kim
The f2fs supports 4KB block size. If user requests dwrite with under 4KB data, it allocates a new 4KB data block. However, f2fs doesn't add zero data into the untouched data area inside the newly allocated data block. This incurs an error during the xfstest #263 test as follow. 263 12s ... [failed, exit status 1] - output mismatch (see 263.out.bad) --- 263.out 2013-03-09 03:37:15.043967603 +0900 +++ 263.out.bad 2013-12-27 04:20:39.230203114 +0900 @@ -1,3 +1,976 @@ QA output created by 263 fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z +fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z +truncating to largest ever: 0x12a00 +truncating to largest ever: 0x75400 +fallocating to largest ever: 0x79cbf ... (Run 'diff -u 263.out 263.out.bad' to see the entire diff) Ran: 263 Failures: 263 Failed 1 of 1 tests It turns out that, when the test tries to write 2KB data with dio, the new dio path allocates 4KB data block without filling zero data inside the remained 2KB area. Finally, the output file contains a garbage data for that region. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26f2fs: should put the dnode when NEW_ADDR is detectedJaegeuk Kim
When get_dnode_of_data() in get_data_block() returns a successful dnode, we should put the dnode. But, previously, if its data block address is equal to NEW_ADDR, we didn't do that, resulting in a deadlock condition. So, this patch splits original error conditions with this case, and then calls f2fs_put_dnode before finishing the function. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26f2fs: introduce F2FS_INODE macro to get f2fs_inodeJaegeuk Kim
This patch introduces F2FS_INODE that returns struct f2fs_inode * from the inode page. By using this macro, we can remove unnecessary casting codes like below. struct f2fs_inode *ri = &F2FS_NODE(inode_page)->i; -> struct f2fs_inode *ri = F2FS_INODE(inode_page); Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26f2fs: check filename length in recover_dentryChao Yu
In current flow, we will get Null return value of f2fs_find_entry in recover_dentry when name.len is bigger than F2FS_NAME_LEN, and then we still add this inode into its dir entry. To avoid this situation, we must check filename length before we use it. Another point is that we could remove the code of checking filename length In f2fs_find_entry, because f2fs_lookup will be called previously to ensure of validity of filename length. V2: o add WARN_ON() as Jaegeuk Kim suggested. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-24Merge 3.13-rc5 into staging-nextGreg Kroah-Hartman
We want these fixes here to handle some merge issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-23Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2 fix from Jan Kara: "One simple fix of oops in ext2 which was recently hit by Christoph" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Fix oops in ext2_get_block() called from ext2_quota_write()
2013-12-23udf: Fix lockdep warning from udf_symlink()Jan Kara
Lockdep is complaining about UDF: ============================================= [ INFO: possible recursive locking detected ] 3.12.0+ #16 Not tainted --------------------------------------------- ln/7386 is trying to acquire lock: (&ei->i_data_sem){+.+...}, at: [<ffffffff8142f06d>] udf_get_block+0x8d/0x130 but task is already holding lock: (&ei->i_data_sem){+.+...}, at: [<ffffffff81431a8d>] udf_symlink+0x8d/0x690 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&ei->i_data_sem); lock(&ei->i_data_sem); *** DEADLOCK *** This is because we hold i_data_sem of the symlink inode while calling udf_add_entry() for the directory. I don't think this can ever lead to deadlocks since we never hold i_data_sem for two inodes in any other place. The fix is simple - move unlock of i_data_sem for symlink inode up. We don't need it for anything when linking symlink inode to directory. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz>
2013-12-23f2fs: avoid to set wrong pino of inode when rename dirChao Yu
When we rename a dir to new name which is not exist previous, we will set pino of parent inode with ino of child inode in f2fs_set_link. It destroy consistency of pino, it should be fixed. Thanks for previous work of Shu Tan. Signed-off-by: Shu Tan <shu.tan@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: update several commentsChao Yu
Update several comments: 1. use f2fs_{un}lock_op install of mutex_{un}lock_op. 2. update comment of get_data_block(). 3. update description of node offset. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: remove the rw_flag domain from f2fs_io_infoGu Zheng
When using the f2fs_io_info in the low level, we still need to merge the rw and rw_flag, so use the rw to hold all the io flags directly, and remove the rw_flag field. ps.It is based on the previous patch: f2fs: move all the bio initialization into __bio_alloc Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: move all the bio initialization into __bio_allocGu Zheng
Move all the bio initialization into __bio_alloc, and some minor cleanups are also added. v3: Use 'bool' rather than 'int' as Kim suggested. v2: Use 'is_read' rather than 'rw' as Yu Chao suggested. Remove the needless initialization of bio->bi_private. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: write dirty meta pages collectivelyJaegeuk Kim
This patch enhances writing dirty meta pages collectively in background. During the file data writes, it'd better avoid to write small dirty meta pages frequently. So let's give a chance to collect a number of dirty meta pages for a while. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: introduce a new direct_IO write pathJaegeuk Kim
Previously, f2fs doesn't support direct IOs with high performance, which throws every write requests via the buffered write path, resulting in highly performance degradation due to memory opeations like copy_from_user. This patch introduces a new direct IO path in which every write requests are processed by generic blockdev_direct_IO() with enhanced get_block function. The get_data_block() in f2fs handles: 1. if original data blocks are allocates, then give them to blockdev. 2. otherwise, a. preallocate requested block addresses b. do not use extent cache for better performance c. give the block addresses to blockdev This policy induces that: - new allocated data are sequentially written to the disk - updated data are randomly written to the disk. - f2fs gives consistency on its file meta, not file data. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: introduce sysfs entry to control in-place-update policyJaegeuk Kim
This patch introduces new sysfs entries for users to control the policy of in-place-updates, namely IPU, in f2fs. Sometimes f2fs suffers from performance degradation due to its out-of-place update policy that produces many additional node block writes. If the storage performance is very dependant on the amount of data writes instead of IO patterns, we'd better drop this out-of-place update policy. This patch suggests 5 polcies and their triggering conditions as follows. [sysfs entry name = ipu_policy] 0: F2FS_IPU_FORCE all the time, 1: F2FS_IPU_SSR if SSR mode is activated, 2: F2FS_IPU_UTIL if FS utilization is over threashold, 3: F2FS_IPU_SSR_UTIL if SSR mode is activated and FS utilization is over threashold, 4: F2FS_IPU_DISABLE disable IPU. (=default option) [sysfs entry name = min_ipu_util] This parameter controls the threshold to trigger in-place-updates. The number indicates percentage of the filesystem utilization, and used by F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies. For more details, see need_inplace_update() in segment.h. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: missing kmem_cache_destroy for discard_entryChangman Lee
insmod f2fs.ko is failed after insmod and rmmod firstly. $ sudo insmod fs/f2fs/f2fs.ko insmod: error inserting 'fs/f2fs/f2fs.ko': -1 Cannot allocate memory -- dmesg -- kmem_cache_sanity_check (free_nid): Cache name already exists. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: fix the location of tracepointJaegeuk Kim
We need to get a trace before submit_bio, since its bi_sector is remapped during the submit_bio. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: refactor bio->rw handlingJaegeuk Kim
This patch introduces f2fs_io_info to mitigate the complex parameter list. struct f2fs_io_info { enum page_type type; /* contains DATA/NODE/META/META_FLUSH */ int rw; /* contains R/RS/W/WS */ int rw_flag; /* contains REQ_META/REQ_PRIO */ } 1. f2fs_write_data_pages - DATA - WRITE_SYNC is set when wbc->WB_SYNC_ALL. 2. sync_node_pages - NODE - WRITE_SYNC all the time 3. sync_meta_pages - META - WRITE_SYNC all the time - REQ_META | REQ_PRIO all the time ** f2fs_submit_merged_bio() handles META_FLUSH. 4. ra_nat_pages, ra_sit_pages, ra_sum_pages - META - READ_SYNC Cc: Fan Li <fanofcode.li@samsung.com> Cc: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: merge pages with the same sync_mode flagFan Li
Previously f2fs submits most of write requests using WRITE_SYNC, but f2fs_write_data_pages submits last write requests by sync_mode flags callers pass. This causes a performance problem since continuous pages with different sync flags can't be merged in cfq IO scheduler(thanks yu chao for pointing it out), and synchronous requests often take more time. This patch makes the following modifies to DATA writebacks: 1. every page will be written back using the sync mode caller pass. 2. only pages with the same sync mode can be merged in one bio request. These changes are restricted to DATA pages.Other types of writebacks are modified To remain synchronous. In my test with tiotest, f2fs sequence write performance is improved by about 7%-10% , and this patch has no obvious impact on other performance tests. Signed-off-by: Fan Li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: add unlikely() macro for compiler more aggressivelyJaegeuk Kim
This patch adds unlikely() macro into the most of codes. The basic rule is to add that when: - checking unusual errors, - checking page mappings, - and the other unlikely conditions. Change log from v1: - Don't add unlikely for the NULL test and error test: advised by Andi Kleen. Cc: Chao Yu <chao2.yu@samsung.com> Cc: Andi Kleen <andi@firstfloor.org> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: add unlikely() macro for compiler optimizationChao Yu
As we know, some of our branch condition will rarely be true. So we could add 'unlikely' to let compiler optimize these code, by this way we could drop unneeded 'jump' assemble code to improve performance. change log: o add *unlikely* as many as possible across the whole source files at once suggested by Jaegeuk Kim. Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: avoid unneeded page release for correct _count of pageChao Yu
In find_fsync_dnodes() and recover_data(), our flow is like this: ->f2fs_submit_page_bio() -> f2fs_put_page() -> page_cache_release() ---- page->_count declined to zero. ->__free_pages() -> put_page_testzero() ---- page->_count will be declined again. We will get a segment fault in put_page_testzero when CONFIG_DEBUG_VM is on, or return MM with a bad page with wrong _count num. So let's just release this page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: use inner macro GFP_F2FS_ZERO for simplificationChao Yu
Use inner macro GFP_F2FS_ZERO to instead of GFP_NOFS | __GFP_ZERO for simplification of code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: replace the debugfs_root with f2fs_debugfs_rootYounger Liu
This minor change for the naming conventions of debugfs_root to avoid any possible conflicts to the other filesystem. Signed-off-by: Younger Liu <younger.liucn@gmail.com> Cc: Younger Liu <younger.liucn@gmail.com> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> [Jaegeuk Kim: change the patch name] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23f2fs: remove debufs dir if debugfs_create_file() failedYounger Liu
When debugfs_create_file() failed in f2fs_create_root_stats(), debugfs_root should be remove. Signed-off-by: Younger Liu <liuyiyang@hisense.com> Cc: Younger Liu <younger.liucn@gmail.com> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>