summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2014-11-05ext4: move_extent improve bh vanishing success factorDmitry Monakhov
Xiaoguang Wang has reported sporadic EBUSY failures of ext4/302 Unfortunetly there is nothing we can do if some other task holds BH's refenrence. So we must return EBUSY in this case. But we can try kicking the journal to see if the other task releases the bh reference after the commit is complete. Also decrease false positives by properly checking for ENOSPC and retrying the allocation after kicking the journal --- which is done by ext4_should_retry_alloc(). [ Modified by tytso to properly check for ENOSPC. ] Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-11-05ovl: don't poison cursorMiklos Szeredi
ovl_cache_put() can be called from ovl_dir_reset() if the cache needs to be rebuilt. We did list_del() on the cursor, which results in an Oops on the poisoned pointer in ovl_seek_cursor(). Reported-by: Jordi Pujol Palomer <jordipujolp@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Jordi Pujol Palomer <jordipujolp@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-04Revert "NFS: nfs4_do_open should add negative results to the dcache."Trond Myklebust
This reverts commit 4fa2c54b5198d09607a534e2fd436581064587ed.
2014-11-04Revert "NFS: remove BUG possibility in nfs4_open_and_get_state"Trond Myklebust
This reverts commit f39c01047994e66e7f3d89ddb4c6141f23349d8d.
2014-11-04NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENTTrond Myklebust
If the OPEN rpc call to the server fails with an ENOENT call, nfs_atomic_open will create a negative dentry for that file, however it currently fails to call nfs_set_verifier(), thus causing the dentry to be immediately revalidated on the next call to nfs_lookup_revalidate() instead of following the usual lookup caching rules. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-04f2fs: introduce -o fastboot for reducing booting time onlyJaegeuk Kim
If a system wants to reduce the booting time as a top priority, now we can use a mount option, -o fastboot. With this option, f2fs conducts a little bit slow write_checkpoint, but it can avoid the node page reads during the next mount time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04f2fs: avoid race condition in handling wait_ioJaegeuk Kim
__submit_merged_bio f2fs_write_end_io f2fs_write_end_io wait_io = X wait_io = x complete(X) complete(X) wait_io = NULL wait_for_completion() free(X) spin_lock(X) kernel panic In order to avoid this, this patch removes the wait_io facility. Instead, we can use wait_on_all_pages_writeback(sbi) to wait for end_ios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04f2fs: send discard commands in larger extentJaegeuk Kim
If there is a chance to make a huge sized discard command, we don't need to split it out, since each blkdev_issue_discard should wait one at a time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04f2fs: revisit inline_data to avoid data races and potential bugsJaegeuk Kim
This patch simplifies the inline_data usage with the following rule. 1. inline_data is set during the file creation. 2. If new data is requested to be written ranges out of inline_data, f2fs converts that inode permanently. 3. There is no cases which converts non-inline_data inode to inline_data. 4. The inline_data flag should be changed under inode page lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04writeback: fix a subtle race condition in I_DIRTY clearingTejun Heo
After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and tests inode->i_state locklessly to see whether it already has all the necessary I_DIRTY bits set. The comment above the barrier doesn't contain any useful information - memory barriers can't ensure "changes are seen by all cpus" by itself. And it sure enough was broken. Please consider the following scenario. CPU 0 CPU 1 ------------------------------------------------------------------------------- enters __writeback_single_inode() grabs inode->i_lock tests PAGECACHE_TAG_DIRTY which is clear enters __set_page_dirty() grabs mapping->tree_lock sets PAGECACHE_TAG_DIRTY releases mapping->tree_lock leaves __set_page_dirty() enters __mark_inode_dirty() smp_mb() sees I_DIRTY_PAGES set leaves __mark_inode_dirty() clears I_DIRTY_PAGES releases inode->i_lock Now @inode has dirty pages w/ I_DIRTY_PAGES clear. This doesn't seem to lead to an immediately critical problem because requeue_inode() later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when deciding whether the inode needs to be requeued for IO and there are enough unintentional memory barriers inbetween, so while the inode ends up with inconsistent I_DIRTY_PAGES flag, it doesn't fall off the IO list. The lack of explicit barrier may also theoretically affect the other I_DIRTY bits which deal with metadata dirtiness. There is no guarantee that a strong enough barrier exists between I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied inode. Filesystem inode writeout path likely has enough stuff which can behave as full barrier but it's theoretically possible that the writeout may not see all the updates from ->dirty_inode(). Fix it by adding an explicit smp_mb() after I_DIRTY clearing. Note that I_DIRTY_PAGES needs a special treatment as it always needs to be cleared to be interlocked with the lockless test on __mark_inode_dirty() side. It's cleared unconditionally and reinstated after smp_mb() if the mapping still has dirty pages. Also add comments explaining how and why the barriers are paired. Lightly tested. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04Btrfs: fix kfree on list_head in btrfs_lookup_csums_range error cleanupChris Mason
If we hit any errors in btrfs_lookup_csums_range, we'll loop through all the csums we allocate and free them. But the code was using list_entry incorrectly, and ended up trying to free the on-stack list_head instead. This bug came from commit 0678b6185 btrfs: Don't BUG_ON kzalloc error in btrfs_lookup_csums_range() Signed-off-by: Chris Mason <clm@fb.com> Reported-by: Erik Berg <btrfs@slipsprogrammoer.no> cc: stable@vger.kernel.org # 3.3 or newer
2014-11-04quota: Add log level to printkAnton Blanchard
JK: Added VFS: prefix to the message when changing it to make it more standard. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-03Merge branch 'platform/remove_owner' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into driver-core-next Remove all .owner fields from platform drivers
2014-11-03f2fs: remove pointless bit testing in f2fs_delete_entry()Jan Kara
There's no point in using test_and_clear_bit_le() when we don't use the return value of the function. Just use clear_bit_le() instead. Coverity-id: 1016434 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: do not discard data protected by the previous checkpointJaegeuk Kim
We should not discard any data protected by the previous checkpoint all the time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: flush_dcache_page for inline dataJaegeuk Kim
When reading inline data, we should call flush_dcache_page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: call write_checkpoint under disabled gcJaegeuk Kim
During the write_checkpoint, we should avoid f2fs_gc trigger to avoid any filesystem consistency. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: fix possible data corruption in f2fs_write_begin()Jan Kara
f2fs_write_begin() doesn't initialize the 'dn' variable if the inode has inline data. However it uses its contents to decide whether it should just zero out the page or load data to it. Thus if we are unlucky we can zero out page contents instead of loading inline data into a page. CC: stable@vger.kernel.org CC: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: use current_sit_addr to replace the open codeGu Zheng
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bitGu Zheng
Rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit, which mean set/clear bit and return the old value, for better readability. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: set raw_super default to NULL to avoid compile warningGu Zheng
Set raw_super default to NULL to avoid the possibly used uninitialized warning, though we may never hit it in fact. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: introduce f2fs_change_bit to simplify the change bit logicGu Zheng
Introduce f2fs_change_bit to simplify the change bit logic in function set_to_next_nat{sit}. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: remove the redundant function cond_clear_inode_flagGu Zheng
Use clear_inode_flag to replace the redundant cond_clear_inode_flag. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: remove the seems unneeded argument 'type' from __get_victimGu Zheng
Remove the unneeded argument 'type' from __get_victim, use NO_CHECK_TYPE directly when calling v_ops->get_victim(). Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: avoid returning uninitialized value to userspace from f2fs_trim_fs()Jan Kara
If user specifies too low end sector for trimming, f2fs_trim_fs() will use uninitialized value as a number of trimmed blocks and returns it to userspace. Initialize number of trimmed blocks early to avoid the problem. Coverity-id: 1248809 CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: declare f2fs_convert_inline_dir as a static functionJaegeuk Kim
This patch declares f2fs_convert_inline_dir as a static function, which was reported by kbuild test robot. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: use kmap_atomic instead of kmapJaegeuk Kim
For better performance, we need to use kmap_atomic instead of kmap. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: reuse make_empty_dir code for inline_dentryJaegeuk Kim
This patch introduces do_make_empty_dir to mitigate code redundancy for inline_dentry. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: introduce f2fs_dentry_ptr structure for code clean-upJaegeuk Kim
This patch introduces f2fs_dentry_ptr structure for the use of a function parameter in inline_dentry operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: should not truncate any inline_dentryJaegeuk Kim
If the inode has inline_dentry, we should not truncate any block indices. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: reuse core function in f2fs_readdir for inline_dentryJaegeuk Kim
This patch introduces a core function, f2fs_fill_dentries, to remove redundant code in f2fs_readdir and f2fs_read_inline_dir. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: fix counting inline_data inode numbersJaegeuk Kim
This patch fixes wrongly counting inline_data inode numbers. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: add stat info for inline_dentry inodesJaegeuk Kim
This patch adds status information for inline_dentry inodes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: avoid deadlock on init_inode_metadataJaegeuk Kim
Previously, init_inode_metadata does not hold any parent directory's inode page. So, f2fs_init_acl can grab its parent inode page without any problem. But, when we use inline_dentry, that page is grabbed during f2fs_add_link, so that we can fall into deadlock condition like below. INFO: task mknod:11006 blocked for more than 120 seconds. Tainted: G OE 3.17.0-rc1+ #13 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mknod D ffff88003fc94580 0 11006 11004 0x00000000 ffff880007717b10 0000000000000002 ffff88003c323220 ffff880007717fd8 0000000000014580 0000000000014580 ffff88003daecb30 ffff88003c323220 ffff88003fc94e80 ffff88003ffbb4e8 ffff880007717ba0 0000000000000002 Call Trace: [<ffffffff8173dc40>] ? bit_wait+0x50/0x50 [<ffffffff8173d4cd>] io_schedule+0x9d/0x130 [<ffffffff8173dc6c>] bit_wait_io+0x2c/0x50 [<ffffffff8173da3b>] __wait_on_bit_lock+0x4b/0xb0 [<ffffffff811640a7>] __lock_page+0x67/0x70 [<ffffffff810acf50>] ? autoremove_wake_function+0x40/0x40 [<ffffffff811652cc>] pagecache_get_page+0x14c/0x1e0 [<ffffffffa029afa9>] get_node_page+0x59/0x130 [f2fs] [<ffffffffa02a63ad>] read_all_xattrs+0x24d/0x430 [f2fs] [<ffffffffa02a6ca2>] f2fs_getxattr+0x52/0xe0 [f2fs] [<ffffffffa02a7481>] f2fs_get_acl+0x41/0x2d0 [f2fs] [<ffffffff8122d847>] get_acl+0x47/0x70 [<ffffffff8122db5a>] posix_acl_create+0x5a/0x150 [<ffffffffa02a7759>] f2fs_init_acl+0x29/0xcb [f2fs] [<ffffffffa0286a8d>] init_inode_metadata+0x5d/0x340 [f2fs] [<ffffffffa029253a>] f2fs_add_inline_entry+0x12a/0x2e0 [f2fs] [<ffffffffa0286ea5>] __f2fs_add_link+0x45/0x4a0 [f2fs] [<ffffffffa028b5b6>] ? f2fs_new_inode+0x146/0x220 [f2fs] [<ffffffffa028b816>] f2fs_mknod+0x86/0xf0 [f2fs] [<ffffffff811e3ec1>] vfs_mknod+0xe1/0x160 [<ffffffff811e4b26>] SyS_mknod+0x1f6/0x200 [<ffffffff81741d7f>] tracesys+0xe1/0xe6 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: fix to wait correct block typeJaegeuk Kim
The inode page needs to wait NODE block io. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: reuse find_in_block code for find_in_inline_dirJaegeuk Kim
This patch removes redundant copied code in find_in_inline_dir. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: reuse room_for_filename for inline dentry operationJaegeuk Kim
This patch introduces to reuse the existing room_for_filename for inline dentry operation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: enable inline dir handlingChao Yu
Add inline dir functions into normal dir ops' function to handle inline ops. Besides, we enable inline dir mode when a new dir inode is created if inline_data option is on. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: add key function to handle inline dirChao Yu
Adds Functions to implement inline dir init/lookup/insert/delete/convert ops. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: remove needless reserved area copy, pointed by Dan Carpenter] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: export dir operations for inline dirChao Yu
This patch exports some dir operations for inline dir, additionally introduces f2fs_drop_nlink from f2fs_delete_entry for reusing by inline dir function. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: add a new mount option for inline dirChao Yu
Adds a new mount option 'inline_dentry' for inline dir. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: add infra struct and helper for inline dirChao Yu
This patch defines macro/inline dentry structure, and adds some helpers for inline dir infrastructure. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: avoid infinite loop at cp_errorJaegeuk Kim
This patch avoids an infinite loop in sync_dirty_inode_page when -EIO was detected. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: avoid build warningJaegeuk Kim
This patch removes build warning. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: fix to call f2fs_unlock_opJaegeuk Kim
This patch fixes to call f2fs_unlock_op, which was missing before. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: avoid to allocate when inline_data was writtenJaegeuk Kim
The sceanrio is like this. inline_data i_size page write_begin/vm_page_mkwrite X 30 dirty_page X 30 write to #4096 position X 30 get_dnode_of_data wait for get_dnode_of_data O 30 write inline_data O 30 get_dnode_of_data O 30 reserve data block .. In this case, we have #0 = NEW_ADDR and inline_data as well. We should not allow this condition for further access. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: use highmem for directory pagesJaegeuk Kim
This patch fixes to use highmem for directory pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: fix race conditon on truncation with inline_dataJaegeuk Kim
Let's consider the following scenario. blkaddr[0] inline_data i_size i_blocks writepage truncate NEW X 4096 2 dirty page #0 NEW X 0 change i_size NEW X 0 2 f2fs_write_inline_data NEW X 0 2 get_dnode_of_data NEW X 0 2 truncate_data_blocks_range NULL O 0 1 memcpy(inline_data) NULL O 0 1 f2fs_put_dnode NULL O 0 1 f2fs_truncate NULL O 0 1 get_dnode_of_data NULL O 0 1 *invalid block addr* This patch adds checking inline_data flag during f2fs_truncate not to refer corrupted block indices. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: should truncate any allocated block for inline_data writeJaegeuk Kim
When trying to write inline_data, we should truncate any data block allocated and pointed by the inode block. We should consider the data index is not 0. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03f2fs: invalidate inmemory pageJaegeuk Kim
If user truncates file's data, we should truncate inmemory pages too. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>