summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2014-04-25kernfs: add back missing error check in kernfs_fop_mmap()Tejun Heo
While updating how mmap enabled kernfs files are handled by lockdep, 9b2db6e18945 ("sysfs: bail early from kernfs_file_mmap() to avoid spurious lockdep warning") inadvertently dropped error return check from kernfs_file_mmap(). The intention was just dropping "if (ops->mmap)" check as the control won't reach the point if the mmap callback isn't implemented, but I mistakenly removed the error return check together with it. This led to Xorg crash on i810 which was reported and bisected to the commit and then to the specific change by Tobias. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-bisected-by: Tobias Powalowski <tobias.powalowski@googlemail.com> Tested-by: Tobias Powalowski <tobias.powalowski@googlemail.com> References: http://lkml.kernel.org/g/533D01BD.1010200@googlemail.com Cc: stable <stable@vger.kernel.org> # 3.14 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-25kernfs: fix a subdir count leakJianyu Zhan
Currently kernfs_link_sibling() increates parent->dir.subdirs before adding the node into parent's chidren rb tree. Because it is possible that kernfs_link_sibling() couldn't find a suitable slot and bail out, this leads to a mismatch between elevated subdir count with actual children node numbers. This patches fix this problem, by moving the subdir accouting after the actual addtion happening. Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-24Btrfs: correctly set profile flags on seqlock retryFilipe Manana
If we had to retry on the profiles seqlock (due to a concurrent write), we would set bits on the input flags that corresponded both to the current profile and to previous values of the profile. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-04-24Btrfs: use correct key when repeating search for extent itemFilipe Manana
If skinny metadata is enabled and our first tree search fails to find a skinny extent item, we may repeat a tree search for a "fat" extent item (if the previous item in the leaf is not the "fat" extent we're looking for). However we were not setting the new key's objectid to the right value, as we previously used the same key variable to peek at the previous item in the leaf, which has a different objectid. So just set the right objectid to avoid modifying/deleting a wrong item if we repeat the tree search. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-04-24Btrfs: fix inode caching vs tree logMiao Xie
Currently, with inode cache enabled, we will reuse its inode id immediately after unlinking file, we may hit something like following: |->iput inode |->return inode id into inode cache |->create dir,fsync |->power off An easy way to reproduce this problem is: mkfs.btrfs -f /dev/sdb mount /dev/sdb /mnt -o inode_cache,commit=100 dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync inode_id=`ls -i /mnt/data | awk '{print $1}'` rm -f /mnt/data i=1 while [ 1 ] do mkdir /mnt/dir_$i test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'` if [ $test1 -eq $inode_id ] then dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync echo b > /proc/sysrq-trigger fi sleep 1 i=$(($i+1)) done mount /dev/sdb /mnt umount /dev/sdb btrfs check /dev/sdb We fix this problem by adding unlinked inode's id into pinned tree, and we can not reuse them until committing transaction. Cc: stable@vger.kernel.org Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-04-24Btrfs: fix possible memory leaks in open_ctree()Wang Shilong
Fix possible memory leaks in the following error handling paths: read_tree_block() btrfs_recover_log_trees btrfs_commit_super() btrfs_find_orphan_roots() btrfs_cleanup_fs_roots() Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-04-24Btrfs: avoid triggering bug_on() when we fail to start inode caching taskWang Shilong
When running stress test(including snapshots,balance,fstress), we trigger the following BUG_ON() which is because we fail to start inode caching task. [ 181.131945] kernel BUG at fs/btrfs/inode-map.c:179! [ 181.137963] invalid opcode: 0000 [#1] SMP [ 181.217096] CPU: 11 PID: 2532 Comm: btrfs Not tainted 3.14.0 #1 [ 181.240521] task: ffff88013b621b30 ti: ffff8800b6ada000 task.ti: ffff8800b6ada000 [ 181.367506] Call Trace: [ 181.371107] [<ffffffffa036c1be>] btrfs_return_ino+0x9e/0x110 [btrfs] [ 181.379191] [<ffffffffa038082b>] btrfs_evict_inode+0x46b/0x4c0 [btrfs] [ 181.387464] [<ffffffff810b5a70>] ? autoremove_wake_function+0x40/0x40 [ 181.395642] [<ffffffff811dc5fe>] evict+0x9e/0x190 [ 181.401882] [<ffffffff811dcde3>] iput+0xf3/0x180 [ 181.408025] [<ffffffffa03812de>] btrfs_orphan_cleanup+0x1ee/0x430 [btrfs] [ 181.416614] [<ffffffffa03a6abd>] btrfs_mksubvol.isra.29+0x3bd/0x450 [btrfs] [ 181.425399] [<ffffffffa03a6cd6>] btrfs_ioctl_snap_create_transid+0x186/0x190 [btrfs] [ 181.435059] [<ffffffffa03a6e3b>] btrfs_ioctl_snap_create_v2+0xeb/0x130 [btrfs] [ 181.444148] [<ffffffffa03a9656>] btrfs_ioctl+0xf76/0x2b90 [btrfs] [ 181.451971] [<ffffffff8117e565>] ? handle_mm_fault+0x475/0xe80 [ 181.459509] [<ffffffff8167ba0c>] ? __do_page_fault+0x1ec/0x520 [ 181.467046] [<ffffffff81185b35>] ? do_mmap_pgoff+0x2f5/0x3c0 [ 181.474393] [<ffffffff811d4da8>] do_vfs_ioctl+0x2d8/0x4b0 [ 181.481450] [<ffffffff811d5001>] SyS_ioctl+0x81/0xa0 [ 181.488021] [<ffffffff81680b69>] system_call_fastpath+0x16/0x1b We should avoid triggering BUG_ON() here, instead, we output warning messages and clear inode_cache option. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-04-24Btrfs: move btrfs_{set,clear}_and_info() to ctree.hWang Shilong
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-04-24btrfs: replace error code from btrfs_drop_extentsDavid Sterba
There's a case which clone does not handle and used to BUG_ON instead, (testcase xfstests/btrfs/035), now returns EINVAL. This error code is confusing to the ioctl caller, as it normally signifies errorneous arguments. Change it to ENOPNOTSUPP which allows a fall back to copy instead of clone. This does not affect the common reflink operation. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-04-24btrfs: Change the hole range to a more accurate value.Qu Wenruo
Commit 3ac0d7b96a268a98bd474cab8bce3a9f125aaccf fixed the btrfs expanding write problem but the hole punched is sometimes too large for some iovec, which has unmapped data ranges. This patch will change to hole range to a more accurate value using the counts checked by the write check routines. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-04-23locks: rename FL_FILE_PVT and IS_FILE_PVT to use "*_OFDLCK" insteadJeff Layton
File-private locks have been re-christened as "open file description" locks. Finish the symbol name cleanup in the internal implementation. Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-04-22locks: rename file-private locks to "open file description locks"Jeff Layton
File-private locks have been merged into Linux for v3.15, and *now* people are commenting that the name and macro definitions for the new file-private locks suck. ...and I can't even disagree. The names and command macros do suck. We're going to have to live with these for a long time, so it's important that we be happy with the names before we're stuck with them. The consensus on the lists so far is that they should be rechristened as "open file description locks". The name isn't a big deal for the kernel, but the command macros are not visually distinct enough from the traditional POSIX lock macros. The glibc and documentation folks are recommending that we change them to look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a programmer will typo one of the commands wrong, and also makes it easier to spot this difference when reading code. This patch makes the following changes that I think are necessary before v3.15 ships: 1) rename the command macros to their new names. These end up in the uapi headers and so are part of the external-facing API. It turns out that glibc doesn't actually use the fcntl.h uapi header, but it's hard to be sure that something else won't. Changing it now is safest. 2) make the the /proc/locks output display these as type "OFDLCK" Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Carlos O'Donell <carlos@redhat.com> Cc: Stefan Metzmacher <metze@samba.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Frank Filz <ffilzlnx@mindspring.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-04-20Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "These are regression and bug fixes for ext4. We had a number of new features in ext4 during this merge window (ZERO_RANGE and COLLAPSE_RANGE fallocate modes, renameat, etc.) so there were many more regression and bug fixes this time around. It didn't help that xfstests hadn't been fully updated to fully stress test COLLAPSE_RANGE until after -rc1" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits) ext4: disable COLLAPSE_RANGE for bigalloc ext4: fix COLLAPSE_RANGE failure with 1KB block size ext4: use EINVAL if not a regular file in ext4_collapse_range() ext4: enforce we are operating on a regular file in ext4_zero_range() ext4: fix extent merging in ext4_ext_shift_path_extents() ext4: discard preallocations after removing space ext4: no need to truncate pagecache twice in collapse range ext4: fix removing status extents in ext4_collapse_range() ext4: use filemap_write_and_wait_range() correctly in collapse range ext4: use truncate_pagecache() in collapse range ext4: remove temporary shim used to merge COLLAPSE_RANGE and ZERO_RANGE ext4: fix ext4_count_free_clusters() with EXT4FS_DEBUG and bigalloc enabled ext4: always check ext4_ext_find_extent result ext4: fix error handling in ext4_ext_shift_extents ext4: silence sparse check warning for function ext4_trim_extent ext4: COLLAPSE_RANGE only works on extent-based files ext4: fix byte order problems introduced by the COLLAPSE_RANGE patches ext4: use i_size_read in ext4_unaligned_aio() fs: disallow all fallocate operation on active swapfile fs: move falloc collapse range check into the filesystem methods ...
2014-04-19ext4: disable COLLAPSE_RANGE for bigallocNamjae Jeon
Once COLLAPSE RANGE is be disable for ext4 with bigalloc feature till finding root-cause of problem. It will be enable with fixing that regression of xfstest(generic 075 and 091) again. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-19ext4: fix COLLAPSE_RANGE failure with 1KB block sizeNamjae Jeon
When formatting with 1KB or 2KB(not aligned with PAGE SIZE) block size, xfstests generic/075 and 091 are failing. The offset supplied to function truncate_pagecache_range is block size aligned. In this function start offset is re-aligned to PAGE_SIZE by rounding_up to the next page boundary. Due to this rounding up, old data remains in the page cache when blocksize is less than page size and start offset is not aligned with page size. In case of collapse range, we need to align start offset to page size boundary by doing a round down operation instead of round up. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-19coredump: fix va_list corruptionEric Dumazet
A va_list needs to be copied in case it needs to be used twice. Thanks to Hugh for debugging this issue, leading to various panics. Tested: lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern 'produce_core' is simply : main() { *(int *)0 = 1;} lpq84:~# ./produce_core Segmentation fault (core dumped) lpq84:~# dmesg | tail -1 [ 614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed Notice the last argument was replaced by a NULL (we were lucky enough to not crash, but do not try this on your production machine !) After fix : lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern lpq83:~# ./produce_core Segmentation fault lpq83:~# dmesg | tail -1 [ 740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed Fixes: 5fe9d8ca21cc ("coredump: cn_vprintf() has no reason to call vsnprintf() twice") Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Hugh Dickins <hughd@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-19fix races between __d_instantiate() and checks of dentry flagsAl Viro
in non-lazy walk we need to be careful about dentry switching from negative to positive - both ->d_flags and ->d_inode are updated, and in some places we might see only one store. The cases where dentry has been obtained by dcache lookup with ->i_mutex held on parent are safe - ->d_lock and ->i_mutex provide all the barriers we need. However, there are several places where we run into trouble: * do_last() fetches ->d_inode, then checks ->d_flags and assumes that inode won't be NULL unless d_is_negative() is true. Race with e.g. creat() - we might have fetched the old value of ->d_inode (still NULL) and new value of ->d_flags (already not DCACHE_MISS_TYPE). Lin Ming has observed and reported the resulting oops. * a bunch of places checks ->d_inode for being non-NULL, then checks ->d_flags for "is it a symlink". Race with symlink(2) in case if our CPU sees ->d_inode update first - we see non-NULL there, but ->d_flags still contains DCACHE_MISS_TYPE instead of DCACHE_SYMLINK_TYPE. Result: false negative on "should we follow link here?", with subsequent unpleasantness. Cc: stable@vger.kernel.org # 3.13 and 3.14 need that one Reported-and-tested-by: Lin Ming <minggr@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-18Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "A set of 5 small cifs fixes" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cif: fix dead code cifs: fix error handling cifs_user_readv fs: cifs: remove unused variable. Return correct error on query of xattr on file with empty xattrs cifs: Wait for writebacks to complete before attempting write.
2014-04-18Merge tag 'driver-core-3.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some driver core fixes for 3.15-rc2. Also in here are some documentation updates, as well as an API removal that had to wait for after -rc1 due to the cleanups coming into you from multiple developer trees (this one and the PPC tree.) All have been in linux next successfully" * tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: drivers/base/dd.c incorrect pr_debug() parameters Documentation: Update stable address in Chinese and Japanese translations topology: Fix compilation warning when not in SMP Chinese: add translation of io_ordering.txt stable_kernel_rules: spelling/word usage sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner() kernfs: protect lazy kernfs_iattrs allocation with mutex fs: Don't return 0 from get_anon_bdev
2014-04-18ext4: use EINVAL if not a regular file in ext4_collapse_range()Theodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18ext4: enforce we are operating on a regular file in ext4_zero_range()jon ernst
Signed-off-by: Jon Ernst <jonernst07@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18ext4: fix extent merging in ext4_ext_shift_path_extents()Lukas Czerner
There is a bug in ext4_ext_shift_path_extents() where if we actually manage to merge a extent we would skip shifting the next extent. This will result in in one extent in the extent tree not being properly shifted. This is causing failure in various xfstests tests using fsx or fsstress with collapse range support. It will also cause file system corruption which looks something like: e2fsck 1.42.9 (4-Feb-2014) Pass 1: Checking inodes, blocks, and sizes Inode 20 has out of order extents (invalid logical block 3, physical block 492938, len 2) Clear? yes ... when running e2fsck. It's also very easily reproducible just by running fsx without any parameters. I can usually hit the problem within a minute. Fix it by increasing ex_start only if we're not merging the extent. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2014-04-18ext4: discard preallocations after removing spaceLukas Czerner
Currently in ext4_collapse_range() and ext4_punch_hole() we're discarding preallocation twice. Once before we attempt to do any changes and second time after we're done with the changes. While the second call to ext4_discard_preallocations() in ext4_punch_hole() case is not needed, we need to discard preallocation right after ext4_ext_remove_space() in collapse range case because in the case we had to restart a transaction in the middle of removing space we might have new preallocations created. Remove unneeded ext4_discard_preallocations() ext4_punch_hole() and move it to the better place in ext4_collapse_range() Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18ext4: no need to truncate pagecache twice in collapse rangeLukas Czerner
We're already calling truncate_pagecache() before we attempt to do any actual job so there is not need to truncate pagecache once more using truncate_setsize() after we're finished. Remove truncate_setsize() and replace it just with i_size_write() note that we're holding appropriate locks. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18ext4: fix removing status extents in ext4_collapse_range()Lukas Czerner
Currently in ext4_collapse_range() when calling ext4_es_remove_extent() to remove status extents we're passing (EXT_MAX_BLOCKS - punch_start - 1) in order to remove all extents from start of the collapse range to the end of the file. However this is wrong because we might miss the possible extent covering the last block of the file. Fix it by removing the -1. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2014-04-18ext4: use filemap_write_and_wait_range() correctly in collapse rangeLukas Czerner
Currently we're passing -1 as lend argumnet for filemap_write_and_wait_range() which is wrong since lend is signed type so it would cause some confusion and we might not write_and_wait for the entire range we're expecting to write. Fix it by using LLONG_MAX instead. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18ext4: use truncate_pagecache() in collapse rangeLukas Czerner
We should be using truncate_pagecache() instead of truncate_pagecache_range() in the collapse range because we're truncating page cache from offset to the end of file. truncate_pagecache() also get rid of the private COWed pages from the range because we're going to shift the end of the file. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18Revert "nfsd4: fix nfs4err_resource in 4.1 case"J. Bruce Fields
Since we're still limiting attributes to a page, the result here is that a large getattr result will return NFS4ERR_REP_TOO_BIG/TOO_BIG_TO_CACHE instead of NFS4ERR_RESOURCE. Both error returns are wrong, and the real bug here is the arbitrary limit on getattr results, fixed by as-yet out-of-tree patches. But at a minimum we can make life easier for clients by sticking to one broken behavior in released kernels instead of two.... Trond says: one immediate consequence of this patch will be that NFSv4.1 clients will now report EIO instead of EREMOTEIO if they hit the problem. That may make debugging a little less obvious. Another consequence will be that if we ever do try to add client side handling of NFS4ERR_REP_TOO_BIG, then we now have to deal with the “handle existing buggy server” syndrome. Reported-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18nfsd: set timeparms.to_maxval in setup_callback_clientJeff Layton
...otherwise the logic in the timeout handling doesn't work correctly. Spotted-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18locks: allow __break_lease to sleep even when break_time is 0Jeff Layton
A fl->fl_break_time of 0 has a special meaning to the lease break code that basically means "never break the lease". knfsd uses this to ensure that leases don't disappear out from under it. Unfortunately, the code in __break_lease can end up passing this value to wait_event_interruptible as a timeout, which prevents it from going to sleep at all. This causes __break_lease to spin in a tight loop and causes soft lockups. Fix this by ensuring that we pass a minimum value of 1 as a timeout instead. Cc: <stable@vger.kernel.org> Cc: J. Bruce Fields <bfields@fieldses.org> Reported-by: Terry Barnaby <terry1@beam.ltd.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-16cif: fix dead codeMichael Opdenacker
This issue was found by Coverity (CID 1202536) This proposes a fix for a statement that creates dead code. The "rc < 0" statement is within code that is run with "rc > 0". It seems like "err < 0" was meant to be used here. This way, the error code is returned by the function. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steve French <smfrench@gmail.com>
2014-04-16cifs: fix error handling cifs_user_readvJeff Layton
Coverity says: *** CID 1202537: Dereference after null check (FORWARD_NULL) /fs/cifs/file.c: 2873 in cifs_user_readv() 2867 cur_len = min_t(const size_t, len - total_read, cifs_sb->rsize); 2868 npages = DIV_ROUND_UP(cur_len, PAGE_SIZE); 2869 2870 /* allocate a readdata struct */ 2871 rdata = cifs_readdata_alloc(npages, 2872 cifs_uncached_readv_complete); >>> CID 1202537: Dereference after null check (FORWARD_NULL) >>> Comparing "rdata" to null implies that "rdata" might be null. 2873 if (!rdata) { 2874 rc = -ENOMEM; 2875 goto error; 2876 } 2877 2878 rc = cifs_read_allocate_pages(rdata, npages); ...when we "goto error", rc will be non-zero, and then we end up trying to do a kref_put on the rdata (which is NULL). Fix this by replacing the "goto error" with a "break". Reported-by: <scan-admin@coverity.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2014-04-17xfs: fix tmpfile/selinux deadlock and initialize securityBrian Foster
xfstests generic/004 reproduces an ilock deadlock using the tmpfile interface when selinux is enabled. This occurs because xfs_create_tmpfile() takes the ilock and then calls d_tmpfile(). The latter eventually calls into xfs_xattr_get() which attempts to get the lock again. E.g.: xfs_io D ffffffff81c134c0 4096 3561 3560 0x00000080 ffff8801176a1a68 0000000000000046 ffff8800b401b540 ffff8801176a1fd8 00000000001d5800 00000000001d5800 ffff8800b401b540 ffff8800b401b540 ffff8800b73a6bd0 fffffffeffffffff ffff8800b73a6bd8 ffff8800b5ddb480 Call Trace: [<ffffffff8177f969>] schedule+0x29/0x70 [<ffffffff81783a65>] rwsem_down_read_failed+0xc5/0x120 [<ffffffffa05aa97f>] ? xfs_ilock_attr_map_shared+0x1f/0x50 [xfs] [<ffffffff813b3434>] call_rwsem_down_read_failed+0x14/0x30 [<ffffffff810ed179>] ? down_read_nested+0x89/0xa0 [<ffffffffa05aa7f2>] ? xfs_ilock+0x122/0x250 [xfs] [<ffffffffa05aa7f2>] xfs_ilock+0x122/0x250 [xfs] [<ffffffffa05aa97f>] xfs_ilock_attr_map_shared+0x1f/0x50 [xfs] [<ffffffffa05701d0>] xfs_attr_get+0x90/0xe0 [xfs] [<ffffffffa0565e07>] xfs_xattr_get+0x37/0x50 [xfs] [<ffffffff8124842f>] generic_getxattr+0x4f/0x70 [<ffffffff8133fd9e>] inode_doinit_with_dentry+0x1ae/0x650 [<ffffffff81340e0c>] selinux_d_instantiate+0x1c/0x20 [<ffffffff813351bb>] security_d_instantiate+0x1b/0x30 [<ffffffff81237db0>] d_instantiate+0x50/0x70 [<ffffffff81237e85>] d_tmpfile+0xb5/0xc0 [<ffffffffa05add02>] xfs_create_tmpfile+0x362/0x410 [xfs] [<ffffffffa0559ac8>] xfs_vn_tmpfile+0x18/0x20 [xfs] [<ffffffff81230388>] path_openat+0x228/0x6a0 [<ffffffff810230f9>] ? sched_clock+0x9/0x10 [<ffffffff8105a427>] ? kvm_clock_read+0x27/0x40 [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0 [<ffffffff8123101a>] do_filp_open+0x3a/0x90 [<ffffffff817845e7>] ? _raw_spin_unlock+0x27/0x40 [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0 [<ffffffff8121e3ce>] do_sys_open+0x12e/0x210 [<ffffffff8121e4ce>] SyS_open+0x1e/0x20 [<ffffffff8178eda9>] system_call_fastpath+0x16/0x1b xfs_vn_tmpfile() also fails to initialize security on the newly created inode. Pull the d_tmpfile() call up into xfs_vn_tmpfile() after the transaction has been committed and the inode unlocked. Also, initialize security on the inode based on the parent directory provided via the tmpfile call. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-17xfs: fix buffer use after free on IO errorEric Sandeen
When testing exhaustion of dm snapshots, the following appeared with CONFIG_DEBUG_OBJECTS_FREE enabled: ODEBUG: free active (active state 0) object type: work_struct hint: xfs_buf_iodone_work+0x0/0x1d0 [xfs] indicating that we'd freed a buffer which still had a pending reference, down this path: [ 190.867975] [<ffffffff8133e6fb>] debug_check_no_obj_freed+0x22b/0x270 [ 190.880820] [<ffffffff811da1d0>] kmem_cache_free+0xd0/0x370 [ 190.892615] [<ffffffffa02c5924>] xfs_buf_free+0xe4/0x210 [xfs] [ 190.905629] [<ffffffffa02c6167>] xfs_buf_rele+0xe7/0x270 [xfs] [ 190.911770] [<ffffffffa034c826>] xfs_trans_read_buf_map+0x7b6/0xac0 [xfs] At issue is the fact that if IO fails in xfs_buf_iorequest, we'll queue completion unconditionally, and then call xfs_buf_rele; but if IO failed, there are no IOs remaining, and xfs_buf_rele will free the bp while work is still queued. Fix this by not scheduling completion if the buffer has an error on it; run it immediately. The rest is only comment changes. Thanks to dchinner for spotting the root cause. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-17xfs: wrong error sign conversion during failed DIO writesDave Chinner
We negate the error value being returned from a generic function incorrectly. The code path that it is running in returned negative errors, so there is no need to negate it to get the correct error signs here. This was uncovered by generic/019. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-17xfs: unmount does not wait for shutdown during unmountDave Chinner
And interesting situation can occur if a log IO error occurs during the unmount of a filesystem. The cases reported have the same signature - the update of the superblock counters fails due to a log write IO error: XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1 XFS (dm-16): Log I/O Error Detected. Shutting down filesystem XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount. XFS (dm-16): xfs_log_force: error 5 returned. XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s) It can be seen that the last line of output contains a corrupt device name - this is because the log and xfs_mount structures have already been freed by the time this message is printed. A kernel oops closely follows. The issue is that the shutdown is occurring in a separate IO completion thread to the unmount. Once the shutdown processing has started and all the iclogs are marked with XLOG_STATE_IOERROR, the log shutdown code wakes anyone waiting on a log force so they can process the shutdown error. This wakes up the unmount code that is doing a synchronous transaction to update the superblock counters. The unmount path now sees all the iclogs are marked with XLOG_STATE_IOERROR and so never waits on them again, knowing that if it does, there will not be a wakeup trigger for it and we will hang the unmount if we do. Hence the unmount runs through all the remaining code and frees all the filesystem structures while the xlog_iodone() is still processing the shutdown. When the log shutdown processing completes, xfs_do_force_shutdown() emits the "Please umount the filesystem and rectify the problem(s)" message, and xlog_iodone() then aborts all the objects attached to the iclog. An iclog that has already been freed.... The real issue here is that there is no serialisation point between the log IO and the unmount. We have serialisations points for log writes, log forces, reservations, etc, but we don't actually have any code that wakes for log IO to fully complete. We do that for all other types of object, so why not iclogbufs? Well, it turns out that we can easily do this. We've got xfs_buf handles, and that's what everyone else uses for IO serialisation. i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only release the lock in xlog_iodone() when we are finished with the buffer. That way before we tear down the iclog, we can lock and unlock the buffer to ensure IO completion has finished completely before we tear it down. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Bob Mastors <bob.mastors@solidfire.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-17xfs: collapse range is delalloc challengedDave Chinner
FSX has been detecting data corruption after to collapse range calls. The key observation is that the offset of the last extent in the file was not being shifted, and hence when the file size was adjusted it was truncating away data because the extents handled been correctly shifted. Tracing indicated that before the collapse, the extent list looked like: .... ino 0x5788 state idx 6 offset 26 block 195904 count 10 flag 0 ino 0x5788 state idx 7 offset 39 block 195917 count 35 flag 0 ino 0x5788 state idx 8 offset 86 block 195964 count 32 flag 0 and after the shift of 2 blocks: ino 0x5788 state idx 6 offset 24 block 195904 count 10 flag 0 ino 0x5788 state idx 7 offset 37 block 195917 count 35 flag 0 ino 0x5788 state idx 8 offset 86 block 195964 count 32 flag 0 Note that the last extent did not change offset. After the changing of the file size: ino 0x5788 state idx 6 offset 24 block 195904 count 10 flag 0 ino 0x5788 state idx 7 offset 37 block 195917 count 35 flag 0 ino 0x5788 state idx 8 offset 86 block 195964 count 30 flag 0 You can see that the last extent had it's length truncated, indicating that we've lost data. The reason for this is that the xfs_bmap_shift_extents() loop uses XFS_IFORK_NEXTENTS() to determine how many extents are in the inode. This, unfortunately, doesn't take into account delayed allocation extents - it's a count of physically allocated extents - and hence when the file being collapsed has a delalloc extent like this one does prior to the range being collapsed: .... ino 0x5788 state idx 4 offset 11 block 4503599627239429 count 1 flag 0 .... it gets the count wrong and terminates the shift loop early. Fix it by using the in-memory extent array size that includes delayed allocation extents to determine the number of extents on the inode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-17xfs: don't map ranges that span EOF for direct IODave Chinner
Al Viro tracked down the problem that has caused generic/263 to fail on XFS since the test was introduced. If is caused by xfs_get_blocks() mapping a single extent that spans EOF without marking it as buffer-new() so that the direct IO code does not zero the tail of the block at the new EOF. This is a long standing bug that has been around for many, many years. Because xfs_get_blocks() starts the map before EOF, it can't set buffer_new(), because that causes he direct IO code to also zero unaligned sectors at the head of the IO. This would overwrite valid data with zeros, and hence we cannot validly return a single extent that spans EOF to direct IO. Fix this by detecting a mapping that spans EOF and truncate it down to EOF. This results in the the direct IO code doing the right thing for unaligned data blocks before EOF, and then returning to get another mapping for the region beyond EOF which XFS treats correctly by setting buffer_new() on it. This makes direct Io behave correctly w.r.t. tail block zeroing beyond EOF, and fsx is happy about that. Again, thanks to Al Viro for finding what I couldn't. [ dchinner: Fix for __divdi3 build error: Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-16sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()Tejun Heo
All device_schedule_callback_owner() users are converted to use device_remove_file_self(). Remove now unused {sysfs|device}_schedule_callback_owner(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-16kernfs: protect lazy kernfs_iattrs allocation with mutexTejun Heo
kernfs_iattrs is allocated lazily when operations which require it take place; unfortunately, the lazy allocation and returning weren't properly synchronized and when there are multiple concurrent operations, it might end up returning kernfs_iattrs which hasn't finished initialization yet or different copies to different callers. Fix it by synchronizing with a mutex. This can be smarter with memory barriers but let's go there if it actually turns out to be necessary. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/g/533ABA32.9080602@oracle.com Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: stable@vger.kernel.org # 3.14 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-16fs: Don't return 0 from get_anon_bdevThomas Bächler
Commit 9e30cc9595303b27b48 removed an internal mount. This has the side-effect that rootfs now has FSID 0. Many userspace utilities assume that st_dev in struct stat is never 0, so this change breaks a number of tools in early userspace. Since we don't know how many userspace programs are affected, make sure that FSID is at least 1. References: http://article.gmane.org/gmane.linux.kernel/1666905 References: http://permalink.gmane.org/gmane.linux.utilities.util-linux-ng/8557 Cc: 3.14 <stable@vger.kernel.org> Signed-off-by: Thomas Bächler <thomas@archlinux.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-16fs: cifs: remove unused variable.Cyril Roelandt
In SMB2_set_compression(), the "res_key" variable is only initialized to NULL and later kfreed. It is therefore useless and should be removed. Found with the following semantic patch: <smpl> @@ identifier foo; identifier f; type T; @@ * f(...) { ... * T *foo = NULL; ... when forall when != foo * kfree(foo); ... } </smpl> Signed-off-by: Cyril Roelandt <tipecaml@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2014-04-16Return correct error on query of xattr on file with empty xattrsSteve French
xfstest 020 detected a problem with cifs xattr handling. When a file had an empty xattr list, we returned success (with an empty xattr value) on query of particular xattrs rather than returning ENODATA. This patch fixes it so that query of an xattr returns ENODATA when the xattr list is empty for the file. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2014-04-16cifs: Wait for writebacks to complete before attempting write.Sachin Prabhu
Problem reported in Red Hat bz 1040329 for strict writes where we cache only when we hold oplock and write direct to the server when we don't. When we receive an oplock break, we first change the oplock value for the inode in cifsInodeInfo->oplock to indicate that we no longer hold the oplock before we enqueue a task to flush changes to the backing device. Once we have completed flushing the changes, we return the oplock to the server. There are 2 ways here where we can have data corruption 1) While we flush changes to the backing device as part of the oplock break, we can have processes write to the file. These writes check for the oplock, find none and attempt to write directly to the server. These direct writes made while we are flushing from cache could be overwritten by data being flushed from the cache causing data corruption. 2) While a thread runs in cifs_strict_writev, the machine could receive and process an oplock break after the thread has checked the oplock and found that it allows us to cache and before we have made changes to the cache. In that case, we end up with a dirty page in cache when we shouldn't have any. This will be flushed later and will overwrite all subsequent writes to the part of the file represented by this page. Before making any writes to the server, we need to confirm that we are not in the process of flushing data to the server and if we are, we should wait until the process is complete before we attempt the write. We should also wait for existing writes to complete before we process an oplock break request which changes oplock values. We add a version specific downgrade_oplock() operation to allow for differences in the oplock values set for the different smb versions. Cc: stable@vger.kernel.org Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
2014-04-16aio: block io_destroy() until all context requests are completedAnatol Pomozov
deletes aio context and all resources related to. It makes sense that no IO operations connected to the context should be running after the context is destroyed. As we removed io_context we have no chance to get requests status or call io_getevents(). man page for io_destroy says that this function may block until all context's requests are completed. Before kernel 3.11 io_destroy() blocked indeed, but since aio refactoring in 3.11 it is not true anymore. Here is a pseudo-code that shows a testcase for a race condition discovered in 3.11: initialize io_context io_submit(read to buffer) io_destroy() // context is destroyed so we can free the resources free(buffers); // if the buffer is allocated by some other user he'll be surprised // to learn that the buffer still filled by an outstanding operation // from the destroyed io_context The fix is straight-forward - add a completion struct and wait on it in io_destroy, complete() should be called when number of in-fligh requests reaches zero. If two or more io_destroy() called for the same context simultaneously then only the first one waits for IO completion, other calls behaviour is undefined. Tested: ran http://pastebin.com/LrPsQ4RL testcase for several hours and do not see the race condition anymore. Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2014-04-15locks: allow __break_lease to sleep even when break_time is 0Jeff Layton
A fl->fl_break_time of 0 has a special meaning to the lease break code that basically means "never break the lease". knfsd uses this to ensure that leases don't disappear out from under it. Unfortunately, the code in __break_lease can end up passing this value to wait_event_interruptible as a timeout, which prevents it from going to sleep at all. This makes __break_lease to spin in a tight loop and causes soft lockups. Fix this by ensuring that we pass a minimum value of 1 as a timeout instead. Cc: <stable@vger.kernel.org> Cc: J. Bruce Fields <bfields@fieldses.org> Reported-by: Terry Barnaby <terry1@beam.ltd.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-04-14ext4: fix ext4_count_free_clusters() with EXT4FS_DEBUG and bigalloc enabledAzat Khuzhin
With bigalloc enabled we must use EXT4_CLUSTERS_PER_GROUP() instead of EXT4_BLOCKS_PER_GROUP() otherwise we will go beyond the allocated buffer. $ mount -t ext4 /dev/vde /vde [ 70.573993] EXT4-fs DEBUG (fs/ext4/mballoc.c, 2346): ext4_mb_alloc_groupinfo: [ 70.575174] allocated s_groupinfo array for 1 meta_bg's [ 70.576172] EXT4-fs DEBUG (fs/ext4/super.c, 2092): ext4_check_descriptors: [ 70.576972] Checking group descriptorsBUG: unable to handle kernel paging request at ffff88006ab56000 [ 72.463686] IP: [<ffffffff81394eb9>] __bitmap_weight+0x2a/0x7f [ 72.464168] PGD 295e067 PUD 2961067 PMD 7fa8e067 PTE 800000006ab56060 [ 72.464738] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 72.465139] Modules linked in: [ 72.465402] CPU: 1 PID: 3560 Comm: mount Tainted: G W 3.14.0-rc2-00069-ge57bce1 #60 [ 72.466079] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 72.466505] task: ffff88007ce6c8a0 ti: ffff88006b7f0000 task.ti: ffff88006b7f0000 [ 72.466505] RIP: 0010:[<ffffffff81394eb9>] [<ffffffff81394eb9>] __bitmap_weight+0x2a/0x7f [ 72.466505] RSP: 0018:ffff88006b7f1c00 EFLAGS: 00010206 [ 72.466505] RAX: 0000000000000000 RBX: 000000000000050a RCX: 0000000000000040 [ 72.466505] RDX: 0000000000000000 RSI: 0000000000080000 RDI: 0000000000000000 [ 72.466505] RBP: ffff88006b7f1c28 R08: 0000000000000002 R09: 0000000000000000 [ 72.466505] R10: 000000000000babe R11: 0000000000000400 R12: 0000000000080000 [ 72.466505] R13: 0000000000000200 R14: 0000000000002000 R15: ffff88006ab55000 [ 72.466505] FS: 00007f43ba1fa840(0000) GS:ffff88007f800000(0000) knlGS:0000000000000000 [ 72.466505] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 72.466505] CR2: ffff88006ab56000 CR3: 000000006b7e6000 CR4: 00000000000006e0 [ 72.466505] Stack: [ 72.466505] ffff88006ab65000 0000000000000000 0000000000000000 0000000000010000 [ 72.466505] ffff88006ab6f400 ffff88006b7f1c58 ffffffff81396bb8 0000000000010000 [ 72.466505] 0000000000000000 ffff88007b869a90 ffff88006a48a000 ffff88006b7f1c70 [ 72.466505] Call Trace: [ 72.466505] [<ffffffff81396bb8>] memweight+0x5f/0x8a [ 72.466505] [<ffffffff811c3b19>] ext4_count_free+0x13/0x21 [ 72.466505] [<ffffffff811c396c>] ext4_count_free_clusters+0xdb/0x171 [ 72.466505] [<ffffffff811e3bdd>] ext4_fill_super+0x117c/0x28ef [ 72.466505] [<ffffffff81391569>] ? vsnprintf+0x1c7/0x3f7 [ 72.466505] [<ffffffff8114d8dc>] mount_bdev+0x145/0x19c [ 72.466505] [<ffffffff811e2a61>] ? ext4_calculate_overhead+0x2a1/0x2a1 [ 72.466505] [<ffffffff811dab1d>] ext4_mount+0x15/0x17 [ 72.466505] [<ffffffff8114e3aa>] mount_fs+0x67/0x150 [ 72.466505] [<ffffffff811637ea>] vfs_kern_mount+0x64/0xde [ 72.466505] [<ffffffff81165d19>] do_mount+0x6fe/0x7f5 [ 72.466505] [<ffffffff81126cc8>] ? strndup_user+0x3a/0xd9 [ 72.466505] [<ffffffff8116604b>] SyS_mount+0x85/0xbe [ 72.466505] [<ffffffff81619e90>] tracesys+0xdd/0xe2 [ 72.466505] Code: c3 89 f0 b9 40 00 00 00 55 99 48 89 e5 41 57 f7 f9 41 56 49 89 ff 41 55 45 31 ed 41 54 41 89 f4 53 31 db 41 89 c6 45 39 ee 7e 10 <4b> 8b 3c ef 49 ff c5 e8 bf ff ff ff 01 c3 eb eb 31 c0 45 85 f6 [ 72.466505] RIP [<ffffffff81394eb9>] __bitmap_weight+0x2a/0x7f [ 72.466505] RSP <ffff88006b7f1c00> [ 72.466505] CR2: ffff88006ab56000 [ 72.466505] ---[ end trace 7d051a08ae138573 ]--- Killed Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-14btrfs: fix use-after-free in mount_subvol()Christoph Jaeger
Pointer 'newargs' is used after the memory that it points to has already been freed. Picked up by Coverity - CID 1201425. Fixes: 0723a0473f ("btrfs: allow mounting btrfs subvolumes with different ro/rw options") Signed-off-by: Christoph Jaeger <christophjaeger@linux.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-04-14xfs: zeroing space needs to punch delalloc blocksDave Chinner
When we are zeroing space andit is covered by a delalloc range, we need to punch the delalloc range out before we truncate the page cache. Failing to do so leaves and inconsistency between the page cache and the extent tree, which we later trip over when doing direct IO over the same range. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-14xfs: xfs_vm_write_end truncates too much on failureDave Chinner
Similar to the write_begin problem, xfs-vm_write_end will truncate back to the old EOF, potentially removing page cache from over the top of delalloc blocks with valid data in them. Fix this by truncating back to just the start of the failed write. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>