summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2014-07-08NFSD: Remove iattr parameter from nfsd_symlink()Kinglong Mee
Commit db2e747b1499 (vfs: remove mode parameter from vfs_symlink()) have remove mode parameter from vfs_symlink. So that, iattr isn't needed by nfsd_symlink now, just remove it. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd: Protect addition to the file_hashtblTrond Myklebust
Current code depends on the client_mutex to guarantee a single struct nfs4_file per inode in the file_hashtbl and make addition atomic with respect to lookup. Rely instead on the state_Lock, to make it easier to stop taking the client_mutex here later. To prevent an i_lock/state_lock inversion, change nfsd4_init_file to use ihold instead if igrab. That's also more efficient anyway as we definitely hold a reference to the inode at that point. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd: fix file access refcount leak when nfsd4_truncate failsChristoph Hellwig
nfsd4_process_open2 will currently will get access to the file, and then call nfsd4_truncate to (possibly) truncate it. If that operation fails though, then the access references will never be released as the nfs4_ol_stateid is never initialized. Fix by moving the nfsd4_truncate call into nfs4_get_vfs_file, ensuring that the refcounts are properly put if the truncate fails. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08NFSD: Avoid warning message when compile at i686 archKinglong Mee
fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_readv': >> fs/nfsd/nfs4xdr.c:3137:148: warning: comparison of distinct pointer types lacks a cast [enabled by default] thislen = min(len, ((void *)xdr->end - (void *)xdr->p)); Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd4: replace defer_free by svcxdr_tmpallocJ. Bruce Fields
Avoid an extra allocation for the tmpbuf struct itself, and stop ignoring some allocation failures. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd4: remove nfs4_acl_newJ. Bruce Fields
This is a not-that-useful kmalloc wrapper. And I'd like one of the callers to actually use something other than kmalloc. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd4: define svcxdr_dupstr to share some common codeJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd4: remove unused defer_free argumentJ. Bruce Fields
28e05dd8457c "knfsd: nfsd4: represent nfsv4 acl with array instead of linked list" removed the last user that wanted a custom free function. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd4: rename cr_linkname->cr_dataJ. Bruce Fields
The name of a link is currently stored in cr_name and cr_namelen, and the content in cr_linkname and cr_linklen. That's confusing. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd: let nfsd_symlink assume null-terminated dataJ. Bruce Fields
Currently nfsd_symlink has a weird hack to serve callers who don't null-terminate symlink data: it looks ahead at the next byte to see if it's zero, and copies it to a new buffer to null-terminate if not. That means callers don't have to null-terminate, but they *do* have to ensure that the byte following the end of the data is theirs to read. That's a bit subtle, and the NFSv4 code actually got this wrong. So let's just throw out that code and let callers pass null-terminated strings; we've already fixed them to do that. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd: make NFSv2 null terminate symlink dataJ. Bruce Fields
It's simple enough for NFSv2 to null-terminate the symlink data. A bit weird (it depends on knowing that we've already read the following byte, which is either padding or part of the mode), but no worse than the conditional kstrdup it otherwise relies on in nfsd_symlink(). Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd: fix rare symlink decoding bugJ. Bruce Fields
An NFS operation that creates a new symlink includes the symlink data, which is xdr-encoded as a length followed by the data plus 0 to 3 bytes of zero-padding as required to reach a 4-byte boundary. The vfs, on the other hand, wants null-terminated data. The simple way to handle this would be by copying the data into a newly allocated buffer with space for the final null. The current nfsd_symlink code tries to be more clever by skipping that step in the (likely) case where the byte following the string is already 0. But that assumes that the byte following the string is ours to look at. In fact, it might be the first byte of a page that we can't read, or of some object that another task might modify. Worse, the NFSv4 code tries to fix the problem by actually writing to that byte. In the NFSv2/v3 cases this actually appears to be safe: - nfs3svc_decode_symlinkargs explicitly null-terminates the data (after first checking its length and copying it to a new page). - NFSv2 limits symlinks to 1k. The buffer holding the rpc request is always at least a page, and the link data (and previous fields) have maximum lengths that prevent the request from reaching the end of a page. In the NFSv4 case the CREATE op is potentially just one part of a long compound so can end up on the end of a page if you're unlucky. The minimal fix here is to copy and null-terminate in the NFSv4 case. The nfsd_symlink() interface here seems too fragile, though. It should really either do the copy itself every time or just require a null-terminated string. Reported-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfs: only show Posix ACLs in listxattr if actually presentChristoph Hellwig
The big ACL switched nfs to use generic_listxattr, which calls all existing ->list handlers. Add a custom .listxattr implementation that only lists the ACLs if they actually are present on the given inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Philippe Troin <phil@fifi.org> Tested-by: Philippe Troin <phil@fifi.org> Fixes: 013cdf1088d7 (nfs: use generic posix ACL infrastructure ...) Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-07-08nfs: check hostname in nfs_get_clientPeng Tao
We reference cl_hostname in many places. Add a check to make sure it exists. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-07-08nfsv4: set hostname when creating nfsv4 ds connectionPeng Tao
We reference cl_hostname in many places for debugging purpose. So make it useful by setting hostname when calling nfs_get_client. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-07-08ceph: properly apply umask when ACL is enabledYan, Zheng
when ACL is enabled, posix_acl_create() may change inode's mode Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-08ceph: pass proper page offset to copy_page_to_iter()Yan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-08ceph: include time stamp in replayed MDS requestsYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-08ceph: check unsupported fallocate modeYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-07nfsd: Fix bad reserving space for encoding rdattr_errorKinglong Mee
Introduced by commit 561f0ed498 (nfsd4: allow large readdirs). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-07fuse: avoid scheduling while atomicMiklos Szeredi
As reported by Richard Sharpe, an attempt to use fuse_notify_inval_entry() triggers complains about scheduling while atomic: BUG: scheduling while atomic: fuse.hf/13976/0x10000001 This happens because fuse_notify_inval_entry() attempts to allocate memory with GFP_KERNEL, holding "struct fuse_copy_state" mapped by kmap_atomic(). Introduced by commit 58bda1da4b3c "fuse/dev: use atomic maps" Fix by moving the map/unmap to just cover the actual memcpy operation. Original patch from Maxim Patlasov <mpatlasov@parallels.com> Reported-by: Richard Sharpe <realrichardsharpe@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@vger.kernel.org> # v3.15+
2014-07-07fuse: handle large user and group IDMiklos Szeredi
If the number in "user_id=N" or "group_id=N" mount options was larger than INT_MAX then fuse returned EINVAL. Fix this to handle all valid uid/gid values. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org
2014-07-07fuse: inode: drop castHimangi Saraogi
This patch removes the cast on data of type void * as it is not needed. The following Coccinelle semantic patch was used for making the change: @r@ expression x; void* e; type T; identifier f; @@ ( *((T *)e) | ((T *)x)[...] | ((T *)x)->f | - (T *) e ) Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-07-07fuse: ignore entry-timeout on LOOKUP_REVALAnand Avati
The following test case demonstrates the bug: sh# mount -t glusterfs localhost:meta-test /mnt/one sh# mount -t glusterfs localhost:meta-test /mnt/two sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; echo stuff > /mnt/one/file bash: /mnt/one/file: Stale file handle sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; sleep 1; echo stuff > /mnt/one/file On the second open() on /mnt/one, FUSE would have used the old nodeid (file handle) trying to re-open it. Gluster is returning -ESTALE. The ESTALE propagates back to namei.c:filename_lookup() where lookup is re-attempted with LOOKUP_REVAL. The right behavior now, would be for FUSE to ignore the entry-timeout and and do the up-call revalidation. Instead FUSE is ignoring LOOKUP_REVAL, succeeding the revalidation (because entry-timeout has not passed), and open() is again retried on the old file handle and finally the ESTALE is going back to the application. Fix: if revalidation is happening with LOOKUP_REVAL, then ignore entry-timeout and always do the up-call. Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org
2014-07-07fuse: timeout comparison fixMiklos Szeredi
As suggested by checkpatch.pl, use time_before64() instead of direct comparison of jiffies64 values. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@vger.kernel.org>
2014-07-05ext4: disable synchronous transaction batching if max_batch_time==0Eric Sandeen
The mount manpage says of the max_batch_time option, This optimization can be turned off entirely by setting max_batch_time to 0. But the code doesn't do that. So fix the code to do that. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-07-05ext4: clarify ext4_error message in ext4_mb_generate_buddy_error()Theodore Ts'o
We are spending a lot of time explaining to users what this error means. Let's try to improve the message to avoid this problem. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-07-05ext4: clarify error count warning messagesTheodore Ts'o
Make it clear that values printed are times, and that it is error since last fsck. Also add note about fsck version required. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org
2014-07-05ext4: fix unjournalled bg descriptor while initializing inode bitmapTheodore Ts'o
The first time that we allocate from an uninitialized inode allocation bitmap, if the block allocation bitmap is also uninitalized, we need to get write access to the block group descriptor before we start modifying the block group descriptor flags and updating the free block count, etc. Otherwise, there is the potential of a bad journal checksum (if journal checksums are enabled), and of the file system becoming inconsistent if we crash at exactly the wrong time. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-07-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We've queued up a few fixes in my for-linus branch" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix crash when starting transaction Btrfs: fix btrfs_print_leaf for skinny metadata Btrfs: fix race of using total_bytes_pinned btrfs: use E2BIG instead of EIO if compression does not help btrfs: remove stale comment from btrfs_flush_all_pending_stuffs Btrfs: fix use-after-free when cloning a trailing file hole btrfs: fix null pointer dereference in btrfs_show_devname when name is null btrfs: fix null pointer dereference in clone_fs_devices when name is null btrfs: fix nossd and ssd_spread mount option regression Btrfs: fix race between balance recovery and root deletion Btrfs: atomically set inode->i_flags in btrfs_update_iflags btrfs: only unlock block in verify_parent_transid if we locked it Btrfs: assert send doesn't attempt to start transactions btrfs compression: reuse recently used workspace Btrfs: fix crash when mounting raid5 btrfs with missing disks btrfs: create sprout should rename fsid on the sysfs as well btrfs: dev replace should replace the sysfs entry btrfs: dev add should add its sysfs entry btrfs: dev delete should remove sysfs entry btrfs: rename add_device_membership to btrfs_kobj_add_device
2014-07-03Merge tag 'driver-core-3.16-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Well, one drivercore fix for kernfs to resolve a reported issue with sysfs files being updated from atomic contexts, and another lz4 bugfix for testing potential buffer overflows" * tag 'driver-core-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: lz4: add overrun checks to lz4_uncompress_unknownoutputsize() kernfs: kernfs_notify() must be useable from non-sleepable contexts
2014-07-03Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd bugfixes from Bruce Fields: "By coincidence, two NFSv4 symlink bugs, one introduced in the 3.16 xdr encoding rewrite, the other a decoding bug that I think we've had since the start but that just doesn't trigger very often" * 'for-3.16' of git://linux-nfs.org/~bfields/linux: nfs: fix nfs4d readlink truncated packet nfsd: fix rare symlink decoding bug
2014-07-03fs/seq_file: fallback to vmalloc allocationHeiko Carstens
There are a couple of seq_files which use the single_open() interface. This interface requires that the whole output must fit into a single buffer. E.g. for /proc/stat allocation failures have been observed because an order-4 memory allocation failed due to memory fragmentation. In such situations reading /proc/stat is not possible anymore. Therefore change the seq_file code to fallback to vmalloc allocations which will usually result in a couple of order-0 allocations and hence also work if memory is fragmented. For reference a call trace where reading from /proc/stat failed: sadc: page allocation failure: order:4, mode:0x1040d0 CPU: 1 PID: 192063 Comm: sadc Not tainted 3.10.0-123.el7.s390x #1 [...] Call Trace: show_stack+0x6c/0xe8 warn_alloc_failed+0xd6/0x138 __alloc_pages_nodemask+0x9da/0xb68 __get_free_pages+0x2e/0x58 kmalloc_order_trace+0x44/0xc0 stat_open+0x5a/0xd8 proc_reg_open+0x8a/0x140 do_dentry_open+0x1bc/0x2c8 finish_open+0x46/0x60 do_last+0x382/0x10d0 path_openat+0xc8/0x4f8 do_filp_open+0x46/0xa8 do_sys_open+0x114/0x1f0 sysc_tracego+0x14/0x1a Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: David Rientjes <rientjes@google.com> Cc: Ian Kent <raven@themaw.net> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com> Cc: Andrea Righi <andrea@betterlinux.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03/proc/stat: convert to single_open_size()Heiko Carstens
These two patches are supposed to "fix" failed order-4 memory allocations which have been observed when reading /proc/stat. The problem has been observed on s390 as well as on x86. To address the problem change the seq_file memory allocations to fallback to use vmalloc, so that allocations also work if memory is fragmented. This approach seems to be simpler and less intrusive than changing /proc/stat to use an interator. Also it "fixes" other users as well, which use seq_file's single_open() interface. This patch (of 2): Use seq_file's single_open_size() to preallocate a buffer that is large enough to hold the whole output, instead of open coding it. Also calculate the requested size using the number of online cpus instead of possible cpus, since the size of the output only depends on the number of online cpus. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Ian Kent <raven@themaw.net> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com> Cc: Andrea Righi <andrea@betterlinux.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03autofs4: fix false positive compile errorIan Kent
On strict build environments we can see: fs/autofs4/inode.c: In function 'autofs4_fill_super': fs/autofs4/inode.c:312: error: 'pgrp' may be used uninitialized in this function make[2]: *** [fs/autofs4/inode.o] Error 1 make[1]: *** [fs/autofs4] Error 2 make: *** [fs] Error 2 make: *** Waiting for unfinished jobs.... This is due to the use of pgrp_set being used to indicate pgrp has has been set rather than initializing pgrp itself. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03Btrfs: fix crash when starting transactionFilipe Manana
Often when starting a transaction we commit the currently running transaction, which can end up writing block group caches when the current process has its journal_info set to NULL (and not to a transaction). This makes our assertion at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting in a crash/hang. Therefore fix it by setting journal_info. Two different traces of this issue follow below. 1) [51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670 [51502.242213] ------------[ cut here ]------------ [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964! [51502.242669] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC (...) [51502.244010] Call Trace: [51502.244010] [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs] [51502.244010] [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs] [51502.244010] [<ffffffffa0357a6a>] commit_cowonly_roots+0x164/0x226 [btrfs] [51502.244010] [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs] [51502.244010] [<ffffffff8168ec7b>] ? _raw_spin_unlock+0x2b/0x40 [51502.244010] [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs] [51502.244010] [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs] [51502.244010] [<ffffffffa02d73e1>] __unlink_start_trans+0x31/0xe0 [btrfs] [51502.244010] [<ffffffffa02dea67>] btrfs_unlink+0x37/0xc0 [btrfs] [51502.244010] [<ffffffff811bb054>] ? do_unlinkat+0x114/0x2a0 [51502.244010] [<ffffffff811baebc>] vfs_unlink+0xcc/0x150 [51502.244010] [<ffffffff811bb1a0>] do_unlinkat+0x260/0x2a0 [51502.244010] [<ffffffff811a9ef4>] ? filp_close+0x64/0x90 [51502.244010] [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0 [51502.244010] [<ffffffff81349cab>] ? trace_hardirqs_on_thunk+0x3a/0x3f [51502.244010] [<ffffffff811be9eb>] SyS_unlinkat+0x1b/0x40 [51502.244010] [<ffffffff81698452>] system_call_fastpath+0x16/0x1b [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5 [51502.244010] RIP [<ffffffffa03575da>] assfail.constprop.88+0x1e/0x20 [btrfs] 2) [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670 [25405.097488] ------------[ cut here ]------------ [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964! [25405.097940] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC (...) [25405.100008] Call Trace: [25405.100008] [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs] [25405.100008] [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs] [25405.100008] [<ffffffffa035755a>] commit_cowonly_roots+0x164/0x226 [btrfs] [25405.100008] [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs] [25405.100008] [<ffffffff8109c170>] ? bit_waitqueue+0xc0/0xc0 [25405.100008] [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs] [25405.100008] [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs] [25405.100008] [<ffffffffa02e3407>] btrfs_create+0x47/0x210 [btrfs] [25405.100008] [<ffffffffa02d74cc>] ? btrfs_permission+0x3c/0x80 [btrfs] [25405.100008] [<ffffffff811bc63b>] vfs_create+0x9b/0x130 [25405.100008] [<ffffffff811bcf19>] do_last+0x849/0xe20 [25405.100008] [<ffffffff811b9409>] ? link_path_walk+0x79/0x820 [25405.100008] [<ffffffff811bd5b5>] path_openat+0xc5/0x690 [25405.100008] [<ffffffff810ab07d>] ? trace_hardirqs_on+0xd/0x10 [25405.100008] [<ffffffff811cdcd2>] ? __alloc_fd+0x32/0x1d0 [25405.100008] [<ffffffff811be2a3>] do_filp_open+0x43/0xa0 [25405.100008] [<ffffffff811cddf1>] ? __alloc_fd+0x151/0x1d0 [25405.100008] [<ffffffff811abcfc>] do_sys_open+0x13c/0x230 [25405.100008] [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0 [25405.100008] [<ffffffff811abe12>] SyS_open+0x22/0x30 [25405.100008] [<ffffffff81698452>] system_call_fastpath+0x16/0x1b [25405.100008] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5 [25405.100008] RIP [<ffffffffa03570ca>] assfail.constprop.88+0x1e/0x20 [btrfs] Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: fix btrfs_print_leaf for skinny metadataJosef Bacik
We wouldn't actuall print the extent information if we had a skinny metadata item, this fixes that. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: fix race of using total_bytes_pinnedLiu Bo
This percpu counter @total_bytes_pinned is introduced to skip unnecessary operations of 'commit transaction', it accounts for those space we may free but are stuck in delayed refs. And we zero out @space_info->total_bytes_pinned every transaction period so we have a better idea of how much space we'll actually free up by committing this transaction. However, we do the 'zero out' part a little earlier, before we actually unpin space, so we end up returning ENOSPC when we actually have free space that's just unpinned from committing transaction. xfstests/generic/074 complained then. This fixes it by actually accounting the percpu pinned number when 'unpin', and since it's protected by space_info->lock, the race is gone now. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: use E2BIG instead of EIO if compression does not helpDavid Sterba
Return codes got updated in 60e1975acb48fc3d74a3422b21dde74c977ac3d5 (btrfs: return errno instead of -1 from compression) lzo wrapper returns E2BIG in this case, do the same for zlib. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-07-03btrfs: remove stale comment from btrfs_flush_all_pending_stuffsDavid Sterba
Commit fcebe4562dec83b3f8d3088d77584727b09130b2 (Btrfs: rework qgroup accounting) removed the qgroup accounting after delayed refs. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-07-03Btrfs: fix use-after-free when cloning a trailing file holeFilipe Manana
The transaction handle was being used after being freed. Cc: Chris Mason <clm@fb.com> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: fix null pointer dereference in btrfs_show_devname when name is nullAnand Jain
dev->name is null but missing flag is not set. Strictly speaking the missing flag should have been set, but there are more places where code just checks if name is null. For now this patch does the same. stack: BUG: unable to handle kernel NULL pointer dereference at 0000000000000064 IP: [<ffffffffa0228908>] btrfs_show_devname+0x58/0xf0 [btrfs] [<ffffffff81198879>] show_vfsmnt+0x39/0x130 [<ffffffff81178056>] m_show+0x16/0x20 [<ffffffff8117d706>] seq_read+0x296/0x390 [<ffffffff8115aa7d>] vfs_read+0x9d/0x160 [<ffffffff8115b549>] SyS_read+0x49/0x90 [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b reproducer: mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2 btrfstune -S 1 /dev/sdg1 modprobe -r btrfs && modprobe btrfs mount -o degraded /dev/sdg1 /btrfs btrfs dev add /dev/sdg3 /btrfs Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: fix null pointer dereference in clone_fs_devices when name is nullAnand Jain
when one of the device path is missing btrfs_device name is null. So this patch will check for that. stack: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff812e18c0>] strlen+0x0/0x30 [<ffffffffa01cd92a>] ? clone_fs_devices+0xaa/0x160 [btrfs] [<ffffffffa01cdcf7>] btrfs_init_new_device+0x317/0xca0 [btrfs] [<ffffffff81155bca>] ? __kmalloc_track_caller+0x15a/0x1a0 [<ffffffffa01d6473>] btrfs_ioctl+0xaa3/0x2860 [btrfs] [<ffffffff81132a6c>] ? handle_mm_fault+0x48c/0x9c0 [<ffffffff81192a61>] ? __blkdev_put+0x171/0x180 [<ffffffff817a784c>] ? __do_page_fault+0x4ac/0x590 [<ffffffff81193426>] ? blkdev_put+0x106/0x110 [<ffffffff81179175>] ? mntput+0x35/0x40 [<ffffffff8116d4b0>] do_vfs_ioctl+0x460/0x4a0 [<ffffffff8115c72e>] ? ____fput+0xe/0x10 [<ffffffff81068033>] ? task_work_run+0xb3/0xd0 [<ffffffff8116d547>] SyS_ioctl+0x57/0x90 [<ffffffff817a793e>] ? do_page_fault+0xe/0x10 [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b reproducer: mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2 btrfstune -S 1 /dev/sdg1 modprobe -r btrfs && modprobe btrfs mount -o degraded /dev/sdg1 /btrfs btrfs dev add /dev/sdg3 /btrfs Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: fix nossd and ssd_spread mount option regressionEric Sandeen
The commit 0780253 btrfs: Cleanup the btrfs_parse_options for remount. broke ssd options quite badly; it stopped making ssd_spread imply ssd, and it made "nossd" unsettable. Put things back at least as well as they were before (though ssd mount option handling is still pretty odd: # mount -o "nossd,ssd_spread" works?) Reported-by: Roman Mamedov <rm@romanrm.net> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: fix race between balance recovery and root deletionWang Shilong
Balance recovery is called when RW mounting or remounting from RO to RW, it is called to finish roots merging. When doing balance recovery, relocation root's corresponding fs root(whose root refs is 0) might be destroyed by cleaner thread, this will make btrfs fail to mount. Fix this problem by holding @cleaner_mutex when doing balance recovery. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: atomically set inode->i_flags in btrfs_update_iflagsFilipe Manana
This change is based on the corresponding recent change for ext4: ext4: atomically set inode->i_flags in ext4_set_inode_flags() That has the following commit message that applies to btrfs as well: "Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time." Replacing EXT4_IMMUTABLE_FL and EXT4_APPEND_FL with BTRFS_INODE_IMMUTABLE and BTRFS_INODE_APPEND, respectively. Reviewed-by: David Sterba <dsterba@suse.cz> Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-02fs/jffs2/xattr.c: remove null test before kfreeFabian Frederick
Fix checkpatch warning: WARNING: kfree(NULL) is safe this check is probably not required Cc: David Woodhouse <dwmw2@infradead.org> Cc: linux-mtd@lists.infradead.org Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2014-07-02fs/jffs2/acl.c: remove null test before kfreeFabian Frederick
Fix checkpatch warning: WARNING: kfree(NULL) is safe this check is probably not required Cc: David Woodhouse <dwmw2@infradead.org> Cc: linux-mtd@lists.infradead.org Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2014-07-02nfs: fix nfs4d readlink truncated packetAvi Kivity
XDR requires 4-byte alignment; nfs4d READLINK reply writes out the padding, but truncates the packet to the padding-less size. Fix by taking the padding into consideration when truncating the packet. Symptoms: # ll /mnt/ ls: cannot read symbolic link /mnt/test: Input/output error total 4 -rw-r--r--. 1 root root 0 Jun 14 01:21 123456 lrwxrwxrwx. 1 root root 6 Jul 2 03:33 test drwxr-xr-x. 1 root root 0 Jul 2 23:50 tmp drwxr-xr-x. 1 root root 60 Jul 2 23:44 tree Signed-off-by: Avi Kivity <avi@cloudius-systems.com> Fixes: 476a7b1f4b2c (nfsd4: don't treat readlink like a zero-copy operation) Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-02kernfs: kernfs_notify() must be useable from non-sleepable contextsTejun Heo
d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events too") added fsnotify triggering to kernfs_notify() which requires a sleepable context. There are already existing users of kernfs_notify() which invoke it from an atomic context and in general it's silly to require a sleepable context for triggering a notification. The following is an invalid context bug triggerd by md invoking sysfs_notify() from IO completion path. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1 2 locks held by swapper/1/0: #0: (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk] #1: (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240 irq event stamp: 33518 hardirqs last enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230 hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72 softirqs last enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50 softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c 0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000 0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2 Call Trace: <IRQ> [<ffffffff81807b4c>] dump_stack+0x4d/0x66 [<ffffffff810d4f14>] __might_sleep+0x184/0x240 [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440 [<ffffffff812d76a0>] kernfs_notify+0x90/0x150 [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240 [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1] [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1] [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1] [<ffffffff813acb8b>] bio_endio+0x6b/0xa0 [<ffffffff813b46c4>] blk_update_request+0x94/0x420 [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70 [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk] [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120 [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30 [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk] [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring] [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340 [<ffffffff8111647d>] handle_irq_event+0x3d/0x60 [<ffffffff81119436>] handle_edge_irq+0x66/0x130 [<ffffffff8101c3e4>] handle_irq+0x84/0x150 [<ffffffff818146ad>] do_IRQ+0x4d/0xe0 [<ffffffff818122f2>] common_interrupt+0x72/0x72 <EOI> [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10 [<ffffffff81025454>] default_idle+0x24/0x230 [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20 [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0 [<ffffffff8104df1b>] start_secondary+0x25b/0x300 This patch fixes it by punting the notification delivery through a work item. This ends up adding an extra pointer to kernfs_elem_attr enlarging kernfs_node by a pointer, which is not ideal but not a very big deal either. If this turns out to be an actual issue, we can move kernfs_elem_attr->size to kernfs_node->iattr later. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>