summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2014-08-04f2fs: use for_each_set_bit to simplify the codeChao Yu
This patch uses for_each_set_bit to simplify some codes in f2fs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-04f2fs: add f2fs_balance_fs for expand_inode_dataChao Yu
This patch adds f2fs_balance_fs in expand_inode_data to avoid allocation failure with segment. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-04f2fs: invalidate xattr node page when evict inodeChao Yu
When inode is evicted, all the page cache belong to this inode should be released including the xattr node page. But previously we didn't do this, this patch fixed this issue. v2: o reposition invalidate_mapping_pages() to the right place suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-04Merge branch 'for-3.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: - Major reorganization of percpu header files which I think makes things a lot more readable and logical than before. - percpu-refcount is updated so that it requires explicit destruction and can be reinitialized if necessary. This was pulled into the block tree to replace the custom percpu refcnting implemented in blk-mq. - In the process, percpu and percpu-refcount got cleaned up a bit * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits) percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero() percpu-refcount: require percpu_ref to be exited explicitly percpu-refcount: use unsigned long for pcpu_count pointer percpu-refcount: add helpers for ->percpu_count accesses percpu-refcount: one bit is enough for REF_STATUS percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc() workqueue: stronger test in process_one_work() workqueue: clear POOL_DISASSOCIATED in rebind_workers() percpu: Use ALIGN macro instead of hand coding alignment calculation percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations percpu: preffity percpu header files percpu: use raw_cpu_*() to define __this_cpu_*() percpu: reorder macros in percpu header files percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops percpu: reorganize include/linux/percpu-defs.h percpu: move accessors from include/linux/percpu.h to percpu-defs.h percpu: include/asm-generic/percpu.h should contain only arch-overridable parts percpu: introduce arch_raw_cpu_ptr() ...
2014-08-04proc: Point /proc/mounts at /proc/thread-self/mounts instead of ↵Eric W. Biederman
/proc/self/mounts In oddball cases where the thread has a different mount namespace than the thread group leader or more likely in cases where the thread remains and the thread group leader has exited this ensures that /proc/mounts continues to work. This should not cause any problems but if it does this patch can just be reverted. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-08-04proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/netEric W. Biederman
In oddball cases where the thread has a different network namespace than the primary thread group leader or more likely in cases where the thread remains and the thread group leader has exited this ensures that /proc/net continues to work. This should not cause any problems but if it does this patch can just be reverted. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-08-04proc: Implement /proc/thread-self to point at the directory of the current ↵Eric W. Biederman
thread /proc/thread-self is derived from /proc/self. /proc/thread-self points to the directory in proc containing information about the current thread. This funtionality has been missing for a long time, and is tricky to implement in userspace as gettid() is not exported by glibc. More importantly this allows fixing defects in /proc/mounts and /proc/net where in a threaded application today they wind up being empty files when only the initial pthread has exited, causing problems for other threads. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-08-04proc: Have net show up under /proc/<tgid>/task/<tid>Eric W. Biederman
Network namespaces are per task so it make sense for them to show up in the task directory. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-08-04Merge tag 'locks-v3.17-1' of git://git.samba.org/jlayton/linuxLinus Torvalds
Pull file locking related changes from Jeff Layton: "Just a couple of changes from Christoph to start us down the road toward getting rid of the fl_owner_t typedef" * tag 'locks-v3.17-1' of git://git.samba.org/jlayton/linux: locks: purge fl_owner_t from fs/locks.c locks: typedef fl_owner_t to void *
2014-08-04NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumesEric W. Biederman
The usage of pid_ns->child_reaper->nsproxy->net_ns in nfs_server_list_open and nfs_client_list_open is not safe. /proc for a pid namespace can remain mounted after the all of the process in that pid namespace have exited. There are also times before the initial process in a pid namespace has started or after the initial process in a pid namespace has exited where pid_ns->child_reaper can be NULL or stale. Making the idiom pid_ns->child_reaper->nsproxy a double whammy of problems. Luckily all that needs to happen is to move /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes under /proc/net to /proc/net/nfsfs/servers and /proc/net/nfsfs/volumes and add a symlink from the original location, and to use seq_open_net as it has been designed. Cc: stable@vger.kernel.org Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-08-04NFS: fix two problems in lookup_revalidate in RCU-walkNeilBrown
1/ rcu_dereference isn't correct: that field isn't RCU protected. It could potentially change at any time so ACCESS_ONCE might be justified. changes to ->d_parent are protected by ->d_seq. However that isn't always checked after ->d_revalidate is called, so it is safest to keep the double-check that ->d_parent hasn't changed at the end of these functions. 2/ in nfs4_lookup_revalidate, "->d_parent" was forgotten. So 'parent' was not the parent of 'dentry'. This fails safe is the context is that dentry->d_inode is NULL, and the result of parent->d_inode being NULL is that ECHILD is returned, which is always safe. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-04Merge branch 'xfs-misc-fixes-3.17-2' into for-nextDave Chinner
2014-08-04Merge branch 'xfs-bulkstat-refactor' into for-nextDave Chinner
2014-08-04Merge branch 'xfs-misc-fixes-3.17-1' into for-nextDave Chinner
2014-08-04Merge branch 'xfs-quota-eofblocks-scan' into for-nextDave Chinner
2014-08-04xfs: fix coccinelle warningskbuild test robot
Removes unneeded semicolon, introduced by commit a70a4fa5 ("xfs: fix a couple error sequence jumps in xfs_mountfs"): fs/xfs/xfs_mount.c:858:24-25: Unneeded semicolon Generated by: scripts/coccinelle/misc/semicolon.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: flush both inodes in xfs_swap_extentsDave Chinner
We need to treat both inodes identically from a page cache point of view when prepareing them for extent swapping. We don't do this right now - we assume that one of the inodes empty, because that's what xfs_fsr currently does. Remove this assumption from the code. While factoring out the flushing and related checks, move the transactions reservation to immeidately after the flushes so that we don't need to pick up and then drop the ilock to do the transaction reservation. There are no issues with aborting the transaction it if the checks fail before we join the inodes to the transaction and dirty them, so this is a safe change to make. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: fix swapext ilock deadlockDave Chinner
xfs_swap_extents() holds the ilock over a call to filemap_write_and_wait(), which can then try to write data and take the ilock. That causes a self-deadlock. Fix the deadlock and clean up the code by separating the locking appropriately. Add a lockflags variable to track what locks we are holding as we gain and drop them and cleanup the error handling to always use "out_unlock" with the lockflags variable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: kill xfs_vnode.hDave Chinner
Move the IO flag definitions to xfs_inode.h and kill the header file as it is now empty. Removing the xfs_vnode.h file showed up an implicit header include path: xfs_linux.h -> xfs_vnode.h -> xfs_fs.h And so every xfs header file has been inplicitly been including xfs_fs.h where it is needed or not. Hence the removal of xfs_vnode.h causes all sorts of build issues because BBTOB() and friends are no longer automatically included in the build. This also gets fixed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: kill VN_MAPPEDDave Chinner
Only one user, no longer needed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: kill VN_CACHEDDave Chinner
Only has 2 users, has outlived it's usefulness. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: kill VN_DIRTY()Dave Chinner
Only one user of the macro and the dirty mapping check is redundant so just get rid of it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: dquot recovery needs verifiersDave Chinner
dquot recovery should add verifiers to the dquot buffers that it recovers changes into. Unfortunately, it doesn't attached the verifiers to the buffers in a consistent manner. For example, xlog_recover_dquot_pass2() reads dquot buffers without a verifier and then writes it without ever having attached a verifier to the buffer. Further, dquot buffer recovery may write a dquot buffer that has not been modified, or indeed, shoul dbe written because quotas are not enabled and hence changes to the buffer were not replayed. In this case, we again write buffers without verifiers attached because that doesn't happen until after the buffer changes have been replayed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: quotacheck leaves dquot buffers without verifiersDave Chinner
When running xfs/305, I noticed that quotacheck was flushing dquot buffers that did not have the xfs_dquot_buf_ops verifiers attached: XFS (vdb): _xfs_buf_ioapply: no ops on block 0x1dc8/0x1dc8 ffff880052489000: 44 51 01 04 00 00 65 b8 00 00 00 00 00 00 00 00 DQ....e......... ffff880052489010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ ffff880052489020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ ffff880052489030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ CPU: 1 PID: 2376 Comm: mount Not tainted 3.16.0-rc2-dgc+ #306 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff88006fe38000 ffff88004a0ffae8 ffffffff81cf1cca 0000000000000001 ffff88004a0ffb88 ffffffff814d50ca 000010004a0ffc70 0000000000000000 ffff88006be56dc4 0000000000000021 0000000000001dc8 ffff88007c773d80 Call Trace: [<ffffffff81cf1cca>] dump_stack+0x45/0x56 [<ffffffff814d50ca>] _xfs_buf_ioapply+0x3ca/0x3d0 [<ffffffff810db520>] ? wake_up_state+0x20/0x20 [<ffffffff814d51f5>] ? xfs_bdstrat_cb+0x55/0xb0 [<ffffffff814d513b>] xfs_buf_iorequest+0x6b/0xd0 [<ffffffff814d51f5>] xfs_bdstrat_cb+0x55/0xb0 [<ffffffff814d53ab>] __xfs_buf_delwri_submit+0x15b/0x220 [<ffffffff814d6040>] ? xfs_buf_delwri_submit+0x30/0x90 [<ffffffff814d6040>] xfs_buf_delwri_submit+0x30/0x90 [<ffffffff8150f89d>] xfs_qm_quotacheck+0x17d/0x3c0 [<ffffffff81510591>] xfs_qm_mount_quotas+0x151/0x1e0 [<ffffffff814ed01c>] xfs_mountfs+0x56c/0x7d0 [<ffffffff814f0f12>] xfs_fs_fill_super+0x2c2/0x340 [<ffffffff811c9fe4>] mount_bdev+0x194/0x1d0 [<ffffffff814f0c50>] ? xfs_finish_flags+0x170/0x170 [<ffffffff814ef0f5>] xfs_fs_mount+0x15/0x20 [<ffffffff811ca8c9>] mount_fs+0x39/0x1b0 [<ffffffff811e4d67>] vfs_kern_mount+0x67/0x120 [<ffffffff811e757e>] do_mount+0x23e/0xad0 [<ffffffff8117abde>] ? __get_free_pages+0xe/0x50 [<ffffffff811e71e6>] ? copy_mount_options+0x36/0x150 [<ffffffff811e8103>] SyS_mount+0x83/0xc0 [<ffffffff81cfd40b>] tracesys+0xdd/0xe2 This was caused by dquot buffer readahead not attaching a verifier structure to the buffer when readahead was issued, resulting in the followup read of the buffer finding a valid buffer and so not attaching new verifiers to the buffer as part of the read. Also, when a verifier failure occurs, we then read the buffer without verifiers. Attach the verifiers manually after this read so that if the buffer is then written it will be verified that the corruption has been repaired. Further, when flushing a dquot we don't ask for a verifier when reading in the dquot buffer the dquot belongs to. Most of the time this isn't an issue because the buffer is still cached, but when it is not cached it will result in writing the dquot buffer without having the verfier attached. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: ensure verifiers are attached to recovered buffersDave Chinner
Crash testing of CRC enabled filesystems has resulted in a number of reports of bad CRCs being detected after the filesystem was mounted. Errors such as the following were being seen: XFS (sdb3): Mounting V5 Filesystem XFS (sdb3): Starting recovery (logdev: internal) XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 [xfs], block 0x1 XFS (sdb3): Unmount and run xfs_repair XFS (sdb3): First 64 bytes of corrupted metadata buffer: ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40 XAGF...........@ ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01 ..mS..w......... ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03 ................ ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00 ................ XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 numblks 1 The errors were typically being seen in AGF, AGI and their related btree block buffers some time after log recovery had run. Often it wasn't until later subsequent mounts that the problem was discovered. The common symptom was a buffer with the correct contents, but a CRC and an LSN that matched an older version of the contents. Some debug added to _xfs_buf_ioapply() indicated that buffers were being written without verifiers attached to them from log recovery, and Jan Kara isolated the cause to log recovery readahead an dit's interactions with buffers that had a more recent LSN on disk than the transaction being recovered. In this case, the buffer did not get a verifier attached, and os when the second phase of log recovery ran and recovered EFIs and unlinked inodes, the buffers were modified and written without the verifier running. Hence they had up to date contents, but stale LSNs and CRCs. Fix it by attaching verifiers to buffers we skip due to future LSN values so they don't escape into the buffer cache without the correct verifier attached. This patch is based on analysis and a patch from Jan Kara. cc: <stable@vger.kernel.org> Reported-by: Jan Kara <jack@suse.cz> Reported-by: Fanael Linithien <fanael4@gmail.com> Reported-by: Grozdan <neutrino8@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: catch buffers written without verifiers attachedDave Chinner
We recently had a bug where buffers were slipping through log recovery without any verifier attached to them. This was resulting in on-disk CRC mismatches for valid data. Add some warning code to catch this occurrence so that we catch such bugs during development rather than not being aware they exist. Note that we cannot do this verification unconditionally as non-CRC filesystems don't always attach verifiers to the buffers being written. e.g. during log recovery we cannot identify all the different types of buffers correctly on non-CRC filesystems, so we can't attach the correct verifiers in all cases and so we don't attach any. Hence we don't want on non-CRC filesystems to avoid spamming the logs with false indications. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: avoid false quotacheck after unclean shutdownEric Sandeen
The commit 83e782e xfs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD added a new function xfs_sb_quota_from_disk() which swaps on-disk XFS_OQUOTA_* flags for in-core XFS_GQUOTA_* and XFS_PQUOTA_* flags after the superblock is read. However, if log recovery is required, the superblock is read again, and the modified in-core flags are re-read from disk, so we have XFS_OQUOTA_* flags in memory again. This causes the XFS_QM_NEED_QUOTACHECK() test to be true, because the XFS_OQUOTA_CHKD is still set, and not XFS_GQUOTA_CHKD or XFS_PQUOTA_CHKD. Change xfs_sb_from_disk to call xfs_sb_quota_from disk and always convert the disk flags to in-memory flags. Add a lower-level function which can be called with "false" to not convert the flags, so that the sb verifier can verify exactly what was on disk, per Brian Foster's suggestion. Reported-by: Cyril B. <cbay@excellency.fr> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
2014-08-04xfs: fix rounding error of fiemap length parameterBrian Foster
The offset and length parameters are converted from bytes to basic blocks by xfs_vn_fiemap(). The BTOBB() converter rounds the value up to the nearest basic block. This leads to unexpected behavior when unaligned offsets are provided to FIEMAP. Fix the conversions of byte values to block values to cover the provided offsets. Round down the start offset to the nearest basic block. Calculate the end offset based on the provided values, round up and calculate length based on the start block offset. Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-04xfs: introduce xfs_bulkstat_ag_ichunkJie Liu
Introduce xfs_bulkstat_ag_ichunk() to process inodes in chunk with a pointer to a formatter function that will iget the inode and fill in the appropriate structure. Refactor xfs_bulkstat() with it. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-08-03NFS: allow lockless access to access_cacheNeilBrown
The access cache is used during RCU-walk path lookups, so it is best to avoid locking if possible as taking a lock kills concurrency. The rbtree is not rcu-safe and cannot easily be made so. Instead we simply check the last (i.e. most recent) entry on the LRU list. If this doesn't match, then we return -ECHILD and retry in lock/refcount mode. This requires freeing the nfs_access_entry struct with rcu, and requires using rcu access primatives when adding entries to the lru, and when examining the last entry. Calling put_rpccred before kfree_rcu looks a bit odd, but as put_rpccred already provides rcu protection, we know that the cred will not actually be freed until the next grace period, so any concurrent access will be safe. This patch provides about 5% performance improvement on a stat-heavy synthetic work load with 4 threads on a 2-core CPU. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCUNeilBrown
It fails with -ECHILD rather than make an RPC call. This allows nfs_lookup_revalidate to call it in RCU-walk mode. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03NFS: teach nfs_neg_need_reval to understand LOOKUP_RCUNeilBrown
This requires nfs_check_verifier to take an rcu_walk flag, and requires an rcu version of nfs_revalidate_inode which returns -ECHILD rather than making an RPC call. With this, nfs_lookup_revalidate can call nfs_neg_need_reval in RCU-walk mode. We can also move the LOOKUP_RCU check past the nfs_check_verifier() call in nfs_lookup_revalidate. If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from doing a full check, they return a status indicating that a revalidation is required. As this revalidation will not be possible in RCU_WALK mode, -ECHILD will ultimately be returned, which is the desired result. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03NFS: support RCU_WALK in nfs_permission()NeilBrown
nfs_permission makes two calls which are not always safe in RCU_WALK, rpc_lookup_cred and nfs_do_access. The second can easily be made rcu-safe by aborting with -ECHILD before making the RPC call. The former can be made rcu-safe by calling rpc_lookup_cred_nonblock() instead. As this will almost always succeed, we use it even when RCU_WALK isn't being used as it still saves some spinlocks in a common case. We only fall back to rpc_lookup_cred() if rpc_lookup_cred_nonblock() fails and MAY_NOT_BLOCK isn't set. This optimisation (always trying rpc_lookup_cred_nonblock()) is particularly important when a security module is active. In that case inode_permission() may return -ECHILD from security_inode_permission() even though ->permission() succeeded in RCU_WALK mode. This leads to may_lookup() retrying inode_permission after performing unlazy_walk(). The spinlock that rpc_lookup_cred() takes is often more expensive than anything security_inode_permission() does, so that spinlock becomes the main bottleneck. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03NFS: prepare for RCU-walk support but pushing tests later in code.NeilBrown
nfs_lookup_revalidate, nfs4_lookup_revalidate, and nfs_permission all need to understand and handle RCU-walk for NFS to gain the benefits of RCU-walk for cached information. Currently these functions all immediately return -ECHILD if the relevant flag (LOOKUP_RCU or MAY_NOT_BLOCK) is set. This patch pushes those tests later in the code so that we only abort immediately before we enter rcu-unsafe code. As subsequent patches make that rcu-unsafe code rcu-safe, several of these new tests will disappear. With this patch there are several paths through the code which will no longer return -ECHILD during an RCU-walk. However these are mostly error paths or other uninteresting cases. A noteworthy change in nfs_lookup_revalidate is that we don't take (or put) the reference to ->d_parent when LOOKUP_RCU is set. Rather we rcu_dereference ->d_parent, and check that ->d_inode is not NULL. We also check that ->d_parent hasn't changed after all the tests. In nfs4_lookup_revalidate we simply avoid testing LOOKUP_RCU on the path that only calls nfs_lookup_revalidate() as that function already performs the required test. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used.NeilBrown
nfs4_lookup_revalidate only uses 'parent' to get 'dir', and only uses 'dir' if 'inode == NULL'. So we don't need to find out what 'parent' or 'dir' is until we know that 'inode' is NULL. By moving 'dget_parent' inside the 'if', we can reduce the number of call sites for 'dput(parent)'. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03NFS: add checks for returned value of try_module_get()Alexey Khoroshilov
There is a couple of places in client code where returned value of try_module_get() is ignored. As a result there is a small chance to premature unload module because of unbalanced refcounting. The patch adds error handling in that places. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03nfs: clear_request_commit while holding i_lockWeston Andros Adamson
Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03pnfs: add pnfs_put_lseg_asyncWeston Andros Adamson
This is useful when lsegs need to be released while holding locks. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03pnfs: find swapped pages on pnfs commit lists tooWeston Andros Adamson
nfs_page_find_head_request_locked looks through the regular nfs commit lists when the page is swapped out, but doesn't look through the pnfs commit lists. I'm not sure if anyone has hit any issues caused by this. Suggested-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03nfs: fix comment and add warn_on for PG_INODE_REFWeston Andros Adamson
Fix the comment in nfs_page.h for PG_INODE_REF to reflect that it's no longer set only on head requests. Also add a WARN_ON_ONCE in nfs_inode_remove_request as PG_INODE_REF should always be set. Suggested-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03nfs: check wait_on_bit_lock err in page_group_lockWeston Andros Adamson
Return errors from wait_on_bit_lock from nfs_page_group_lock. Add a bool argument @wait to nfs_page_group_lock. If true, loop over wait_on_bit_lock until it returns cleanly. If false, return the error from wait_on_bit_lock. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03NFS: nfs4_do_open should add negative results to the dcache.NeilBrown
If you have an NFSv4 mounted directory which does not container 'foo' and: ls -l foo ssh $server touch foo cat foo then the 'cat' will fail (usually, depending a bit on the various cache ages). This is correct as negative looks are cached by default. However with the same initial conditions: cat foo ssh $server touch foo cat foo will usually succeed. This is because an "open" does not add a negative dentry to the dcache, while a "lookup" does. This can have negative performance effects. When "gcc" searches for an include file, it will try to "open" the file in every director in the search path. Without caching of negative "open" results, this generates much more traffic to the server than it should (or than NFSv3 does). The root of the problem is that _nfs4_open_and_get_state() will call d_add_unique() on a positive result, but not on a negative result. Compare with nfs_lookup() which calls d_materialise_unique on both a positive result and on ENOENT. This patch adds a call d_add() in the ENOENT case for _nfs4_open_and_get_state() and also calls nfs_set_verifier(). With it, many fewer "open" requests for known-non-existent files are sent to the server. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03nfs3_list_one_acl(): check get_acl() result with IS_ERR_OR_NULLAndrey Utkin
There was a check for result being not NULL. But get_acl() may return NULL, or ERR_PTR, or actual pointer. The purpose of the function where current change is done is to "list ACLs only when they are available", so any error condition of get_acl() mustn't be elevated, and returning 0 there is still valid. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81111 Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: 74adf83f5d77 (nfs: only show Posix ACLs in listxattr if actually...) Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03Merge branch 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma ↵Trond Myklebust
into linux-next * 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (916 commits) xprtrdma: Handle additional connection events xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro xprtrdma: Make rpcrdma_ep_disconnect() return void xprtrdma: Schedule reply tasklet once per upcall xprtrdma: Allocate each struct rpcrdma_mw separately xprtrdma: Rename frmr_wr xprtrdma: Disable completions for LOCAL_INV Work Requests xprtrdma: Disable completions for FAST_REG_MR Work Requests xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external() xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work Request xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnect xprtrdma: Properly handle exhaustion of the rb_mws list xprtrdma: Chain together all MWs in same buffer pool xprtrdma: Back off rkey when FAST_REG_MR fails xprtrdma: Unclutter struct rpcrdma_mr_seg xprtrdma: Don't invalidate FRMRs if registration fails xprtrdma: On disconnect, don't ignore pending CQEs xprtrdma: Update rkeys after transport reconnect xprtrdma: Limit data payload size for ALLPHYSICAL xprtrdma: Protect ia->ri_id when unmapping/invalidating MRs ...
2014-08-03NFS: Enforce an upper limit on the number of cached access callTrond Myklebust
This may be used to limit the number of cached credentials building up inside the access cache. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-02[CIFS] Fix incorrect hex vs. decimal in some debug print statementsSteve French
Joe Perches and Hans Wennborg noticed that various places in the kernel were printing decimal numbers with 0x prefix. printk("0x%d") or equivalent This fixes the instances of this in the cifs driver. CC: Hans Wennborg <hans@hanshq.net> CC: Joe Perches <joe@perches.com> Signed-off-by: Steve French <smfrench@gmail.com>
2014-08-02f2fs: avoid skipping recover_inline_xattr after recover_inline_dataChao Yu
When we recover data of inode in roll-forward procedure, and the inode has both inline data and inline xattr. We may skip recovering inline xattr if we recover inline data form node page first. This patch will fix the problem that we lost inline xattr data in above scenario. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-02f2fs: add tracepoint for f2fs_direct_IOChao Yu
This patch adds a tracepoint for f2fs_direct_IO. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-02Update cifs versionSteve French
to 2.04 Signed-off-by: Steve French <smfrench@gmail.com>
2014-08-02CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2Pavel Shilovsky
The existing mapping causes unlink() call to return error after delete operation. Changing the mapping to -EACCES makes the client process the call like CIFS protocol does - reset dos attributes with ATTR_READONLY flag masked off and retry the operation. Cc: stable@vger.kernel.org Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>