summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2014-07-10nfsd: clean up helper __release_lock_stateidTrond Myklebust
Use filp_close instead of open coding. filp_close does a bit more than just release the locks and put the filp. It also calls ->flush and dnotify_flush, both of which should be done here anyway. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10nfsd: Add locking to the nfs4_file->fi_fds[] arrayTrond Myklebust
Preparation for removal of the client_mutex, which currently protects this array. While we don't actually need the find_*_file_locked variants just yet, a later patch will. So go ahead and add them now to reduce future churn in this code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10nfsd: Add fine grained protection for the nfs4_file->fi_stateids listTrond Myklebust
Access to this list is currently serialized by the client_mutex. Add finer grained locking around this list in preparation for its removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10Merge branch 'for-3.16-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Mostly fixes for the fallouts from the recent cgroup core changes. The decoupled nature of cgroup dynamic hierarchy management (hierarchies are created dynamically on mount but may or may not be reused once unmounted depending on remaining usages) led to more ugliness being added to kernfs. Hopefully, this is the last of it" * 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: break kernfs active protection in cpuset_write_resmask() cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() kernfs: introduce kernfs_pin_sb() cgroup: fix mount failure in a corner case cpuset,mempolicy: fix sleeping function called from invalid context cgroup: fix broken css_has_online_children()
2014-07-10nfsd: reduce some spinlocking in put_client_renewJeff Layton
No need to take the lock unless the count goes to 0. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10nfsd: close potential race between delegation break and laundromatJeff Layton
Bruce says: There's also a preexisting expire_client/laundromat vs break race: - expire_client/laundromat adds a delegation to its local reaplist using the same dl_recall_lru field that a delegation uses to track its position on the recall lru and drops the state lock. - a concurrent break_lease adds the delegation to the lru. - expire/client/laundromat then walks it reaplist and sees the lru head as just another delegation on the list.... Fix this race by checking the dl_time under the state_lock. If we find that it's not 0, then we know that it has already been queued to the LRU list and that we shouldn't queue it again. In the case of destroy_client, we must also ensure that we don't hit similar races by ensuring that we don't move any delegations to the reaplist with a dl_time of 0. Just bump the dl_time by one before we drop the state_lock. We're destroying the delegations anyway, so a 1s difference there won't matter. The fault injection code also requires a bit of surgery here: First, in the case of nfsd_forget_client_delegations, we must prevent the same sort of race vs. the delegation break callback. For that, we just increment the dl_time to ensure that a delegation callback can't race in while we're working on it. We can't do that for nfsd_recall_client_delegations, as we need to have it actually queue the delegation, and that won't happen if we increment the dl_time. The state lock is held over that function, so we don't need to worry about these sorts of races there. There is one other potential bug nfsd_recall_client_delegations though. Entries on the victims list are not dequeued before calling nfsd_break_one_deleg. That's a potential list corruptor, so ensure that we do that there. Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10fuse: restructure ->rename2()Miklos Szeredi
Make ->rename2() universal, i.e. able to handle zero flags. This is to make future change of the API easier. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-07-09NFSD: Fix memory leak in encoding denied lockKinglong Mee
Commit 8c7424cff6 (nfsd4: don't try to encode conflicting owner if low on space) forgot free conf->data in nfsd4_encode_lockt and before sign conf->data to NULL in nfsd4_encode_lock_denied. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: Convert nfs4_check_open_reclaim() to work with lookup_clientid()Trond Myklebust
lookup_clientid is preferable to find_confirmed_client since it's able to use the cached client in the compound state. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: Always use lookup_clientid() in nfsd4_process_open1Trond Myklebust
In later patches, we'll be moving the stateowner table into the nfs4_client, and by doing this we ensure that we have a cached nfs4_client pointer. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: Convert nfsd4_process_open1() to work with lookup_clientid()Trond Myklebust
...and have alloc_init_open_stateowner just use the cstate->clp pointer instead of passing in a clp separately. This allows us to use the cached nfs4_client pointer in the cstate instead of having to look it up again. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: Allow struct nfsd4_compound_state to cache the nfs4_clientJeff Layton
We want to use the nfsd4_compound_state to cache the nfs4_client in order to optimise away extra lookups of the clid. In the v4.0 case, we use this to ensure that we only have to look up the client at most once per compound for each call into lookup_clientid. For v4.1+ we set the pointer in the cstate during SEQUENCE processing so we should never need to do a search for it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: add a nfserrno mapping for -E2BIG to nfserr_fbigJeff Layton
I saw this pop up with some pynfs testing: [ 123.609992] nfsd: non-standard errno: -7 ...and -7 is -E2BIG. I think what happened is that XFS returned -E2BIG due to some xattr operations with the ACL10 pynfs TEST (I guess it has limited xattr size?). Add a better mapping for that error since it's possible that we'll need it. How about we convert it to NFSERR_FBIG? As Bruce points out, they both have "BIG" in the name so it must be good. Also, turn the printk in this function into a WARN() so that we can get a bit more information about situations that don't have proper mappings. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: properly convert return from commit_metadata to __be32Jeff Layton
Commit 2a7420c03e504 (nfsd: Ensure that nfsd_create_setattr commits files to stable storage), added a couple of calls to commit_metadata, but doesn't convert their return codes to __be32 in the appropriate places. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: Cleanup - Let nfsd4_lookup_stateid() take a cstate argumentTrond Myklebust
The cstate already holds information about the session, and hence the client id, so it makes more sense to pass that information rather than the current practice of passing a 'minor version' number. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: Don't get a session reference without a client referenceTrond Myklebust
If the client were to disappear from underneath us while we're holding a session reference, things would be bad. This cleanup helps ensure that it cannot, which will be a possibility when the client_mutex is removed. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: clean up nfsd4_release_lockownerJeff Layton
Now that we know that we won't have several lockowners with the same, owner->data, we can simplify nfsd4_release_lockowner and get rid of the lo_list in the process. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: NFSv4 lock-owners are not associated to a specific fileTrond Myklebust
Just like open-owners, lock-owners are associated with a name, a clientid and, in the case of minor version 0, a sequence id. There is no association to a file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: Allow lockowners to hold several stateidsJeff Layton
A lockowner can have more than one lock stateid. For instance, if a process has more than one file open and has locks on both, then the same lockowner has more than one stateid associated with it. Change it so that this reality is better reflected by the objects that nfsd uses. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09fs: debugfs: remove trailing whitespaceRahul Bedarkar
fixes checkpatch.pl trailing whitespace errors Signed-off-by: Rahul Bedarkar <rahulbedarkar89@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09kernfs: kernel-doc warning fixFabian Frederick
s/static_name/name_is_static Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09debugfs: Fix corrupted loop in debugfs_remove_recursiveSteven Rostedt
[ I'm currently running my tests on it now, and so far, after a few hours it has yet to blow up. I'll run it for 24 hours which it never succeeded in the past. ] The tracing code has a way to make directories within the debugfs file system as well as deleting them using mkdir/rmdir in the instance directory. This is very limited in functionality, such as there is no renames, and the parent directory "instance" can not be modified. The tracing code creates the instance directory from the debugfs code and then replaces the dentry->d_inode->i_op with its own to allow for mkdir/rmdir to work. When these are called, the d_entry and inode locks need to be released to call the instance creation and deletion code. That code has its own accounting and locking to serialize everything to prevent multiple users from causing harm. As the parent "instance" directory can not be modified this simplifies things. I created a stress test that creates several threads that randomly creates and deletes directories thousands of times a second. The code stood up to this test and I submitted it a while ago. Recently I added a new test that adds readers to the mix. While the instance directories were being added and deleted, readers would read from these directories and even enable tracing within them. This test was able to trigger a bug: general protection fault: 0000 [#1] PREEMPT SMP Modules linked in: ... CPU: 3 PID: 17789 Comm: rmdir Tainted: G W 3.15.0-rc2-test+ #41 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 task: ffff88003786ca60 ti: ffff880077018000 task.ti: ffff880077018000 RIP: 0010:[<ffffffff811ed5eb>] [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367 RSP: 0018:ffff880077019df8 EFLAGS: 00010246 RAX: 0000000000000002 RBX: ffff88006f0fe490 RCX: 0000000000000000 RDX: dead000000100058 RSI: 0000000000000246 RDI: ffff88003786d454 RBP: ffff88006f0fe640 R08: 0000000000000628 R09: 0000000000000000 R10: 0000000000000628 R11: ffff8800795110a0 R12: ffff88006f0fe640 R13: ffff88006f0fe640 R14: ffffffff81817d0b R15: ffffffff818188b7 FS: 00007ff13ae24700(0000) GS:ffff88007d580000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000003054ec7be0 CR3: 0000000076d51000 CR4: 00000000000007e0 Stack: ffff88007a41ebe0 dead000000100058 00000000fffffffe ffff88006f0fe640 0000000000000000 ffff88006f0fe678 ffff88007a41ebe0 ffff88003793a000 00000000fffffffe ffffffff810bde82 ffff88006f0fe640 ffff88007a41eb28 Call Trace: [<ffffffff810bde82>] ? instance_rmdir+0x15b/0x1de [<ffffffff81132e2d>] ? vfs_rmdir+0x80/0xd3 [<ffffffff81132f51>] ? do_rmdir+0xd1/0x139 [<ffffffff8124ad9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c [<ffffffff814fea62>] ? system_call_fastpath+0x16/0x1b Code: fe ff ff 48 8d 75 30 48 89 df e8 c9 fd ff ff 85 c0 75 13 48 c7 c6 b8 cc d2 81 48 c7 c7 b0 cc d2 81 e8 8c 7a f5 ff 48 8b 54 24 08 <48> 8b 82 a8 00 00 00 48 89 d3 48 2d a8 00 00 00 48 89 44 24 08 RIP [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367 RSP <ffff880077019df8> It took a while, but every time it triggered, it was always in the same place: list_for_each_entry_safe(child, next, &parent->d_subdirs, d_u.d_child) { Where the child->d_u.d_child seemed to be corrupted. I added lots of trace_printk()s to see what was wrong, and sure enough, it was always the child's d_u.d_child field. I looked around to see what touches it and noticed that in __dentry_kill() which calls dentry_free(): static void dentry_free(struct dentry *dentry) { /* if dentry was never visible to RCU, immediate free is OK */ if (!(dentry->d_flags & DCACHE_RCUACCESS)) __d_free(&dentry->d_u.d_rcu); else call_rcu(&dentry->d_u.d_rcu, __d_free); } I also noticed that __dentry_kill() unlinks the child->d_u.child under the parent->d_lock spin_lock. Looking back at the loop in debugfs_remove_recursive() it never takes the parent->d_lock to do the list walk. Adding more tracing, I was able to prove this was the issue: ftrace-t-15385 1.... 246662024us : dentry_kill <ffffffff81138b91>: free ffff88006d573600 rmdir-15409 2.... 246662024us : debugfs_remove_recursive <ffffffff811ec7e5>: child=ffff88006d573600 next=dead000000100058 The dentry_kill freed ffff88006d573600 just as the remove recursive was walking it. In order to fix this, the list walk needs to be modified a bit to take the parent->d_lock. The safe version is no longer necessary, as every time we remove a child, the parent->d_lock must be released and the list walk must start over. Each time a child is removed, even though it may still be on the list, it should be skipped by the first check in the loop: if (!debugfs_positive(child)) continue; Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09f2fs: use inner macro and function to clean up codesChao Yu
In this patch we use below inner macro and function to clean up codes. 1. ADDRS_PER_PAGE 2. SM_I 3. f2fs_readonly Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: introduce f2fs_write_failed to handle error case when writeChao Yu
When we fail in ->write_begin()/->direct_IO(), our allocated node block in disk and page cache are still kept, despite these may not be used again. This patch introduce f2fs_write_failed() to handle the error case of these two interfaces, it will truncate page cache and blocks of this file according to i_size. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: arguments cleanup of finding file flow functionsGu Zheng
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: remove the needless point-castGu Zheng
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: remove the redundant validation check of aclGu Zheng
kernel side(xx_init_acl), the acl is get/cloned from the parent dir's, which is credible. So remove the redundant validation check of acl here. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: reduce region of f2fs_lock_op covered for better concurrencyChao Yu
In our rename process, region of f2fs_lock_op covered is too big as some of the code like f2fs_empty_dir/f2fs_find_entry are not needed to protect by this lock. So in the extreme case like doing checkpoint when we rename old inode to exist inode in a large directory could cause lower concurrency. Let's reduce the region of f2fs_lock_op to fix this. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: replace count*size kzalloc by kcallocFabian Frederick
kcalloc manages count*sizeof overflow. Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: refactor flush_nat_entries codes for reducing NAT writesChao Yu
Although building NAT journal in cursum reduce the read/write work for NAT block, but previous design leave us lower performance when write checkpoint frequently for these cases: 1. if journal in cursum has already full, it's a bit of waste that we flush all nat entries to page for persistence, but not to cache any entries. 2. if journal in cursum is not full, we fill nat entries to journal util journal is full, then flush the left dirty entries to disk without merge journaled entries, so these journaled entries may be flushed to disk at next checkpoint but lost chance to flushed last time. In this patch we merge dirty entries located in same NAT block to nat entry set, and linked all set to list, sorted ascending order by entries' count of set. Later we flush entries in sparse set into journal as many as we can, and then flush merged entries to disk. In this way we can not only gain in performance, but also save lifetime of flash device. In my testing environment, it shows this patch can help to reduce NAT block writes obviously. In hard disk test case: cost time of fsstress is stablely reduced by about 5%. 1. virtual machine + hard disk: fsstress -p 20 -n 200 -l 5 node num cp count nodes/cp based 4599.6 1803.0 2.551 patched 2714.6 1829.6 1.483 2. virtual machine + 32g micro SD card: fsstress -p 20 -n 200 -l 1 -w -f chown=0 -f creat=4 -f dwrite=0 -f fdatasync=4 -f fsync=4 -f link=0 -f mkdir=4 -f mknod=4 -f rename=5 -f rmdir=5 -f symlink=0 -f truncate=4 -f unlink=5 -f write=0 -S node num cp count nodes/cp based 84.5 43.7 1.933 patched 49.2 40.0 1.23 Our latency of merging op shows not bad when handling extreme case like: merging a great number of dirty nats: latency(ns) dirty nat count 3089219 24922 5129423 27422 4000250 24523 change log from v1: o fix wrong logic in add_nat_entry when grab a new nat entry set. o swith to create slab cache in create_node_manager_caches. o use GFP_ATOMIC instead of GFP_NOFS to avoid potential long latency. change log from v2: o make comment position more appropriate suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: clean up an unused parameter and assignmentJaegeuk Kim
This patch cleans up simple unnecessary codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: introduce f2fs_do_tmpfile for code consistencyJaegeuk Kim
This patch adds f2fs_do_tmpfile to eliminate the redundant init_inode_metadata flow. Throught this, we can provide the consistent lock usage, e.g., fi->i_sem, and this will enable better debugging stuffs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: support ->tmpfile()Chao Yu
Add function f2fs_tmpfile() to support O_TMPFILE file creation, and modify logic of init_inode_metadata to enable linkat temp file. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: avoid to truncate non-updated page partiallyChao Yu
After we call find_data_page in truncate_partial_data_page, we could not guarantee this page is updated or not as error may occurred in lower layer. We'd better check status of the page to avoid this no updated page be writebacked to device. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: avoid unneeded SetPageUptodate in f2fs_write_endChao Yu
We have already set page update in ->write_begin, so we should remove redundant SetPageUptodate in ->write_end. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09Merge tag 'f2fs-fixes-3.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs bugfixes from Jaegeuk Kim: "This includes a couple of bug fixes found by xfstests. In addition, one critical bug was reported by Brian Chadwick, which is falling into the infinite loop in balance_dirty_pages. And it turned out due to the IO merging policy in f2fs, which was newly merged in 3.16. - fix normal and recovery path for fallocated regions - fix error case mishandling - recover renamed fsync inodes correctly - fix to get out of infinite loops in balance_dirty_pages - fix kernel NULL pointer error" * tag 'f2fs-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: avoid to access NULL pointer in issue_flush_thread f2fs: check bdi->dirty_exceeded when trying to skip data writes f2fs: do checkpoint for the renamed inode f2fs: release new entry page correctly in error path of f2fs_rename f2fs: fix error path in init_inode_metadata f2fs: check lower bound nid value in check_nid_range f2fs: remove unused variables in f2fs_sm_info f2fs: fix not to allocate unnecessary blocks during fallocate f2fs: recover fallocated data and its i_size together f2fs: fix to report newly allocate region as extent
2014-07-09f2fs: avoid to access NULL pointer in issue_flush_threadChao Yu
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=75861 Denis 2014-05-10 11:28:59 UTC reported: "F2FS-fs (mmcblk0p28): mounting.. Unable to handle kernel NULL pointer dereference at virtual address 00000018 ... [<c0a2f678>] (_raw_spin_lock+0x3c/0x70) from [<c03a0330>] (issue_flush_thread+0x50/0x17c) [<c03a0330>] (issue_flush_thread+0x50/0x17c) from [<c01b4064>] (kthread+0x98/0xa4) [<c01b4064>] (kthread+0x98/0xa4) from [<c0108060>] (kernel_thread_exit+0x0/0x8)" This patch assign cmd_control_info in sm_info before issue_flush_thread is being created, so this make sure that issue flush thread will have no chance to access invalid info in fcc. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: check bdi->dirty_exceeded when trying to skip data writesJaegeuk Kim
If we don't check the current backing device status, balance_dirty_pages can fall into infinite pausing routine. This can be occurred when a lot of directories make a small number of dirty dentry pages including files. Reported-by: Brian Chadwick <brianchad@westnet.com.au> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: do checkpoint for the renamed inodeJaegeuk Kim
If an inode is renamed, it should be registered as file_lost_pino to conduct checkpoint at f2fs_sync_file. Otherwise, the inode cannot be recovered due to no dent_mark in the following scenario. Note that, this scenario is from xfstests/322. 1. create "a" 2. fsync "a" 3. rename "a" to "b" 4. fsync "b" 5. Sudden power-cut After recovery is done, "b" should be seen. However, the result shows "a", since the recovery procedure does not enter recover_dentry due to no dent_mark. The reason is like below. - The nid of "a" is checkpointed during #2, f2fs_sync_file. - The inode page for "b" produced by #3 is written without dent_mark by sync_node_pages. So, this patch fixes this bug by assinging file_lost_pino to the "a"'s inode. If the pino is lost, f2fs_sync_file conducts checkpoint, and then recovers the latest pino and its dentry information for further recovery. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: release new entry page correctly in error path of f2fs_renameChao Yu
This patch correct releasing code of new_page to avoid BUG_ON in error patch of f2fs_rename. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: fix error path in init_inode_metadataChao Yu
If we fail in this path: ->init_inode_metadata ->make_empty_dir ->get_new_data_page ->grab_cache_page return -ENOMEM We will bug on in error path of init_inode_metadata when call remove_inode_page because i_block = 2 (one inode block will be released later & one dentry block). We should release the dentry block in init_inode_metadata to avoid this BUG_ON, and avoid leak of dentry block resource, because we never have second chance to release that block in ->evict_inode as in upper error path we make this inode 'bad'. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: check lower bound nid value in check_nid_rangeChao Yu
This patch add lower bound verification for nid in check_nid_range, so nids reserved like 0, node, meta passed by caller could be checked there. And then check_nid_range could be used in f2fs_nfs_get_inode for simplifying code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: remove unused variables in f2fs_sm_infoChao Yu
Remove unused variables in struct f2fs_sm_info. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-08nfsd: lock owners are not per open stateidTrond Myklebust
In the NFSv4 spec, lock stateids are per-file objects. Lockowners are not. This patch replaces the current list of lock owners in the open stateids with a list of lock stateids. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd: clean up nfsd4_close_open_stateidTrond Myklebust
Minor cleanup that should introduce no behavioral changes. Currently this function just unhashes the stateid and leaves the caller to do the work of the CLOSE processing. Change nfsd4_close_open_stateid so that it handles doing all of the work of closing a stateid. Move the handling of the unhashed stateid into it instead of doing that work in nfsd4_close. This will help isolate some coming changes to stateid handling from nfsd4_close. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd: declare v4.1+ openowners confirmed on creationJeff Layton
There's no need to confirm an openowner in v4.1 and above, so we can go ahead and set NFS4_OO_CONFIRMED when we create openowners in those versions. This will also be necessary when we remove the client_mutex, as it'll be possible for two concurrent opens to race in versions >4.0. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd: Cleanup nfs4svc_encode_compoundresTrond Myklebust
Move the slot return, put session etc into a helper in fs/nfsd/nfs4state.c instead of open coding in nfs4svc_encode_compoundres. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd: nfs4_preprocess_seqid_op should only set *stpp on successTrond Myklebust
Not technically a bugfix, since nothing tries to use the return pointer if this function doesn't return success, but it could be a problem with some coming changes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd: add a new /proc/fs/nfsd/max_connections fileJeff Layton
Currently, the maximum number of connections that nfsd will allow is based on the number of threads spawned. While this is fine for a default, there really isn't a clear relationship between the two. The number of threads corresponds to the number of concurrent requests that we want to allow the server to process at any given time. The connection limit corresponds to the maximum number of clients that we want to allow the server to handle. These are two entirely different quantities. Break the dependency on increasing threads in order to allow for more connections, by adding a new per-net parameter that can be set to a non-zero value. The default is still to base it on the number of threads, so there should be no behavior change for anyone who doesn't use it. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08nfsd: Ensure that nfsd_create_setattr commits files to stable storageTrond Myklebust
Since nfsd_create_setattr strips the mode from the struct iattr, it is quite possible that it will optimise away the call to nfsd_setattr altogether. If this is the case, then we never call commit_metadata() on the newly created file. Also ensure that both nfsd_setattr() and nfsd_create_setattr() fail when the call to commit_metadata fails. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>