summaryrefslogtreecommitdiffstats
path: root/fs/nfsd/nfs4state.c
AgeCommit message (Collapse)Author
2010-10-05fs/locks.c: prepare for BKL removalArnd Bergmann
This prepares the removal of the big kernel lock from the file locking code. We still use the BKL as long as fs/lockd uses it and ceph might sleep, but we can flip the definition to a private spinlock as soon as that's done. All users outside of fs/lockd get converted to use lock_flocks() instead of lock_kernel() where appropriate. Based on an earlier patch to use a spinlock from Matthew Wilcox, who has attempted this a few times before, the earliest patch from over 10 years ago turned it into a semaphore, which ended up being slower than the BKL and was subsequently reverted. Someone should do some serious performance testing when this becomes a spinlock, since this has caused problems before. Using a spinlock should be at least as good as the BKL in theory, but who knows... Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Matthew Wilcox <willy@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: linux-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org
2010-09-02nfsd4: mask out non-access bits in nfs4_access_to_omodeJ. Bruce Fields
This fixes an unnecessary BUG(). Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-26Merge commit 'v2.6.36-rc1' into HEADJ. Bruce Fields
2010-08-26nfsd4: fix downgrade/lock logicJ. Bruce Fields
If we already had a RW open for a file, and get a readonly open, we were piggybacking on the existing RW open. That's inconsistent with the downgrade logic which blows away the RW open assuming you'll still have a readonly open. Also, make sure there is a readonly or writeonly open available for locking, again to prevent bad behavior in downgrade cases when any RW open may be lost. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-26nfsd4: bad BUG() in preprocess_stateid_opJ. Bruce Fields
It's OK for this function to return without setting filp--we do it in the special-stateid case. And there's a legitimate case where we can hit this, since we do permit reads on write-only stateid's. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-07Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: (34 commits) nfsd4: fix file open accounting for RDWR opens nfsd: don't allow setting maxblksize after svc created nfsd: initialize nfsd versions before creating svc net: sunrpc: removed duplicated #include nfsd41: Fix a crash when a callback is retried nfsd: fix startup/shutdown order bug nfsd: minor nfsd read api cleanup gcc-4.6: nfsd: fix initialized but not read warnings nfsd4: share file descriptors between stateid's nfsd4: fix openmode checking on IO using lock stateid nfsd4: miscellaneous process_open2 cleanup nfsd4: don't pretend to support write delegations nfsd: bypass readahead cache when have struct file nfsd: minor nfsd_svc() cleanup nfsd: move more into nfsd_startup() nfsd: just keep single lockd reference for nfsd nfsd: clean up nfsd_create_serv error handling nfsd: fix error handling in __write_ports_addxprt nfsd: fix error handling when starting nfsd with rpcbind down nfsd4: fix v4 state shutdown error paths ...
2010-08-07nfsd4: fix file open accounting for RDWR opensJ. Bruce Fields
Commit f9d7562fdb9dc0ada3a7aba5dbbe9d965e2a105d "nfsd4: share file descriptors between stateid's" didn't correctly account for O_RDWR opens. Symptoms include leaked files, resulting in failures to unmount and/or warnings about orphaned inodes on reboot. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-29gcc-4.6: nfsd: fix initialized but not read warningsAndi Kleen
Fixes at least one real minor bug: the nfs4 recovery dir sysctl would not return its status properly. Also I finished Al's 1e41568d7378d ("Take ima_path_check() in nfsd past dentry_open() in nfsd_open()") commit, it moved the IMA code, but left the old path initializer in there. The rest is just dead code removed I think, although I was not fully sure about the "is_borc" stuff. Some more review would be still good. Found by gcc 4.6's new warnings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-29nfsd4: share file descriptors between stateid'sJ. Bruce Fields
The vfs doesn't really allow us to "upgrade" a file descriptor from read-only to read-write, and our attempt to do so in nfs4_upgrade_open is ugly and incomplete. Move to a different scheme where we keep multiple opens, shared between open stateid's, in the nfs4_file struct. Each file will be opened at most 3 times (for read, write, and read-write), and those opens will be shared between all clients and openers. On upgrade we will do another open if necessary instead of attempting to upgrade an existing open. We keep count of the number of readers and writers so we know when to close the shared files. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-07-29nfsd4: fix openmode checking on IO using lock stateidJ. Bruce Fields
It is legal to perform a write using the lock stateid that was originally associated with a read lock, or with a file that was originally opened for read, but has since been upgraded. So, when checking the openmode, check the mode associated with the open stateid from which the lock was derived. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-29nfsd4: miscellaneous process_open2 cleanupJ. Bruce Fields
Move more work into helper functions. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-29nfsd4: don't pretend to support write delegationsJ. Bruce Fields
The delegation code mostly pretends to support either read or write delegations. However, correct support for write delegations would require, for example, breaking of delegations (and/or implementation of cb_getattr) on stat. Currently all that stops us from handing out delegations is a subtle reference-counting issue. Avoid confusion by adding an earlier check that explicitly refuses write delegations. For now, though, I'm not going so far as to rip out existing half-support for write delegations, in case we get around to using that soon. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-23nfsd4: fix v4 state shutdown error pathsJeff Layton
If someone tries to shut down the laundry_wq while it isn't up it'll cause an oops. This can happen because write_ports can create a nfsd_svc before we really start the nfs server, and we may fail before the server is ever started. Also make sure state is shutdown on error paths in nfsd_svc(). Use a common global nfsd_up flag instead of nfs4_init, and create common helper functions for nfsd start/shutdown, as there will be other work that we want done only when we the number of nfsd threads transitions between zero and nonzero. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-06-22nfsd4: remove some debugging codeJ. Bruce Fields
This is overkill. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-22nfsd4: translate memory errors to delay, not serverfaultJ. Bruce Fields
If the server is out of memory is better for clients to back off and retry than to just error out. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-22nfsd4; fix session reference count leakJ. Bruce Fields
Note the session has to be put() here regardless of what happens to the client. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-08nfsd4: shut down callback queue outside state lockJ. Bruce Fields
This reportedly causes a lockdep warning on nfsd shutdown. That looks like a false positive to me, but there's no reason why this needs the state lock anyway. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-31nfsd4: fix use of op_share_accessJ. Bruce Fields
NFSv4.1 adds additional flags to the share_access argument of the open call. These flags need to be masked out in some of the existing code, but current code does that inconsistently. Tested-by: Michael Groshans <groshans@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-18Revert "nfsd4: distinguish expired from stale stateids"J. Bruce Fields
This reverts commit 78155ed75f470710f2aecb3e75e3d97107ba8374. We're depending here on the boot time that we use to generate the stateid being monotonic, but get_seconds() is not necessarily. We still depend at least on boot_time being different every time, but that is a safer bet. We have a few reports of errors that might be explained by this problem, though we haven't been able to confirm any of them. But the minor gain of distinguishing expired from stale errors seems not worth the risk. Conflicts: fs/nfsd/nfs4state.c Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-18nfsd: safer initialization order in find_file()Pavel Emelyanov
The alloc_init_file() first adds a file to the hash and then initializes its fi_inode, fi_id and fi_had_conflict. The uninitialized fi_inode could thus be erroneously checked by the find_file(), so move the hash insertion lower. The client_mutex should prevent this race in practice; however, we eventually hope to make less use of the client_mutex, so the ordering here is an accident waiting to happen. I didn't find whether the same can be true for two other fields, but the common sense tells me it's better to initialize an object before putting it into a global hash table :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-13nfsd4: implement reclaim_completeJ. Bruce Fields
This is a mandatory operation. Also, here (not in open) is where we should be committing the reboot recovery information. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-13nfsd4: nfsd4_destroy_session must set callback client under the state lockBenny Halevy
nfsd4_set_callback_client must be called under the state lock to atomically set or unset the callback client and shutting down the previous one. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-13nfsd4: keep a reference count on client while in useBenny Halevy
Get a refcount on the client on SEQUENCE, Release the refcount and renew the client when all respective compounds completed. Do not expire the client by the laundromat while in use. If the client was expired via another path, free it when the compounds complete and the refcount reaches 0. Note that unhash_client_locked must call list_del_init on cl_lru as it may be called twice for the same client (once from nfs4_laundromat and then from expire_client) Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-13nfsd4: mark_client_expiredBenny Halevy
Mark the client as expired under the client_lock so it won't be renewed when an nfsv4.1 session is done, after it was explicitly expired during processing of the compound. Do not renew a client mark as expired (in particular, it is not on the lru list anymore) Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-13nfsd4: introduce nfs4_client.cl_refcountBenny Halevy
Currently just initialize the cl_refcount to 1 and decrement in expire_client(), conditionally freeing the client when the refcount reaches 0. To be used later by nfsv4.1 compounds to keep the client from timing out while in use. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-11nfsd4: refactor expire_clientBenny Halevy
Separate out unhashing of the client and session. To be used later by the laundromat. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-11nfsd4: extend the client_lock to cover cl_lruBenny Halevy
To be used later on to hold a reference count on the client while in use by a nfsv4.1 compound. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-11nfsd4: use list_move in move_to_confirmedBenny Halevy
rather than list_del_init, list_add Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-11nfsd4: fold release_session into expire_clientBenny Halevy
and grab the client lock once for all the client's sessions. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-11nfsd4: rename sessionid_lock to client_lockBenny Halevy
In preparation to share the lock's scope to both client and session hash tables. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-07nfsd4: fix bare destroy_session null dereferenceJ. Bruce Fields
It's legal to send a DESTROY_SESSION outside any session (as the only operation in a compound), in which case cstate->session will be NULL; check for that case. While we're at it, move these checks into a separate helper function. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-04Merge commit 'v2.6.34-rc6'J. Bruce Fields
Conflicts: fs/nfsd/nfs4callback.c
2010-05-03nfsd4: fix unlikely race in session replay caseJ. Bruce Fields
In the replay case, the renew_client(session->se_client); happens after we've droppped the sessionid_lock, and without holding a reference on the session; so there's nothing preventing the session being freed before we get here. Thanks to Benny Halevy for catching a bug in an earlier version of this patch. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Benny Halevy <bhalevy@panasas.com>
2010-04-22nfsd4: complete enforcement of 4.1 op orderingJ. Bruce Fields
Enforce the rules about compound op ordering. Motivated by implementing RECLAIM_COMPLETE, for which the client is implicit in the current session, so it is important to ensure a succesful SEQUENCE proceeds the RECLAIM_COMPLETE. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-22nfsd4: allow 4.0 clients to change callback pathJ. Bruce Fields
The rfc allows a client to change the callback parameters, but we didn't previously implement it. Teach the callbacks to rerun themselves (by placing themselves on a workqueue) when they recognize that their rpc task has been killed and that the callback connection has changed. Then we can change the callback connection by setting up a new rpc client, modifying the nfs4 client to point at it, waiting for any work in progress to complete, and then shutting down the old client. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-22nfsd4: rearrange cb data structuresJ. Bruce Fields
Mainly I just want to separate the arguments used for setting up the tcp client from the rest. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-22nfsd4: cl_count is unusedJ. Bruce Fields
Now that the shutdown sequence guarantees callbacks are shut down before the client is destroyed, we no longer have a use for cl_count. We'll probably reinstate a reference count on the client some day, but it will be held by users other than callbacks. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-22nfsd4: don't sleep in lease-break callbackJ. Bruce Fields
The NFSv4 server's fl_break callback can sleep (dropping the BKL), in order to allocate a new rpc task to send a recall to the client. As far as I can tell this doesn't cause any races in the current code, but the analysis is difficult. Also, the sleep here may complicate the move away from the BKL. So, just schedule some work to do the job for us instead. The work will later also prove useful for restarting a call after the callback information is changed. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-16nfsd4: consistent session flag settingJ. Bruce Fields
We should clear these flags on any new create_session, not just on the first one. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-02nfsd4: remove dprintkJ. Bruce Fields
I haven't found this useful. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-02nfsd4: shutdown callbacks on expiryJ. Bruce Fields
Once we've expired the client, there's no further purpose to the callbacks; go ahead and shut down the callback client rather than waiting for the last reference to go. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-09Merge commit 'v2.6.34-rc1' into for-2.6.35-incomingJ. Bruce Fields
2010-03-06nfsd4: allow setting grace period timeJ. Bruce Fields
Allow explicit configuration of the grace period time as well as the lease period time. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-03-06nfsd4: remove unnecessary lease-setting functionJ. Bruce Fields
This is another layer of indirection that doesn't really buy us anything. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-03-06nfsd4: simplify lease/grace interactionJ. Bruce Fields
The original code here assumed we'd allow the user to change the lease any time, but only allow the change to take effect on restart. Since then we modified the code to allow setting the lease on when the server is down. Update the rest of the code to reflect that fact, clarify variable names, and add document. Also, the code insisted that the grace period always be the longer of the old and new lease periods, but that's overly conservative--as long as it lasts at least the old lease period, old clients should still know to recover in time. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-03-06nfsd4: simplify references to nfsd4 lease timeJ. Bruce Fields
Instead of accessing the lease time directly, some users call nfs4_lease_time(), and some a macro, NFSD_LEASE_TIME, defined as nfs4_lease_time(). Neither layer of indirection serves any purpose. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-03-06Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits) nfsd4: fix minor memory leak svcrpc: treat uid's as unsigned nfsd: ensure sockets are closed on error Revert "sunrpc: move the close processing after do recvfrom method" Revert "sunrpc: fix peername failed on closed listener" sunrpc: remove unnecessary svc_xprt_put NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN xfs_export_operations.commit_metadata commit_metadata export operation replacing nfsd_sync_dir lockd: don't clear sm_monitored on nsm_reboot_lookup lockd: release reference to nsm_handle in nlm_host_rebooted nfsd: Use vfs_fsync_range() in nfsd_commit NFSD: Create PF_INET6 listener in write_ports SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found" SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt() NFSD: Support AF_INET6 in svc_addsock() function SUNRPC: Use rpc_pton() in ip_map_parse() nfsd: 4.1 has an rfc number nfsd41: Create the recovery entry for the NFSv4.1 client nfsd: use vfs_fsync for non-directories ...
2010-03-06vfs: take f_lock on modifying f_mode after open timeWu Fengguang
We'll introduce FMODE_RANDOM which will be runtime modified. So protect all runtime modification to f_mode with f_lock to avoid races. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@kernel.org> [2.6.33.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-14nfsd41: Create the recovery entry for the NFSv4.1 clientRicardo Labiaga
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>