asmadeus/linux.git - The linux kernel

Age	Commit message (Collapse)	Author
2011-01-07	fs: rcu-walk aware d_revalidate method	Nick Piggin
	Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: dcache reduce branches in lookup path	Nick Piggin
	Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])d_op([[:space:]])=' \| xargs sed -e 's/$[^\t ]$->d_op = $.$;/d_set_d_op(\1, \2);/' -e 's/$[^\t ]$\.d_op = $.$;/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: icache RCU free inodes	Nick Piggin
	RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: dcache remove dcache_lock	Nick Piggin
	dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: Use rename lock and RCU for multi-step operations	Nick Piggin
	The remaining usages for dcache_lock is to allow atomic, multi-step read-side operations over the directory tree by excluding modifications to the tree. Also, to walk in the leaf->root direction in the tree where we don't have a natural d_lock ordering. This could be accomplished by taking every d_lock, but this would mean a huge number of locks and actually gets very tricky. Solve this instead by using the rename seqlock for multi-step read-side operations, retry in case of a rename so we don't walk up the wrong parent. Concurrent dentry insertions are not serialised against. Concurrent deletes are tricky when walking up the directory: our parent might have been deleted when dropping locks so also need to check and retry for that. We can also use the rename lock in cases where livelock is a worry (and it is introduced in subsequent patch). Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: scale inode alias list	Nick Piggin
	Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list from concurrent modification. d_alias is also protected by d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: dcache scale dentry refcount	Nick Piggin
	Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: change d_delete semantics	Nick Piggin
	Change d_delete from a dentry deletion notification to a dentry caching advise, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-06	NFSv4: Ensure continued open and lockowner name uniqueness	Trond Myklebust
	In order to enable migration support, we will want to move some of the structures that are subject to migration into the struct nfs_server. In particular, if we are to move the state_owner and state_owner_id to being a per-filesystem structure, then we should label the resulting open/lock owners with a per-filesytem label to ensure global uniqueness. This patch does so by adding the super block s_dev to the open/lock owner name. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS: Move cl_delegations to the nfs_server struct	Chuck Lever
	Delegations are per-inode, not per-nfs_client. When a server file system is migrated, delegations on the client must be moved from the source to the destination nfs_server. Make it easier to manage a mount point's delegation list across a migration event by moving the list to the nfs_server struct. Clean up: I added documenting comments to public functions I changed in this patch. For consistency I added comments to all the other public functions in fs/nfs/delegation.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS: Introduce nfs_detach_delegations()	Chuck Lever
	Clean up: Refactor code that takes clp->cl_lock and calls nfs_detach_delegations_locked() into its own function. While we're changing the call sites, get rid of the second parameter and the logic in nfs_detach_delegations_locked() that uses it, since callers always set that parameter of nfs_detach_delegations_locked() to NULL. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS: Move cl_state_owners and related fields to the nfs_server struct	Chuck Lever
	NFSv4 migration needs to reassociate state owners from the source to the destination nfs_server data structures. To make that easier, move the cl_state_owners field to the nfs_server struct. cl_openowner_id and cl_lockowner_id accompany this move, as they are used in conjunction with cl_state_owners. The cl_lock field in the parent nfs_client continues to protect all three of these fields. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS: Allow walking nfs_client.cl_superblocks list outside client.c	Chuck Lever
	We're about to move some fields from struct nfs_client to struct nfs_server. There is a many-to-one relationship between nfs_servers and nfs_clients. After these fields are moved to the nfs_server struct, to visit all of the data in these fields that is owned by one nfs_client, code will need to visit each nfs_server on the cl_superblocks list for that nfs_client. To serialize changes to the cl_superblocks list during these little expeditions, protect the list with RCU. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: layout roc code	Fred Isaman
	A layout can request return-on-close. How this interacts with the forgetful model of never sending LAYOUTRETURNS is a bit ambiguous. We forget any layouts marked roc, and wait for them to be completely forgotten before continuing with the close. In addition, to compensate for races with any inflight LAYOUTGETs, and the fact that we do not get any layout stateid back from the server, we set the barrier to the worst case scenario of current_seqid + number of outstanding LAYOUTGETS. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: update nfs4_callback_recallany to handle layouts	Alexandros Batsakis
	While here, update the code a bit. Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: add CB_LAYOUTRECALL handling	Fred Isaman
	This is the heart of the wave 2 submission. Add the code to trigger drain and forget of any afected layouts. In addition, we set a "barrier", below which any LAYOUTGET reply is ignored. This is to compensate for the fact that we do not wait for outstanding LAYOUTGETs to complete as per section 12.5.5.2.1 of RFC 5661. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: CB_LAYOUTRECALL xdr code	Fred Isaman
	This is the xdr decoding for CB_LAYOUTRECALL. Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: change lo refcounting to atomic_t	Fred Isaman
	This will be required to allow us to grab reference outside of i_lock. While we are at it, make put_layout_hdr take the same argument as all the related functions. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: check that partial LAYOUTGET return is ignored	Fred Isaman
	Either a bad server reply, or our ignoring of multiple array segments in a reply, can cause a reply to not meet our requirements. Ensure that we ignore such replies. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: add layout to client list before sending rpc	Fred Isaman
	Since this list will be used to search for layouts to recall, this is necessary to avoid a race where the recall comes in, sees there is nothing in the client list, and prepares to return NOMATCHING, while the LAYOUTGET gets processed before the recall updates the stateid. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: serialize LAYOUTGET(openstateid)	Fred Isaman
	We shouldn't send a LAYOUTGET(openstateid) unless all outstanding RPCs using the previous stateid are completed. This requires choosing the stateid to encode earlier, so we can abort if one is not available (we want to use the open stateid, but a LAYOUTGET is already out using it), and adding a count of the number of outstanding rpc calls using layout state (which for now consist solely of LAYOUTGETs). Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: layoutget rpc code cleanup	Fred Isaman
	No functional changes, just some code minor code rearrangement and comments. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: change how lsegs are removed from layout list	Fred Isaman
	This is to prepare the way for sensible io draining. Instead of just removing the lseg from the list, we instead clear the VALID flag (preventing new io from grabbing references to the lseg) and remove the reference holding it in the list. Thus the lseg will be removed once any io in progress completes and any references still held are dropped. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: change layout state seqlock to a spinlock	Fred Isaman
	This prepares for future changes, where the layout state needs to change atomically with several other variables. In particular, it will need to know if lo->segs is empty, as we test that instead of manipulating the NFS_LAYOUT_STATEID_SET bit. Moreover, the layoutstateid is not really a read-mostly structure, as it is written almost as often as it is read. The behavior of pnfs_get_layout_stateid is also slightly changed, so that it no longer changes the stateid. Its name is changed to +pnfs_choose_layoutget_stateid. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: add prefix to struct pnfs_layout_hdr fields	Fred Isaman
	Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: add prefix to struct pnfs_layout_segment fields	Fred Isaman
	While we are renaming all the fields, change lo->state to lo->plh_flags. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: remove unnecessary field lgp->status	Fred Isaman
	Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	pnfs: fix incorrect comment in destroy_lseg	Fred Isaman
	Comment references get_layout_hdr_locked, which never existed in submitted code. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS rename client back channel transport field	Andy Adamson
	Differentiate from server backchannel Signed-off-by: Andy Adamson <andros@netapp.com> Acked-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS add session back channel draining	Andy Adamson
	Currently session draining only drains the fore channel. The back channel processing must also be drained. Use the back channel highest_slot_used to indicate that a callback is being processed by the callback thread. Move the session complete to be per channel. When the session is draininig, wait for any current back channel processing to complete and stop all new back channel processing by returning NFS4ERR_DELAY to the back channel client. Drain the back channel, then the fore channel. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS RPC_AUTH_GSS unsupported on v4.1 back channel	Andy Adamson
	Signed-off-by: Andy Adamson <andros@netapp.com> Acked-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS refactor nfs_find_client and reference client across callback processing	Andy Adamson
	Fixes a bug where the nfs_client could be freed during callback processing. Refactor nfs_find_client to use minorversion specific means to locate the correct nfs_client structure. In the NFS layer, V4.0 clients are found using the callback_ident field in the CB_COMPOUND header. V4.1 clients are found using the sessionID in the CB_SEQUENCE operation which is also compared against the sessionID associated with the back channel thread after a successful CREATE_SESSION. Each of these methods finds the one an only nfs_client associated with the incoming callback request - so nfs_find_client_next is not needed. In the RPC layer, the pg_authenticate call needs to find the nfs_client. For the v4.0 callback service, the callback identifier has not been decoded so a search by address, version, and minorversion is used. The sessionid for the sessions based callback service has (usually) not been set for the pg_authenticate on a CB_NULL call which can be sent prior to the return of a CREATE_SESSION call, so the sessionid associated with the back channel thread is not used to find the client in pg_authenticate for CB_NULL calls. Pass the referenced nfs_client to each CB_COMPOUND operation being proceesed via the new cb_process_state structure. The reference is held across cb_compound processing. Use the new cb_process_state struct to move the NFS4ERR_RETRY_UNCACHED_REP processing from process_op into nfs4_callback_sequence where it belongs. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS associate sessionid with callback connection	Andy Adamson
	The sessions based callback service is started prior to the CREATE_SESSION call so that it can handle CB_NULL requests which can be sent before the CREATE_SESSION call returns and the session ID is known. Set the callback sessionid after a sucessful CREATE_SESSION. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS implement v4.0 callback_ident	Andy Adamson
	Use the small id to pointer translator service to provide a unique callback identifier per SETCLIENTID call used to identify the v4.0 callback service associated with the clientid. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS do not clear minor version at nfs_client free	Andy Adamson
	Resetting the client minor version operations causes nfs4_destroy_callback to fail to shutdown the NFSv4.1 callback service. There is no reason to reset the client minorversion operations when the nfs_client struct is being freed. Remove the minorverion reset and rename the function. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06	NFS use svc_create_xprt for NFSv4.1 callback service	Andy Adamson
	The new back channel transport means we call the normal creation routine as well as svc_xprt_put. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04	nfsv4: Switch to generic xattr handling code	Aneesh Kumar K.V
	This patch make nfsv4 use the generic xattr handling code to get the nfsv4 acl. This will help us to add richacl support to nfsv4 in later patches Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04	nfs: Set MS_POSIXACL always	Aneesh Kumar K.V
	We want to skip VFS applying mode for NFS. So set MS_POSIXACL always and selectively use umask. Ideally we would want to use umask only when we don't have inheritable ACEs set. But NFS currently don't allow to send umask to the server. So this is best what we can do and this is consistent with NFSv3 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04	NFS: use ERR_CAST()	Namhyung Kim
	Use ERR_CAST() intead of wierd-looking cast. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04	nfs: fix mispelling of idmap CONFIG symbol	J. Bruce Fields
	Trivial, but confusing when you're trying to grep through this code.... Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04	NFS: Don't leak in nfs_proc_symlink()	Jesper Juhl
	Hi, In fs/nfs/proc.c::nfs_proc_symlink() we will leak memory if either nfs_alloc_fhandle() or nfs_alloc_fattr() returns NULL but the other one doesn't. This patch ensures memory allocated by one when the other fails is always released (this is safe since nfs_free_fattr() and nfs_free_fhandle() both call kfree which deals gracefully with NULL pointers). Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21	NFSv4: Convert a few commas into semicolons...	Trond Myklebust
	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21	NFS: suppressing showing of default mount port value in /proc fixed	Stanislav Kinsbursky
	Update: added check for zero value as it was before (note: can't simply check mountd_port for positive value because it's typeof unsigned short) Default value for mount server port is set to NFS_UNSPEC_PORT (-1) and will not be changed during parsing mount options for mound data version 6. This default value will be showed for mountport in /proc/mounts always since current default check is for zero value. This small mistake leads to big problem, because during umount.nfs execution from old user-space utils (at least nfs-utils 1.0.9) this value will be used as the server port to connect to. This request will be rejected (since port is 65535) and thus nfs mount point can't be unmounted. Note from Chuck Lever (chuck.lever@oracle.com): this is only possible if /etc/mtab is a link to /proc/mounts. Not all systems have this configuration. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21	nfs4: fix units bug causing hang on recovery	J. Bruce Fields
	Note that cl_lease_time is in jiffies. This can cause a very long wait in the NFS4ERR_CLID_INUSE case. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21	nfs: Take advantage of kmem_cache_zalloc() in nfs_page_alloc()	Jesper Juhl
	Take advantage of kmem_cache_zalloc() in nfs_page_alloc(). Save a call to memset() and a few bytes. Before: [jj@dragon linux-2.6]$ size fs/nfs/pagelist.o text data bss dec hex filename 1765 0 8 1773 6ed fs/nfs/pagelist.o After: [jj@dragon linux-2.6]$ size fs/nfs/pagelist.o text data bss dec hex filename 1749 0 8 1757 6dd fs/nfs/pagelist.o Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21	NFS: Remove redundant unlikely()	Tobias Klauser
	IS_ERR() already implies unlikely(), so it can be omitted here. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16	SUNRPC: New xdr_streams XDR decoder API	Chuck Lever
	Now that all client-side XDR decoder routines use xdr_streams, there should be no need to support the legacy calling sequence [rpc_rqst , __be32 , RPC res *] anywhere. We can construct an xdr_stream in the generic RPC code, instead of in each decoder function. This is a refactoring change. It should not cause different behavior. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16	SUNRPC: New xdr_streams XDR encoder API	Chuck Lever
	Now that all client-side XDR encoder routines use xdr_streams, there should be no need to support the legacy calling sequence [rpc_rqst , __be32 , RPC arg *] anywhere. We can construct an xdr_stream in the generic RPC code, instead of in each encoder function. Also, all the client-side encoder functions return 0 now, making a return value superfluous. Take this opportunity to convert them to return void instead. This is a refactoring change. It should not cause different behavior. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16	NFS: Remove unused UMNT response data structure	Chuck Lever
	Clean up. The UMNT request has a NULL response. There's no need to set up a mountres structure for it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16	NFS: Avoid return code checking in mount XDR encoder functions	Chuck Lever
	Clean up. The trend in the other XDR encoder functions is to BUG() when encoding problems occur, since a problem here is always due to a local coding error. Then, instead of a status, zero is unconditionally returned. Update the mount client XDR encoders to behave this way. To finish the update, use the new-style be32_to_cpup() and cpu_to_be32() macros, and compute the buffer sizes using raw integers instead of sizeof(). This matches the conventions used in other XDR functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>