path: root/net/sunrpc/rpc_pipe.c
Age | Commit message | Author
2012-03-29 | Merge branch 'for-3.4' of git://linux-nfs.org/~bfields/linux | Linus Torvalds
Pull nfsd changes from Bruce Fields:
Highlights:
- Benny Halevy and Tigran Mkrtchyan implemented some more 4.1 features, moving us closer to a complete 4.1 implementation.
- Bernd Schubert fixed a long-standing problem with readdir cookies on ext2/3/4.
- Jeff Layton performed a long-overdue overhaul of the server reboot recovery code which will allow us to deprecate the current code (a rather unusual user of the vfs), and give us some needed flexibility for further improvements.
- Like the client, we now support numeric uid's and gid's in the auth_sys case, allowing easier upgrades from NFSv2/v3 to v4.x.
Plus miscellaneous bugfixes and cleanup.
Thanks to everyone!
There are also some delegation fixes waiting on vfs review that I suppose will have to wait for 3.5. With that done I think we'll finally turn off the "EXPERIMENTAL" dependency for v4 (though that's mostly symbolic as it's been on by default in distro's for a while).
And the list of 4.1 todo's should be achievable for 3.5 as well:
http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues
though we may still want a bit more experience with it before turning it on by default.
* 'for-3.4' of git://linux-nfs.org/~bfields/linux: (55 commits)
  nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled
  nfsd4: use auth_unix unconditionally on backchannel
  nfsd: fix NULL pointer dereference in cld_pipe_downcall
  nfsd4: memory corruption in numeric_name_to_id()
  sunrpc: skip portmap calls on sessions backchannel
  nfsd4: allow numeric idmapping
  nfsd: don't allow legacy client tracker init for anything but init_net
  nfsd: add notifier to handle mount/unmount of rpc_pipefs sb
  nfsd: add the infrastructure to handle the cld upcall
  nfsd: add a header describing upcall to nfsdcld
  nfsd: add a per-net-namespace struct for nfsd
  sunrpc: create nfsd dir in rpc_pipefs
  nfsd: add nfsd4_client_tracking_ops struct and a way to set it
  nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
  NFSD: Fix nfs4_verifier memory alignment
  NFSD: Fix warnings when NFSD_DEBUG is not defined
  nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
  nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
  ext4: return 32/64-bit dir name hash according to usage type
  fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
  ...
2012-03-26 | sunrpc: create nfsd dir in rpc_pipefs | Jeff Layton
Add a new top-level dir in rpc_pipefs to hold the pipe for the clientid upcall. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-23 | Merge tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds
Pull NFS client updates for Linux 3.4 from Trond Myklebust:
"New features include:
- Add NFS client support for containers. This should enable most of the necessary functionality, including lockd support, and support for rpc.statd, NFSv4 idmapper and RPCSEC_GSS upcalls into the correct network namespace from which the mount system call was issued.
- NFSv4 idmapper scalability improvements. Base the idmapper cache on the keyring interface to allow concurrent access to idmapper entries. Start the process of migrating users from the single-threaded daemon-based approach to the multi-threaded request-key based approach.
- NFSv4.1 implementation id. Allows the NFSv4.1 client and server to mutually identify each other for logging and debugging purposes.
- Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of having to use the more counterintuitive 'vers=4,minorversion=1'.
- SUNRPC tracepoints. Start the process of adding tracepoints in order to improve debugging of the RPC layer.
- pNFS object layout support for autologin.
Important bugfixes include:
- Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to fail to wake up all tasks when applied to priority waitqueues.
- Ensure that we handle read delegations correctly, when we try to truncate a file.
- A number of fixes for NFSv4 state manager loops (mostly to do with delegation recovery)."
* tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits)
  NFS: fix sb->s_id in nfs debug prints
  xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
  xprtrdma: The transport should not bug-check when a dup reply is received
  pnfs-obj: autologin: Add support for protocol autologin
  NFS: Remove nfs4_setup_sequence from generic rename code
  NFS: Remove nfs4_setup_sequence from generic unlink code
  NFS: Remove nfs4_setup_sequence from generic read code
  NFS: Remove nfs4_setup_sequence from generic write code
  NFS: Fix more NFS debug related build warnings
  SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined
  nfs: non void functions must return a value
  SUNRPC: Kill compiler warning when RPC_DEBUG is unset
  SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
  NFS: Use cond_resched_lock() to reduce latencies in the commit scans
  NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner
  NFS: ncommit count is being double decremented
  SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
  Try using machine credentials for RENEW calls
  NFSv4.1: Fix a few issues in filelayout_commit_pagelist
  NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
  ...
2012-03-20 | switch open-coded instances of d_make_root() to new helper | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-11 | SUNRPC: Fix a few sparse warnings | Trond Myklebust
net/sunrpc/svcsock.c:412:22: warning: incorrect type in assignment (different address spaces)
- svc_partial_recvfrom now takes a struct kvec, so the variable save_iovbase needs to be an ordinary (void *)
Make a bunch of variables in net/sunrpc/xprtsock.c static
Fix a couple of "warning: symbol 'foo' was not declared. Should it be static?" reports.
Fix a couple of conflicting function declarations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 | SUNRPC: Move clnt->cl_server into struct rpc_xprt | Trond Myklebust
When the cl_xprt field is updated, the cl_server field will also have to change. Since the contents of cl_server follow the remote endpoint of cl_xprt, just move that field to the rpc_xprt. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [ cel: simplify check_gss_callback_principal(), whitespace changes ] [ cel: forward ported to 3.4 ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 | SUNRPC: Use RCU to dereference the rpc_clnt.cl_xprt field | Trond Myklebust
A migration event will replace the rpc_xprt used by an rpc_clnt. To ensure this can be done safely, all references to cl_xprt must now use a form of rcu_dereference(). Special care is taken with rpc_peeraddr2str(), which returns a pointer to memory whose lifetime is the same as the rpc_xprt. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [ cel: fix lockdep splats and layering violations ] [ cel: forward ported to 3.4 ] [ cel: remove rpc_max_reqs(), add rpc_net_ns() ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-27 | SUNRPC: move waitq from RPC pipe to RPC inode | Stanislav Kinsbursky
Currently, the wait queue used for polling RPC pipe changes from user space is part of the RPC pipe. But the pipe data itself can be released on NFS umount before the dentry-inode pair connected to it (in case this pair is held open by some process). This is not a problem for almost all pipe users, because all PipeFS file operations check the pipe reference prior to using it. The exception is eventfd, which registers itself through the "poll" file operation and thus holds a reference to the pipe wait queue. This leads to oopses when the eventfd is destroyed after NFS umount (as rpc.idmapd does), since no pipe data is left at that point. The solution is to move the wait queue from the pipe data to the internal RPC inode data. This looks more logical, because this wait queue is used only by user-space processes, which already hold an inode reference. Note: upcalls have to get pipe->dentry prior to dereferencing the wait queue, to make sure that the mount point won't disappear from underneath us.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-27 | SUNRPC: check RPC inode's pipe reference before dereferencing | Stanislav Kinsbursky
There are 2 tightly bound objects: the pipe data (created for kernel needs; it has a reference to the dentry, which depends on PipeFS mount/umount) and the PipeFS dentry/inode pair (created on mount for user-space needs). Each of them may or may not hold a valid reference to the other, independently. This means that we have to make sure that the pipe->dentry reference is valid on upcalls, and that the dentry->pipe reference is valid on downcalls. The latter check is absent - my fault. IOW, a PipeFS dentry can be held open by some process (rpc.idmapd, for example) while its pipe data belonged to an NFS mount that has already been unmounted, so the pipe data was destroyed. To fix this, the pipe reference has to be set to NULL in rpc_unlink() and checked in the PipeFS file operations instead of the pipe->dentry check. Note: the PipeFS "poll" file operation will be updated in the next patch, because its logic is more complicated.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
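The two-sided validity check described in this commit can be sketched in plain userspace C. This is a minimal model under stated assumptions, not the kernel code: `struct pipe_data`, `struct fs_node` and all `sketch_*` helpers are hypothetical names, the `-EPIPE` return values are an illustrative choice, and the locking that real code needs around these pointers is left out.

```c
#include <errno.h>
#include <stddef.h>

struct fs_node;                 /* stands in for the PipeFS dentry/inode pair */

/* Kernel-side pipe data; lives as long as its owner (e.g. an NFS mount). */
struct pipe_data {
    struct fs_node *dentry;     /* NULL while no PipeFS entry exists */
    int downcalls;              /* messages received from user space */
};

/* User-visible side; lives as long as some process holds it open. */
struct fs_node {
    struct pipe_data *pipe;     /* NULL once the pipe has been unlinked */
};

/* Connect both sides, as happens when the PipeFS entry is created. */
static void sketch_link(struct pipe_data *p, struct fs_node *n)
{
    p->dentry = n;
    n->pipe = p;
}

/* Break both references so neither side can reach a destroyed peer. */
static void sketch_unlink(struct fs_node *n)
{
    if (n->pipe != NULL) {
        n->pipe->dentry = NULL;
        n->pipe = NULL;
    }
}

/* Downcall path: validate node->pipe before touching the pipe data. */
static int sketch_downcall(struct fs_node *n)
{
    if (n->pipe == NULL)
        return -EPIPE;
    n->pipe->downcalls++;
    return 0;
}

/* Upcall path: validate pipe->dentry before queueing for user space. */
static int sketch_upcall(const struct pipe_data *p)
{
    return p->dentry != NULL ? 0 : -EPIPE;
}
```

The point of the model is that a process may keep the `fs_node` open after unlink: a later downcall then fails cleanly instead of following a pointer to freed pipe data.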
2012-01-31 | SUNRPC: kernel PipeFS mount point creation routines removed | Stanislav Kinsbursky
This patch removes static rpc_mnt variable and its creation and destruction routines, because they are not used anymore. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | NFS: idmap PipeFS notifier introduced | Stanislav Kinsbursky
v2: 1) Added "nfs_idmap_init" and "nfs_idmap_quit" definitions for kernels built without the CONFIG_NFS_V4 option set.
This patch subscribes NFS clients to RPC pipefs notifications. The idmap notifier is registered on NFS module load. This notifier callback is responsible for creation/destruction of the PipeFS idmap pipe dentry for NFS4 clients. Since the idmap pipe is created in the RPC client pipefs directory, we have to make sure that this directory has been created already. IOW, the RPC client notifier callback must have been called already. To achieve this, PipeFS notifier priorities have been introduced (the RPC clients notifier priority is greater than the NFS idmap one). But this approach creates another problem: on the UMOUNT event, unlink for the RPC client directory will be called before the NFS idmap pipe unlink and will fail, because the directory is not empty. The solution introduced in this patch is to try to remove the client directory once again after the idmap pipe has been unlinked. This looks like an ugly hack, so it should probably be replaced with something more elegant. Note that no locking is required in the notifier callback, because the PipeFS superblock pointer is passed as an argument from its creation or destruction routine, and thus we can be sure about its validity.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
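The ordering problem described in this commit can be modelled with a small priority-sorted callback chain in userspace C. This is only a sketch: `struct notifier`, `chain_register()`, `chain_call()` and the two subscriber callbacks are hypothetical names, and the "directory" and "pipe" are reduced to two flags. Higher priority runs first, as in kernel notifier chains.

```c
#include <errno.h>
#include <stddef.h>

enum event { EV_MOUNT, EV_UMOUNT };

/* Toy state: one client directory that may contain one idmap pipe. */
struct state {
    int dir_exists;
    int pipe_exists;
};

struct notifier {
    int priority;                               /* higher runs first */
    int (*call)(struct state *s, enum event ev);
    struct notifier *next;
};

/* Insert keeping the chain sorted by descending priority. */
static void chain_register(struct notifier **head, struct notifier *n)
{
    while (*head != NULL && (*head)->priority >= n->priority)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
}

/* Call every subscriber in priority order; return values are ignored. */
static void chain_call(struct notifier *head, struct state *s, enum event ev)
{
    for (; head != NULL; head = head->next)
        head->call(s, ev);
}

/* Removing the directory fails while it still contains the pipe. */
static int remove_dir(struct state *s)
{
    if (s->pipe_exists)
        return -ENOTEMPTY;
    s->dir_exists = 0;
    return 0;
}

/* Higher-priority subscriber: owns the client directory. */
static int client_event(struct state *s, enum event ev)
{
    if (ev == EV_MOUNT) {
        s->dir_exists = 1;
        return 0;
    }
    return remove_dir(s);       /* fails while the pipe is still there */
}

/* Lower-priority subscriber: owns the idmap pipe inside the directory. */
static int idmap_event(struct state *s, enum event ev)
{
    if (ev == EV_MOUNT) {
        if (!s->dir_exists)
            return -ENOENT;
        s->pipe_exists = 1;
        return 0;
    }
    s->pipe_exists = 0;
    return remove_dir(s);       /* second attempt, now that dir is empty */
}
```

On mount the priority order does the right thing (directory first, then pipe). On umount the same order makes the first directory removal fail, which is why the lower-priority subscriber retries it after unlinking its pipe.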
2012-01-31 | SUNRPC: fix pipe->ops cleanup on pipe dentry unlink | Stanislav Kinsbursky
This patch looks late, since the GSS AUTH patches have been sent already, but it fixes a flaw in RPC PipeFS pipe handling. I've added this patch to this series because the series is related to pipes, but it should have been part of the previous series named "SUNPRC: cleanup PipeFS for network-namespace-aware users". A pipe dentry can be created and destroyed many times during the pipe life cycle. This means that we can't set pipe->ops to NULL in rpc_close_pipes() and use this variable as a flag indicating that the pipe's dentry is being unlinked. To respect this restriction, this patch replaces the "pipe->ops = NULL" assignment and the checks for NULL with a "pipe->dentry = NULL" assignment and the corresponding checks for NULL. This patch also removes the check for non-NULL pipe->ops (or pipe->dentry) in rpc_close_pipes(), because it is always non-NULL now.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: subscribe RPC clients to pipefs notifications | Stanislav Kinsbursky
This patch subscribes RPC clients to RPC pipefs notifications. The RPC clients notifier block is registered with pipefs initialization during SUNRPC module init. This notifier callback is responsible for creating the RPC client PipeFS directory and the GSS pipes. For pipe creation and destruction, two additional callbacks were added to struct rpc_authops. Note that no locking is required in the notifier callback, because the PipeFS superblock pointer is passed as an argument from its creation or destruction routine, and thus we can be sure about its validity.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: split SUNPRC PipeFS dentry and private pipe data creation | Stanislav Kinsbursky
This patch is the final step towards removing PipeFS inode references from kernel code other than PipeFS itself. It makes all kernel SUNRPC PipeFS users depend on the pipe private data, whose state depends on their specific operations, etc. This patch completes the SUNRPC PipeFS preparations and allows pipe private data and PipeFS dentries to be created independently. The next step will be to have SUNRPC PipeFS dentries allocated by network-namespace-aware SUNRPC PipeFS routines.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNPRC: cleanup RPC PipeFS pipes upcall interface | Stanislav Kinsbursky
The RPC pipe upcall requires only the private pipe data. Thus the RPC inode references in this code can be removed.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: cleanup PipeFS redundant RPC inode usage | Stanislav Kinsbursky
This patch removes redundant RPC inode references from PipeFS. The affected places are those where pipe operations are actually performed.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: split SUNPRC PipeFS pipe data and inode creation | Stanislav Kinsbursky
Generally, pipe data is used only for pipes, so allocating space for it on every RPC inode allocation is redundant. This patch splits the private SUNRPC PipeFS pipe data from the inode and allocates pipe data only for pipe inodes. This patch is also the next step towards removing PipeFS inode references from kernel code other than PipeFS itself.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: replace inode lock with pipe lock for RPC PipeFS operations | Stanislav Kinsbursky
Currently, the inode i_lock is used to serialize concurrent access to SUNRPC PipeFS pipes. This looks redundant, since no other use of the inode is present in most of these places; the lock can thus easily be replaced, which will allow most inode references to be removed from the PipeFS code. This is a first step towards removing PipeFS inode references from kernel code other than PipeFS itself.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: added debug messages to RPC pipefs | Stanislav Kinsbursky
This patch adds debug messages for notification events. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: pipefs per-net operations helper introduced | Stanislav Kinsbursky
During per-net pipe creation and destruction we have to make sure that the pipefs sb exists for the whole creation/destruction cycle. This is done with a special mutex which controls the pipefs sb reference in the network namespace context. The helper consists of two parts: the first (rpc_get_dentry_net) searches for a dentry with the specified name and, on success, returns with the mutex taken. When pipe creation or destruction is completed, the caller should release this mutex with an rpc_put_dentry_net call.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
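The "return with the mutex held on success" convention described here can be sketched in userspace C with a pthread mutex. The sketch assumes hypothetical names throughout (`struct net_sketch`, `struct dentry_sketch`, `get_dentry_net()`, `put_dentry_net()`) and models the superblock as a plain array of named entries; it does not reproduce the kernel helpers named in the commit.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct dentry_sketch {
    const char *name;
};

/* Hypothetical per-namespace context: a lock plus the superblock's entries. */
struct net_sketch {
    pthread_mutex_t pipefs_lock;
    struct dentry_sketch *entries;  /* NULL while pipefs is not mounted */
    size_t nr_entries;
};

/*
 * Look up 'name' under the namespace lock.
 * On success the lock stays held and must be dropped with put_dentry_net().
 * On failure the lock is released and NULL is returned.
 */
static struct dentry_sketch *get_dentry_net(struct net_sketch *net,
                                            const char *name)
{
    pthread_mutex_lock(&net->pipefs_lock);
    if (net->entries != NULL) {
        for (size_t i = 0; i < net->nr_entries; i++) {
            if (strcmp(net->entries[i].name, name) == 0)
                return &net->entries[i];    /* lock intentionally held */
        }
    }
    pthread_mutex_unlock(&net->pipefs_lock);
    return NULL;
}

/* Release the lock taken by a successful get_dentry_net(). */
static void put_dentry_net(struct net_sketch *net)
{
    pthread_mutex_unlock(&net->pipefs_lock);
}
```

Holding the lock between the get and put calls is what keeps the entries from being torn down while the caller creates or destroys its pipe; a caller that receives NULL must not call the put helper.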
2012-01-31 | SUNRPC: put pipefs superblock link on network namespace | Stanislav Kinsbursky
We have modules (like the pNFS blocklayout module) which create pipes on rpc_pipefs. Thus we need per-net operations for them. To make this possible we require the appropriate super block, so we have to put an sb link on the network namespace context. Note that it is not strictly required to create pipes in per-net operations. IOW, if pipefs hasn't been mounted yet, no sb link reference will be present on the network namespace, and in this case we just need to skip pipe creation. The pipe dentry will be created during the pipefs mount notification.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: pipefs dentry lookup helper introduced | Stanislav Kinsbursky
In all places where pipefs dentries are created, only the directory inode is actually required to create a new dentry. And all these directories have the root pipefs dentry as their parent. So we don't actually need the pipefs mount point at all if some pipefs lookup method is provided. IOW, all we really need is the superblock and a simple lookup method to find the root's child dentry with the appropriate name. This patch introduces that method. Note that no locking is implemented in rpc_d_lookup_sb(), so it can be used only when the pipefs superblock is guaranteed to still exist. IOW, we can use this method only in the callbacks of pipefs mount/umount notification subscribers.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: send notification events on pipefs sb creation and destruction | Stanislav Kinsbursky
The events will be used to notify subscribers about pipefs superblock creation and destruction. Subscribers will have to create their dentries on the passed superblock on the mount event and destroy them on the umount event.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: hold current network namespace while pipefs superblock is active | Stanislav Kinsbursky
We want to be sure that the network namespace is still alive while we have pipefs mounted. This will be required later, when RPC pipefs will be mounted only from user-space context.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: create RPC pipefs superblock per network namespace context | Stanislav Kinsbursky
This is the initial step of RPC pipefs virtualization. It changes nothing in current pipefs behaviour, except that a mount of pipefs in a network namespace context other than init_net will provide only the root tree. No other dentries will be visible.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 | SUNRPC: remove non-exclusive pipe creation from RPC pipefs | Stanislav Kinsbursky
This patch-set was created in the context of a clone of the git branch: git://git.linux-nfs.org/projects/trondmy/nfs-2.6.git.
v2: 1) Rebased on the current repo state (i.e. all commits were pulled before applying).
I feel it is ready for inclusion if no objections appear.
The SUNRPC pipefs non-exclusive pipe creation code looks obsolete. IOW, as I see it, all pipes are created with a unique full path and only once.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-03 | sunrpc: propagate umode_t | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 | vfs: fix the stupidity with i_dentry in inode destructors | Al Viro
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-10-25 | Merge branch 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds
* 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (26 commits)
  Check validity of cl_rpcclient in nfs_server_list_show
  NFS: Get rid of the nfs_rdata_mempool
  NFS: Don't rely on PageError in nfs_readpage_release_partial
  NFS: Get rid of unnecessary calls to ClearPageError() in read code
  NFS: Get rid of nfs_restart_rpc()
  NFS: Get rid of the unused nfs_write_data->flags field
  NFS: Get rid of the unused nfs_read_data->flags field
  NFSv4: Translate NFS4ERR_BADNAME into ENOENT when applied to a lookup
  NFS: Remove the unused "lookupfh()" version of nfs4_proc_lookup()
  NFS: Use the inode->i_version to cache NFSv4 change attribute information
  SUNRPC: Remove unnecessary export of rpc_sockaddr2uaddr
  SUNRPC: Fix rpc_sockaddr2uaddr
  nfs/super.c: local functions should be static
  pnfsblock: fix writeback deadlock
  pnfsblock: fix NULL pointer dereference
  pnfs: recoalesce when ld read pagelist fails
  pnfs: recoalesce when ld write pagelist fails
  pnfs: make _set_lo_fail generic
  pnfsblock: add missing rpc_put_mount and path_put
  SUNRPC/NFS: make rpc pipe upcall generic
  ...
2011-10-18 | SUNRPC/NFS: make rpc pipe upcall generic | Peng Tao
The same function is used by idmap, gss and blocklayout code. Make it generic. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-10 | sunrpc: add MODULE_ALIAS to match the filesystem name | Michal Schmidt
sunrpc implements the rpc_pipefs filesystem type. Add the alias to have the module requested automatically by the kernel when the filesystem is mounted. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-01 | sunrpc: Reduce switch/case indent | Joe Perches
Make the case labels the same indent as the switch. git diff -w shows 80 column line reflowing. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-11 | Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 | Linus Torvalds
* 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits)
  NFS fix the setting of exchange id flag
  NFS: Don't use vm_map_ram() in readdir
  NFSv4: Ensure continued open and lockowner name uniqueness
  NFS: Move cl_delegations to the nfs_server struct
  NFS: Introduce nfs_detach_delegations()
  NFS: Move cl_state_owners and related fields to the nfs_server struct
  NFS: Allow walking nfs_client.cl_superblocks list outside client.c
  pnfs: layout roc code
  pnfs: update nfs4_callback_recallany to handle layouts
  pnfs: add CB_LAYOUTRECALL handling
  pnfs: CB_LAYOUTRECALL xdr code
  pnfs: change lo refcounting to atomic_t
  pnfs: check that partial LAYOUTGET return is ignored
  pnfs: add layout to client list before sending rpc
  pnfs: serialize LAYOUTGET(openstateid)
  pnfs: layoutget rpc code cleanup
  pnfs: change how lsegs are removed from layout list
  pnfs: change layout state seqlock to a spinlock
  pnfs: add prefix to struct pnfs_layout_hdr fields
  pnfs: add prefix to struct pnfs_layout_segment fields
  ...
2011-01-07 | fs: dcache reduce branches in lookup path | Nick Piggin
Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | fs: icache RCU free inodes | Nick Piggin
RCU free the struct inode. This will allow:
- Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping.
The downside of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (i.e. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | fs: change d_delete semantics | Nick Piggin
Change d_delete from a dentry deletion notification to a dentry caching advice, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-04 | kernel panic when mount NFSv4 | Trond Myklebust
On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote:
> Hi,
>
> When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
> at NFS client's __rpc_create_common function.
>
> The panic place is:
> rpc_mkpipe
> __rpc_lookup_create() <=== find pipefile *idmap*
> __rpc_mkpipe() <=== pipefile is *idmap*
> __rpc_create_common()
> ****** BUG_ON(!d_unhashed(dentry)); ****** *panic*
>
> It means that the dentry's d_flags have be set DCACHE_UNHASHED,
> but it should not be set here.
>
> Is someone known this bug? or give me some idea?
>
> A reproduce program is append, but it can't reproduce the bug every time.
> the export is: "/nfsroot *(rw,no_root_squash,fsid=0,insecure)"
>
> And the panic message is append.
>
> ============================================================================
> #!/bin/sh
>
> LOOPTOTAL=768
> LOOPCOUNT=0
> ret=0
>
> while [ $LOOPCOUNT -ne $LOOPTOTAL ]
> do
> ((LOOPCOUNT += 1))
> service nfs restart
> /usr/sbin/rpc.idmapd
> mount -t nfs4 127.0.0.1:/ /mnt|| return 1;
> ls -l /var/lib/nfs/rpc_pipefs/nfs/*/
> umount /mnt
> echo $LOOPCOUNT
> done
>
> ===============================================================================
> Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b
> 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 <0f> 0b eb fc 8b 46 28 c7 44 24 08 20
> de ee f0 c7 44 24 04 56 ea
> EIP:[<f0ee92ea>] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28
> ---[ end trace 8f5606cd08928ed2]---
> Kernel panic - not syncing: Fatal exception
> Pid:7131, comm: mount.nfs4 Tainted: G D -------------------2.6.32 #1
> Call Trace:
> [<c080ad18>] ? panic+0x42/0xed
> [<c080e42c>] ? oops_end+0xbc/0xd0
> [<c040b090>] ? do_invalid_op+0x0/0x90
> [<c040b10f>] ? do_invalid_op+0x7f/0x90
> [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
> [<f0edc433>] ? rpc_free_task+0x33/0x70[sunrpc]
> [<f0ed6508>] ? prc_call_sync+0x48/0x60[sunrpc]
> [<f0ed656e>] ? rpc_ping+0x4e/0x60[sunrpc]
> [<f0ed6eaf>] ? rpc_create+0x38f/0x4f0[sunrpc]
> [<c080d80b>] ? error_code+0x73/0x78
> [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
> [<c0532bda>] ? d_lookup+0x2a/0x40
> [<f0ee94b1>] ? rpc_mkpipe+0x111/0x1b0[sunrpc]
> [<f10a59f4>] ? nfs_create_rpc_client+0xb4/0xf0[nfs]
> [<f10d6c6d>] ? nfs_fscache_get_client_cookie+0x1d/0x50[nfs]
> [<f10d3fcb>] ? nfs_idmap_new+0x7b/0x140[nfs]
> [<c05e76aa>] ? strlcpy+0x3a/0x60
> [<f10a60ca>] ? nfs4_set_client+0xea/0x2b0[nfs]
> [<f10a6d0c>] ? nfs4_create_server+0xac/0x1b0[nfs]
> [<c04f1400>] ? krealloc+0x40/0x50
> [<f10b0e8b>] ? nfs4_remote_get_sb+0x6b/0x250[nfs]
> [<c04f14ec>] ? kstrdup+0x3c/0x60
> [<c0520739>] ? vfs_kern_mount+0x69/0x170
> [<f10b1a3c>] ? nfs_do_root_mount+0x6c/0xa0[nfs]
> [<f10b1b47>] ? nfs4_try_mount+0x37/0xa0[nfs]
> [<f10afe6d>] ? nfs4_validate_text_mount_data+-x7d/0xf0[nfs]
> [<f10b1c42>] ? nfs4_get_sb+0x92/0x2f0
> [<c0520739>] ? vfs_kern_mount+0x69/0x170
> [<c05366d2>] ? get_fs_type+0x32/0xb0
> [<c052089f>] ? do_kern_mount+0x3f/0xe0
> [<c053954f>] ? do_mount+0x2ef/0x740
> [<c0537740>] ? copy_mount_options+0xb0/0x120
> [<c0539a0e>] ? sys_mount+0x6e/0xa0
Hi,
Does the following patch fix the problem?
Cheers
Trond
--------------------------
SUNRPC: Fix a BUG in __rpc_create_common
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Mi Jinlong reports:
When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic at NFS client's __rpc_create_common function.
The panic place is:
rpc_mkpipe
__rpc_lookup_create() <=== find pipefile *idmap*
__rpc_mkpipe() <=== pipefile is *idmap*
__rpc_create_common()
****** BUG_ON(!d_unhashed(dentry)); ****** *panic*
The test is wrong: we can find ourselves with a hashed negative dentry here if the idmapper tried to look up the file before we got round to creating it. Just replace the BUG_ON() with a d_drop(dentry).
Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-29 | convert get_sb_single() users | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 | fs: do not assign default i_ino in new_inode | Christoph Hellwig
Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-24 | Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial | Linus Torvalds
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  Update broken web addresses in arch directory.
  Update broken web addresses in the kernel.
  Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget
  Revert "Fix typo: configuation => configuration" partially
  ida: document IDA_BITMAP_LONGS calculation
  ext2: fix a typo on comment in ext2/inode.c
  drivers/scsi: Remove unnecessary casts of private_data
  drivers/s390: Remove unnecessary casts of private_data
  net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data
  drivers/infiniband: Remove unnecessary casts of private_data
  drivers/gpu/drm: Remove unnecessary casts of private_data
  kernel/pm_qos_params.c: Remove unnecessary casts of private_data
  fs/ecryptfs: Remove unnecessary casts of private_data
  fs/seq_file.c: Remove unnecessary casts of private_data
  arm: uengine.c: remove C99 comments
  arm: scoop.c: remove C99 comments
  Fix typo configue => configure in comments
  Fix typo: configuation => configuration
  Fix typo interrest[ing|ed] => interest[ing|ed]
  Fix various typos of valid in comments
  ...
Fix up trivial conflicts in:
  drivers/char/ipmi/ipmi_si_intf.c
  drivers/usb/gadget/rndis.c
  net/irda/irnet/irnet_ppp.c
2010-10-22  Merge branch 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl  Linus Torvalds
* 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
  BKL: introduce CONFIG_BKL.
  dabusb: remove the BKL
  sunrpc: remove the big kernel lock
  init/main.c: remove BKL notations
  blktrace: remove the big kernel lock
  rtmutex-tester: make it build without BKL
  dvb-core: kill the big kernel lock
  dvb/bt8xx: kill the big kernel lock
  tlclk: remove big kernel lock
  fix rawctl compat ioctls breakage on amd64 and itanic
  uml: kill big kernel lock
  parisc: remove big kernel lock
  cris: autoconvert trivial BKL users
  alpha: kill big kernel lock
  isapnp: BKL removal
  s390/block: kill the big kernel lock
  hpet: kill BKL, add compat_ioctl
2010-10-19  sunrpc: remove the big kernel lock  Arnd Bergmann
The sunrpc cache_ioctl function does not need the big kernel lock because it uses its own queue_lock already. rpc_pipe_ioctl apparently should be using i_lock like the other operations on the pipe file descriptor do.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-09-23  net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data  Joe Perches
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-09-12  SUNRPC: Fix a race in rpc_info_open  Trond Myklebust
There is a race between rpc_info_open() and rpc_release_client() in that nothing stops a process from opening the file after the clnt->cl_kref goes to zero. Fix this by using atomic_inc_not_zero().
Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
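The fix described above relies on an "increment unless zero" reference grab; the kernel helper for that pattern is atomic_inc_not_zero(). The following is a minimal userspace sketch of the idea using C11 atomics, not the kernel code itself. The names struct client, client_get_not_zero() and client_put() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted client object. */
struct client {
    atomic_int count;   /* 0 means the last reference is gone */
};

/*
 * Take a reference only if the count is still non-zero.
 * Userspace analogue of the kernel's atomic_inc_not_zero().
 */
static bool client_get_not_zero(struct client *c)
{
    int old = atomic_load(&c->count);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&c->count, &old, old + 1))
            return true;
    }
    return false;   /* teardown already started; caller must back off */
}

/* Drop a reference; returns true if this was the last one. */
static bool client_put(struct client *c)
{
    int old = atomic_fetch_sub(&c->count, 1);

    assert(old > 0);    /* dropping a reference nobody holds is a bug */
    return old == 1;
}
```

An open path written this way fails cleanly once the count has reached zero, instead of resurrecting an object that is already being released.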
2010-09-12  SUNRPC: Fix race corrupting rpc upcall  Trond Myklebust
If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just after rpc_pipe_release calls rpc_purge_list(), but before it calls gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter will free a message without deleting it from the rpci->pipe list. We will be left with a freed object on the rpci->pipe list. The most frequent symptom is a kernel crash in rpc.gssd system calls on the pipe in question.
Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-05-22  sunrpc: Pushdown the bkl from ioctl  Frederic Weisbecker
Pushdown the bkl to rpc_pipe_ioctl.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Nfs <linux-nfs@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
2010-03-22  sunrpc: handle allocation errors from __rpc_lookup_create()  Dan Carpenter
__rpc_lookup_create() can return ERR_PTR(-ENOMEM).
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
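A function that returns ERR_PTR(-ENOMEM) encodes the error in the pointer value itself, so callers have to test the result with IS_ERR() rather than compare it against NULL. The sketch below re-creates the ERR_PTR()/PTR_ERR()/IS_ERR() helpers in userspace to show the calling pattern. It is an illustration only: lookup_create() and make_entry() are hypothetical stand-ins, not the functions touched by this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* The top MAX_ERRNO addresses are reserved for error codes. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that reports failure as an error pointer. */
static int *lookup_create(int simulate_oom)
{
    int *entry = simulate_oom ? NULL : malloc(sizeof(*entry));

    if (!entry)
        return ERR_PTR(-ENOMEM);
    *entry = 0;
    return entry;
}

/* Caller: check IS_ERR() before using the result. */
static int make_entry(int simulate_oom, int **out)
{
    int *entry = lookup_create(simulate_oom);

    if (IS_ERR(entry))
        return (int)PTR_ERR(entry);     /* propagate -ENOMEM */
    *out = entry;
    return 0;
}
```

A NULL check alone would miss the failure here, because an error pointer is never NULL.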
2010-03-04  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6  Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  init: Open /dev/console from rootfs
  mqueue: fix typo "failues" -> "failures"
  mqueue: only set error codes if they are really necessary
  mqueue: simplify do_open() error handling
  mqueue: apply mathematics distributivity on mq_bytes calculation
  mqueue: remove unneeded info->messages initialization
  mqueue: fix mq_open() file descriptor leak on user-space processes
  fix race in d_splice_alias()
  set S_DEAD on unlink() and non-directory rename() victims
  vfs: add NOFOLLOW flag to umount(2)
  get rid of ->mnt_parent in tomoyo/realpath
  hppfs can use existing proc_mnt, no need for do_kern_mount() in there
  Mirror MS_KERNMOUNT in ->mnt_flags
  get rid of useless vfsmount_lock use in put_mnt_ns()
  Take vfsmount_lock to fs/internal.h
  get rid of insanity with namespace roots in tomoyo
  take check for new events in namespace (guts of mounts_poll()) to namespace.c
  Don't mess with generic_permission() under ->d_lock in hpfs
  sanitize const/signedness for udf
  nilfs: sanitize const/signedness in dealing with ->d_name.name
  ...
Fix up fairly trivial (famous last words...) conflicts in drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
2010-03-03  Don't bother with d_genocide in rpc_pipe  Al Viro
kill_litter_super() from ->kill_sb() will take care of the junk
2010-02-14  net: Fix first line of kernel-doc for a few functions  Ben Hutchings
The function name must be followed by a space, hyphen, space, and a short description.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
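For illustration, a kernel-doc comment whose first line follows that rule might look like the sketch below. The function clamp_len() is hypothetical and is not one of the functions changed by this commit; only the comment layout is the point.

```c
#include <assert.h>
#include <stddef.h>

/**
 * clamp_len - limit a length to a maximum
 * @len: requested length
 * @max: upper bound
 *
 * Return: @len if it does not exceed @max, otherwise @max.
 */
static size_t clamp_len(size_t len, size_t max)
{
    return len > max ? max : len;
}
```

The first line after the opening `/**` carries the function name, then " - ", then the one-line summary; the parameter descriptions follow on `@name:` lines.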