Age | Commit message | Author
2009-08-10 | Merge branch 'sunrpc_cache-for-2.6.32' into nfs-for-2.6.32 | Trond Myklebust
2009-08-10 | Merge branch 'patches_cel-for-2.6.32' into nfs-for-2.6.32 | Trond Myklebust
2009-08-10 | nfs: remove superfluous BUG_ON()s | Bartlomiej Zolnierkiewicz
Subject: [PATCH] nfs: remove superfluous BUG_ON()s
Remove duplicated BUG_ON()s from nfs[4]_create_server() (we make the same checks earlier in both functions). This takes care of the following entries from Dan's list:
fs/nfs/client.c +1078 nfs_create_server(47) warning: variable derefenced before check 'server->nfs_client'
fs/nfs/client.c +1079 nfs_create_server(48) warning: variable derefenced before check 'server->nfs_client->rpc_ops'
fs/nfs/client.c +1363 nfs4_create_server(43) warning: variable derefenced before check 'server->nfs_client'
fs/nfs/client.c +1364 nfs4_create_server(44) warning: variable derefenced before check 'server->nfs_
Reported-by: Dan Carpenter <error27@gmail.com> Cc: corbet@lwn.net Cc: eteo@redhat.com Cc: Julia Lawall <julia@diku.dk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-10 | NFS: read-modify-write page updating | Peter Staubach
Hi. I have a proposal for possibly resolving this issue. I believe that this situation occurs due to the way that the Linux NFS client handles writes which modify partial pages.
The Linux NFS client handles partial page modifications by allocating a page from the page cache, copying the data from the user level into the page, and then keeping track of the offset and length of the modified portions of the page. The page is not marked as up to date because there are portions of the page which do not contain valid file contents.
When a read call comes in for a portion of the page, the contents of the page must be read in from the server. However, since the page may already contain some modified data, that modified data must be written to the server before the file contents can be read back in from the server. And, since the writing and reading cannot be done atomically, the data must be written and committed to stable storage on the server for safety purposes. This means either a FILE_SYNC WRITE or an UNSTABLE WRITE followed by a COMMIT. This has been discussed at length previously.
This algorithm could be described as modify-write-read. It is most efficient when the application only updates pages and does not read them.
My proposed solution is to add a heuristic to decide whether to do this modify-write-read algorithm or switch to a read-modify-write algorithm when initially allocating the page in the write system call path. The heuristic uses the modes that the file was opened with, the offset in the page to read from, and the size of the region to read. If the file was opened for reading in addition to writing and the page would not be filled completely with data from the user level, then read in the old contents of the page and mark it as Uptodate before copying in the new data. If the page would be completely filled with data from the user level, then there would be no reason to read in the old contents because they would just be copied over.
This would optimize for applications which randomly access and update portions of files. The linkage editor for the C compiler is an example of such a thing.
I tested the attached patch by using rpmbuild to build the current Fedora rawhide kernel. The kernel without the patch generated about 269,500 WRITE requests. The modified kernel containing the patch generated about 261,000 WRITE requests. Thus, about 8,500 fewer WRITE requests were generated. I suspect that many of these additional WRITE requests were probably FILE_SYNC requests to WRITE a single page, but I didn't test this theory.
The difference between this patch and the previous one was to remove the unneeded PageDirty() test. I then retested to ensure that the resulting system continued to behave as desired.
Thanx... ps
Signed-off-by: Peter Staubach <staubach@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
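The heuristic described in the message above can be sketched in user-space C. This is only an illustration, not the patch itself: the struct names, the function name, and the 4096-byte page size are assumptions made for the example, and the early return for a page that is already up to date is an added assumption rather than something the message spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size, for illustration only */

/* Hypothetical stand-ins for the open mode and the page state. */
struct sketch_file { bool opened_for_read; };
struct sketch_page { bool uptodate; };

/*
 * Returns true when the old page contents should be read from the server
 * before the user's data is copied in (read-modify-write), and false when
 * the existing modify-write-read behaviour should be kept.
 */
static bool sketch_want_read_modify_write(const struct sketch_file *file,
                                          const struct sketch_page *page,
                                          size_t offset_in_page, size_t len)
{
    assert(offset_in_page + len <= SKETCH_PAGE_SIZE);

    if (page->uptodate)
        return false;   /* assumption: a valid page needs no extra read */
    if (!file->opened_for_read)
        return false;   /* file was not opened for reading */
    if (offset_in_page == 0 && len == SKETCH_PAGE_SIZE)
        return false;   /* whole page is overwritten, old data is moot */
    return true;        /* partial update of a file opened for reading */
}
```

A partial write at offset 100 of length 200 to a file opened read-write returns true; a full-page write, or any write to a file opened write-only, returns false.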
2009-08-10 | NFS: Add a ->migratepage() aop for NFS | Trond Myklebust
Make NFS a bit more friendly to NUMA and memory hot removal... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Add an rpc_pipefs front end for the sunrpc cache code | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Move procfs-specific stuff out of the generic sunrpc cache code | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Allow the cache_detail to specify alternative upcall mechanisms | Trond Myklebust
For events that are rare, such as referral DNS lookups, it makes limited sense to have a daemon constantly listening for upcalls on a channel. An alternative in those cases might simply be to run the app that fills the cache using call_usermodehelper_exec() and friends. The following patch allows the cache_detail to specify alternative upcall mechanisms for these particular cases. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
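One way to picture a per-cache upcall hook is a function pointer stored in the cache descriptor, with a fallback to the default channel when it is unset. The sketch below is user-space C with hypothetical names throughout (sketch_cache_detail, sketch_upcall_helper and so on); it does not reproduce the kernel's actual structure layout, and the helper only counts invocations instead of starting a program.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_cache_detail;

/* An upcall asks user space to fill in the cache entry named by key. */
typedef int (*sketch_upcall_fn)(struct sketch_cache_detail *cd,
                                const char *key);

struct sketch_cache_detail {
    const char *name;
    sketch_upcall_fn upcall;   /* NULL selects the default channel */
    int daemon_requests;       /* counters so the example is observable */
    int helper_runs;
};

/* Default path: queue the request for a daemon listening on a channel. */
static int sketch_upcall_daemon(struct sketch_cache_detail *cd,
                                const char *key)
{
    (void)key;
    cd->daemon_requests++;
    return 0;
}

/* Alternative path for rare events: run a one-shot helper program. */
static int sketch_upcall_helper(struct sketch_cache_detail *cd,
                                const char *key)
{
    (void)key;
    cd->helper_runs++;
    return 0;
}

/* Dispatch through the cache's own hook if it provides one. */
static int sketch_cache_upcall(struct sketch_cache_detail *cd,
                               const char *key)
{
    sketch_upcall_fn fn;

    assert(cd != NULL);
    fn = cd->upcall ? cd->upcall : sketch_upcall_daemon;
    return fn(cd, key);
}
```

A cache that leaves the hook NULL keeps the daemon behaviour, while a rarely used cache can point the hook at the helper variant.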
2009-08-09 | SUNRPC: Remove the global temporary write buffer in net/sunrpc/cache.c | Trond Myklebust
While we do want to protect against multiple concurrent readers and writers on each upcall/downcall pipe, we don't want to limit concurrent reading and writing to separate caches. This patch therefore replaces the static buffer 'write_buf', which can only be used by one writer at a time, with use of the page cache as the temporary buffer for downcalls. We still fall back to using the old global buffer if the downcall is larger than PAGE_CACHE_SIZE, since this is apparently needed by the SPKM security context initialisation. It then replaces the use of the global 'queue_io_mutex' with the inode->i_mutex in cache_read() and cache_write(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Ensure we initialise the cache_detail before creating procfs files | Trond Myklebust
Also ensure that we destroy those files before we destroy the cache_detail. Otherwise, user processes might attempt to write into uninitialised caches. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | NFSD: Clean up the idmapper warning... | Trond Myklebust
What part of 'internal use' is so hard to understand? Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: One more clean up for rpc_create_client_dir() | Trond Myklebust
In order to allow rpc_pipefs to create directories with different types of subtrees, it is useful to allow the caller to customise the subtree filling process. In order to do so, we separate out the parts which are specific to making an RPC client directory, and put them in a separate helper, then we convert the process of filling the directory contents into a callback. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: clean up rpc_setup_pipedir() | Trond Myklebust
There is still a little wart or two there: Since we've already got a vfsmount, we might as well pass that in to rpc_create_client_dir. Another point is that if we open code __rpc_lookup_path() here, then we can avoid looking up the entire parent directory path over and over again: it doesn't change. Also get rid of rpc_clnt->cl_pathname, since it has no users... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Replace rpc_client->cl_dentry and cl_mnt, with a cl_path | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Clean up rpc_create_client_dir() | Trond Myklebust
Factor out the code that does lookups from the code that actually creates the directory. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Rename rpc_mkdir to rpc_create_client_dir() | Trond Myklebust
This reflects the fact that rpc_mkdir(), as it stands today, can only create an RPC client type directory. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: rpc_pipefs cleanup | Trond Myklebust
Move the files[] array closer to rpc_fill_super() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Clean up rpc_populate/depopulate | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Clean up rpc_lookup_create | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Clean up rpc_unlink() | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Clean up file creation code in rpc_pipefs | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Clean up rpc_pipefs lookup code... | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Allow rpc_pipefs_ops to have null values for upcall and downcall | Trond Myklebust
Also ensure that we use the umode_t type when appropriate... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Constify rpc_pipe_ops... | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Add documenting comments in net/sunrpc/timer.c | Chuck Lever
Clean up: provide documenting comments for the functions in net/sunrpc/timer.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Update xprt address strings after an rpcbind completes | Chuck Lever
After a bind completes, update the transport instance's address strings so debugging messages display the current port the transport is connected to. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Kill RPC_DISPLAY_ALL | Chuck Lever
At some point, I recall that rpc_pipe_fs used RPC_DISPLAY_ALL. Currently there are no uses of RPC_DISPLAY_ALL outside the transport modules themselves, so we can safely get rid of it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Rename sock_xprt.addr as sock_xprt.srcaddr | Chuck Lever
Clean up: Give the "addr" and "port" fields less ambiguous names. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Eliminate PROC macro from rpcb_clnt | Chuck Lever
Clean up: Replace PROC macro with open coded C99 structure initializers to improve readability. The rpcbind v4 GETVERSADDR procedure is never sent by the current implementation, so it is not copied to the new structures. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
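As a rough illustration of what open-coded C99 structure initializers look like for a procedure table, here is a small self-contained sketch. The struct, enum, field names, and argument lengths are hypothetical and chosen only for the example; they are not the definitions used in rpcb_clnt.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical procedure numbers for the example table. */
enum sketch_rpcb_proc {
    SKETCH_RPCB_SET = 1,
    SKETCH_RPCB_UNSET = 2,
    SKETCH_RPCB_GETPORT = 3,
    SKETCH_RPCB_NPROCS
};

struct sketch_procinfo {
    unsigned int p_proc;     /* procedure number */
    const char *p_name;      /* name used in debugging output */
    unsigned int p_arglen;   /* argument size in 32-bit words (made up) */
};

/*
 * Each entry names its index and its fields explicitly, so a reader does
 * not have to expand a macro to see what is stored. Slots that are not
 * listed (index 0 here) are zero-initialized.
 */
static const struct sketch_procinfo sketch_procedures[SKETCH_RPCB_NPROCS] = {
    [SKETCH_RPCB_SET] = {
        .p_proc   = SKETCH_RPCB_SET,
        .p_name   = "SET",
        .p_arglen = 4,
    },
    [SKETCH_RPCB_UNSET] = {
        .p_proc   = SKETCH_RPCB_UNSET,
        .p_name   = "UNSET",
        .p_arglen = 4,
    },
    [SKETCH_RPCB_GETPORT] = {
        .p_proc   = SKETCH_RPCB_GETPORT,
        .p_name   = "GETPORT",
        .p_arglen = 4,
    },
};

/* Look a procedure up by name; returns NULL if it is not in the table. */
static const struct sketch_procinfo *sketch_find_proc(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < (size_t)SKETCH_RPCB_NPROCS; i++) {
        const struct sketch_procinfo *p = &sketch_procedures[i];

        if (p->p_name != NULL && strcmp(p->p_name, name) == 0)
            return p;
    }
    return NULL;
}
```

A procedure that is never sent can simply be left out of the table, in which case a lookup for it returns NULL.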
2009-08-09 | SUNRPC: Clean up: Remove unused XDR decoder functions from rpcb_clnt.c | Chuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Introduce new xdr_stream-based decoders to rpcb_clnt.c | Chuck Lever
Replace the open-coded decode logic for PMAP_GETPORT/RPCB_GETADDR with an xdr_stream-based implementation, similar to what NFSv4 uses, to protect against buffer overflows. The new implementation also checks that the incoming port number is reasonable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
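The idea behind a stream-based decoder is that every read first checks how many bytes remain, so a short reply cannot run past the end of the buffer. The following user-space sketch shows that pattern for a GETPORT-style reply carrying one 32-bit big-endian value. The type and function names are hypothetical, and the kernel's xdr_stream API is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a decode stream. */
struct sketch_xdr_stream {
    const uint8_t *p;     /* next unread byte */
    const uint8_t *end;   /* one past the last valid byte */
};

/*
 * Hand out nbytes of the buffer and advance, or return NULL when fewer
 * than nbytes remain. All decoding goes through this bounds check.
 */
static const uint8_t *sketch_inline_decode(struct sketch_xdr_stream *xdr,
                                           size_t nbytes)
{
    const uint8_t *p = xdr->p;

    if ((size_t)(xdr->end - p) < nbytes)
        return NULL;
    xdr->p = p + nbytes;
    return p;
}

/*
 * Decode a reply holding a single 32-bit big-endian port number.
 * Returns 0 on success, -1 if the reply is short or the value cannot
 * be a port.
 */
static int sketch_decode_getport(struct sketch_xdr_stream *xdr,
                                 uint16_t *port)
{
    const uint8_t *p;
    uint32_t value;

    assert(xdr != NULL && port != NULL);

    p = sketch_inline_decode(xdr, 4);
    if (p == NULL)
        return -1;

    value = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
            ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    if (value > 65535)
        return -1;   /* does not fit in a 16-bit port */

    *port = (uint16_t)value;
    return 0;
}
```

A three-byte reply and a reply carrying 65536 are both rejected, while the bytes 00 00 08 01 decode to port 2049.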
2009-08-09 | SUNRPC: Introduce xdr_stream-based decoders for RPCB_UNSET | Chuck Lever
Replace the open-coded decode logic for rpcbind UNSET results with an xdr_stream-based implementation, similar to what NFSv4 uses, to protect against buffer overflows. The new function is unused for the moment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Clean up: Remove unused XDR encoder functions from rpcb_clnt.c | Chuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Introduce new xdr_stream-based encoders to rpcb_clnt.c | Chuck Lever
Replace the open-coded encode logic for rpcbind arguments with an xdr_stream-based implementation, similar to what NFSv4 uses, to better protect against buffer overflows. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | NFSD: Support IPv6 addresses in write_failover_ip() | Chuck Lever
In write_failover_ip(), replace the sscanf() with a call to the common sunrpc.ko presentation address parser. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | lockd: Replace nsm_display_address() with rpc_ntop() | Chuck Lever
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | lockd: Replace nlm_clear_port() | Chuck Lever
Clean up: Use shared rpc_set_port() function instead of nlm_clear_port(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | NFS: Replace nfs_set_port() with rpc_set_port() | Chuck Lever
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | NFS: Replace nfs_parse_ip_address() with rpc_pton() | Chuck Lever
Clean up: Use the common routine now provided in sunrpc.ko for parsing mount addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Use rpc_ntop() for constructing transport address strings | Chuck Lever
Clean up: In addition to using the new generic rpc_ntop() and rpc_get_port() functions, have the RPC client compute the presentation address buffer sizes dynamically using kstrdup(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Remove duplicate universal address generation | Chuck Lever
RPC universal address generation is currently done in several places: rpcb_clnt.c, nfs4proc.c, xprtsock.c, and xprtrdma.c. Remove the redundant cases that convert a socket address to a universal address. The nfs4proc.c case takes a pre-formatted presentation address string, not a socket address, so we'll leave that one. Because the new uaddr constructor uses the recently introduced rpc_ntop(), it now supports proper "::" shorthanding for IPv6 addresses. This allows the kernel to register properly formed universal addresses with the local rpcbind service, in _all_ cases. The kernel can now also send properly formed universal addresses in RPCB_GETADDR requests, and support link-local properly when encoding and decoding IPv6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Provide functions for managing universal addresses | Chuck Lever
Introduce a set of functions in the kernel's RPC implementation for converting between a socket address and either a standard presentation address string or an RPC universal address. The universal address functions will be used to encode and decode RPCB_FOO and NFSv4 SETCLIENTID arguments. The other functions are part of a previous promise to deliver shared functions that can be used by upper-layer protocols to display and manipulate IP addresses. The kernel's current address printf formatters were designed specifically for kernel to user-space APIs that require a particular string format for socket addresses, thus are somewhat limited for the purposes of sunrpc.ko. The formatter for IPv6 addresses, %pI6, does not support short-handing or scope IDs. Also, these printf formatters are unique per address family, so a separate formatter string is required for printing AF_INET and AF_INET6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
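For readers unfamiliar with universal addresses: for IPv4 the conventional form is the dotted-quad presentation address followed by the high and low bytes of the port, each as a decimal number. The user-space sketch below converts in both directions for AF_INET only, using the POSIX inet_ntop() and inet_pton() calls. The function names and the buffer-size macro are hypothetical and are not the functions this commit adds to sunrpc.ko.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Dotted quad (INET_ADDRSTRLEN includes the NUL) plus ".255.255". */
#define SKETCH_UADDR4_LEN (INET_ADDRSTRLEN + sizeof(".255.255"))

/* Format sin as "a.b.c.d.p1.p2"; returns 0 on success, -1 on error. */
static int sketch_sockaddr_to_uaddr4(const struct sockaddr_in *sin,
                                     char *buf, size_t buflen)
{
    char addr[INET_ADDRSTRLEN];
    unsigned int port = ntohs(sin->sin_port);
    int n;

    assert(sin->sin_family == AF_INET);

    if (inet_ntop(AF_INET, &sin->sin_addr, addr, sizeof(addr)) == NULL)
        return -1;
    n = snprintf(buf, buflen, "%s.%u.%u", addr, port >> 8, port & 0xff);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Peel one trailing ".N" (0..255) off buf; returns 0 on success. */
static int sketch_strip_octet(char *buf, unsigned int *value)
{
    char *dot = strrchr(buf, '.');
    char extra;

    if (dot == NULL || sscanf(dot + 1, "%u%c", value, &extra) != 1)
        return -1;
    if (*value > 255)
        return -1;
    *dot = '\0';
    return 0;
}

/* Parse "a.b.c.d.p1.p2" into sin; returns 0 on success, -1 on error. */
static int sketch_uaddr4_to_sockaddr(const char *uaddr,
                                     struct sockaddr_in *sin)
{
    char buf[SKETCH_UADDR4_LEN];
    unsigned int lo, hi;

    if (strlen(uaddr) >= sizeof(buf))
        return -1;
    strcpy(buf, uaddr);

    /* The port bytes are the last two components, low byte last. */
    if (sketch_strip_octet(buf, &lo) != 0 ||
        sketch_strip_octet(buf, &hi) != 0)
        return -1;

    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_port = htons((unsigned short)((hi << 8) | lo));
    return inet_pton(AF_INET, buf, &sin->sin_addr) == 1 ? 0 : -1;
}
```

With this sketch, 192.168.1.10 port 2049 becomes "192.168.1.10.8.1", since 2049 is 8 * 256 + 1. IPv6, scope IDs, and "::" shorthanding are left out to keep the example short.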
2009-08-09 | SUNRPC: Move XDR data type size macros | Chuck Lever
Clean up: To make subsequent patches cleaner, move the XDR data type size macros to the top of the file (similar to nfs4xdr.c) first. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: Clean up RPCBIND_MAXUADDRLEN definitions | Chuck Lever
Clean up: Replace the single-integer definition of RPCBIND_MAXUADDRLEN with a definition that is based on previously defined address string sizes, and document the way this maximum is calculated. Also provide a separate macro for the size of the port number extension. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | NFS: Use the authentication flavor list returned by mountd | Chuck Lever
Commit a14017db added support in the kernel's NFS mount client to decode the authentication flavor list returned by mountd. The NFS client can now use this list to determine whether the authentication flavor requested by the user is actually supported by the server. Note we don't actually negotiate the security flavor if none was specified by the user. Instead, we try to use AUTH_SYS, and fail if the server does not support it. This prevents us from negotiating an inappropriate security flavor (some servers list AUTH_NULL first). If the server does not support AUTH_SYS, the user must provide an appropriate security flavor by specifying the "sec=" mount option. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
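The selection rule described above can be sketched as follows: take the flavor from "sec=" if the user gave one, otherwise AUTH_SYS, and accept it only if the server's list contains it. The function and macro names are hypothetical; the flavor numbers (AUTH_NULL 0, AUTH_SYS 1, krb5 pseudoflavor 390003) are the standard RPC values. How the real client treats corner cases such as an empty server list is not covered by this message and is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_AUTH_NULL 0u
#define SKETCH_AUTH_SYS  1u
#define SKETCH_AUTH_KRB5 390003u   /* RPCSEC_GSS krb5 pseudoflavor */

/*
 * server[] is the flavor list returned by mountd. If the user passed
 * "sec=", requested holds that flavor; otherwise AUTH_SYS is tried.
 * Returns 0 and stores the flavor in *chosen when the server lists it,
 * -1 when the mount should fail.
 */
static int sketch_pick_flavor(const unsigned int *server, size_t count,
                              int user_specified, unsigned int requested,
                              unsigned int *chosen)
{
    unsigned int want = user_specified ? requested : SKETCH_AUTH_SYS;
    size_t i;

    assert(chosen != NULL);

    for (i = 0; i < count; i++) {
        if (server[i] == want) {
            *chosen = want;
            return 0;
        }
    }
    return -1;   /* no negotiation: never fall back to another flavor */
}
```

Note that a server listing AUTH_NULL first still yields AUTH_SYS when no "sec=" option is given, and a server offering only krb5 causes the mount to fail unless the user asks for krb5 explicitly.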
2009-08-09 | NFS: Fix auth flavor len accounting | Chuck Lever
Previous logic in the NFS mount parsing code path assumed auth_flavor_len was set to zero for simple authentication flavors (like AUTH_UNIX), and 1 for compound flavors (like AUTH_GSS). At some earlier point (maybe even before the option parsers were merged?) specific checks for auth_flavor_len being zero were removed from the functions that validate the mount option that sets the mount point's authentication flavor. Since we are populating an array for authentication flavors, the auth_flavor_len should always be set to the number of flavors. Let's eliminate some cleverness here, and prepare for new logic that needs to know the number of flavors in the auth_flavors[] array. (auth_flavors[] is an array because at some point we want to allow a list of acceptable authentication flavors to be specified via the sec= mount option. For now it remains a single element array). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | NFS: Add ability to send MOUNTPROC_UMNT to the kernel's mountd client | Chuck Lever
After certain failure modes of an NFS mount, an NFS client should send a MOUNTPROC_UMNT request to remove the just-added mount entry from the server's mount table. While no-one should rely on the accuracy of the server's mount table, sending a UMNT is simply being a good internet neighbor. Since NFS mount processing is handled in the kernel now, we will need a function in the kernel's mountd client that can post a MOUNTPROC_UMNT request, in order to handle these failure modes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | NFS: Fix up new minorversion= option | Chuck Lever
The new minorversion= mount option (commit 3fd5be9e) was merged at the same time as the recent sloppy parser fixes (commit a5a16bae), so minorversion= still uses the old value parsing logic. If the minorversion= option specifies a bogus value, it should fail with "bad value" not "bad option." Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | NFSv4: Clean up the nfs.callback_tcpport option | Trond Myklebust
Tighten up the validity checking in param_set_port: check for NULL pointers. Ensure that the option shows up on 'modinfo' output. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 | SUNRPC: convert some sysctls into module parameters | Trond Myklebust
Parameters like the minimum reserved port, and the number of slot entries should really be module parameters rather than sysctls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>