summaryrefslogtreecommitdiffstats
path: root/net/sunrpc/xprtrdma/rpc_rdma.c
AgeCommit message (Collapse)Author
2014-07-31xprtrdma: Don't invalidate FRMRs if registration failsChuck Lever
If FRMR registration fails, it's likely to transition the QP to the error state. Or, registration may have failed because the QP is _already_ in ERROR. Thus calling rpcrdma_deregister_external() in rpcrdma_create_chunks() is useless in FRMR mode: the LOCAL_INVs just get flushed. It is safe to leave existing registrations: when FRMR registration is tried again, rpcrdma_register_frmr_external() checks if each FRMR is already/still VALID, and knocks it down first if it is. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Update rkeys after transport reconnectChuck Lever
Various reports of: rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep ffff8800bfd3e848 Ensure that rkeys in already-marshalled RPC/RDMA headers are refreshed after the QP has been replaced by a reconnect. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=249 Suggested-by: Selvin Xavier <Selvin.Xavier@Emulex.Com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Disconnect on registration failureChuck Lever
If rpcrdma_register_external() fails during request marshaling, the current RPC request is killed. Instead, this RPC should be retried after reconnecting the transport instance. The most likely reason for registration failure with FRMR is a failed post_send, which would be due to a remote transport disconnect or memory exhaustion. These issues can be recovered by a retry. Problems encountered in the marshaling logic itself will not be corrected by trying again, so these should still kill a request. Now that we've added a clean exit for marshaling errors, take the opportunity to defang some BUG_ON's. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Avoid deadlock when credit window is resetChuck Lever
Update the cwnd while processing the server's reply. Otherwise the next task on the xprt_sending queue is still subject to the old credit window. Currently, no task is awoken if the old congestion window is still exceeded, even if the new window is larger, and a deadlock results. This is an issue during a transport reconnect. Servers don't normally shrink the credit window, but the client does reset it to 1 when reconnecting so the server can safely grow it again. As a minor optimization, remove the hack of grabbing the initial cwnd size (which happens to be RPC_CWNDSCALE) and using that value as the congestion scaling factor. The scaling value is invariant, and we are better off without the multiplication operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Reset connection timeout after successful reconnectChuck Lever
If the new connection is able to make forward progress, reset the re-establish timeout. Otherwise it keeps growing even if disconnect events are rare. The same behavior as TCP is adopted: reconnect immediately if the transport instance has been able to make some forward progress. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Allocate missing pagelistShirley Ma
GETACL relies on transport layer to alloc memory for reply buffer. However xprtrdma assumes that the reply buffer (pagelist) has been pre-allocated in upper layer. This problem was reported by IOL OFA lab test on PPC. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Edward Mossman <emossman@iol.unh.edu> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Simplify rpcrdma_deregister_external() synopsisChuck Lever
Clean up: All remaining callers of rpcrdma_deregister_external() pass NULL as the last argument, so remove that argument. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Remove REGISTER memory registration modeChuck Lever
All kernel RDMA providers except amso1100 support either MTHCAFMR or FRMR, both of which are faster than REGISTER. amso1100 can continue to use ALLPHYSICAL. The only other ULP consumer in the kernel that uses the reg_phys_mr verb is Lustre. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Remove MEMWINDOWS registration modesChuck Lever
The MEMWINDOWS and MEMWINDOWS_ASYNC memory registration modes were intended as stop-gap modes before the introduction of FRMR. They are now considered obsolete. MEMWINDOWS_ASYNC is also considered unsafe because it can leave client memory registered and exposed for an indeterminant time after each I/O. At this point, the MEMWINDOWS modes add needless complexity, so remove them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Remove BOUNCEBUFFERS memory registration modeChuck Lever
Clean up: This memory registration mode is slow and was never meant for use in production environments. Remove it to reduce implementation complexity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process contextChuck Lever
An IB provider can invoke rpcrdma_conn_func() in an IRQ context, thus rpcrdma_conn_func() cannot be allowed to directly invoke generic RPC functions like xprt_wake_pending_tasks(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: mind the device's max fast register page list depthSteve Wise
Some rdma devices don't support a fast register page list depth of at least RPCRDMA_MAX_DATA_SEGS. So xprtrdma needs to chunk its fast register regions according to the minimum of the device max supported depth or RPCRDMA_MAX_DATA_SEGS. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-03-17SUNRPC: Fix large reads on NFS/RDMAChuck Lever
After commit a11a2bf4, "SUNRPC: Optimise away unnecessary data moves in xdr_align_pages", Thu Aug 2 13:21:43 2012, READs larger than a few hundred bytes via NFS/RDMA no longer work. This commit exposed a long-standing bug in rpcrdma_inline_fixup(). I reproduce this with an rsize=4096 mount using the cthon04 basic tests. Test 5 fails with an EIO error. For my reproducer, kernel log shows: NFS: server cheating in read reply: count 4096 > recvd 0 rpcrdma_inline_fixup() is zeroing the xdr_stream::page_len field, and xdr_align_pages() is now returning that value to the READ XDR decoder function. That field is set up by xdr_inline_pages() by the READ XDR encoder function. As far as I can tell, it is supposed to be left alone after that, as it describes the dimensions of the reply xdr_stream, not the contents of that stream. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=68391 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2013-02-01SUNRPC: Eliminate task->tk_xprt accesses that bypass rcu_dereference()Trond Myklebust
tk_xprt is just a shortcut for tk_client->cl_xprt, however cl_xprt is defined as an __rcu variable. Replace dereferences of tk_xprt with non-rcu dereferences where it is safe to do so. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-23Merge tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates for Linux 3.4 from Trond Myklebust: "New features include: - Add NFS client support for containers. This should enable most of the necessary functionality, including lockd support, and support for rpc.statd, NFSv4 idmapper and RPCSEC_GSS upcalls into the correct network namespace from which the mount system call was issued. - NFSv4 idmapper scalability improvements Base the idmapper cache on the keyring interface to allow concurrent access to idmapper entries. Start the process of migrating users from the single-threaded daemon-based approach to the multi-threaded request-key based approach. - NFSv4.1 implementation id. Allows the NFSv4.1 client and server to mutually identify each other for logging and debugging purposes. - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of having to use the more counterintuitive 'vers=4,minorversion=1'. - SUNRPC tracepoints. Start the process of adding tracepoints in order to improve debugging of the RPC layer. - pNFS object layout support for autologin. Important bugfixes include: - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to fail to wake up all tasks when applied to priority waitqueues. - Ensure that we handle read delegations correctly, when we try to truncate a file. - A number of fixes for NFSv4 state manager loops (mostly to do with delegation recovery)." * tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits) NFS: fix sb->s_id in nfs debug prints xprtrdma: Remove assumption that each segment is <= PAGE_SIZE xprtrdma: The transport should not bug-check when a dup reply is received pnfs-obj: autologin: Add support for protocol autologin NFS: Remove nfs4_setup_sequence from generic rename code NFS: Remove nfs4_setup_sequence from generic unlink code NFS: Remove nfs4_setup_sequence from generic read code NFS: Remove nfs4_setup_sequence from generic write code NFS: Fix more NFS debug related build warnings SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined nfs: non void functions must return a value SUNRPC: Kill compiler warning when RPC_DEBUG is unset SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG NFS: Use cond_resched_lock() to reduce latencies in the commit scans NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner NFS: ncommit count is being double decremented SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up() Try using machine credentials for RENEW calls NFSv4.1: Fix a few issues in filelayout_commit_pagelist NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code ...
2012-03-21xprtrdma: The transport should not bug-check when a dup reply is receivedTom Tucker
The client side RDMA transport will bug check if it receives a duplicate reply, instead we should simply drop the duplicate reply. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-20sunrpc: remove the second argument of k[un]map_atomic()Cong Wang
Signed-off-by: Cong Wang <amwang@redhat.com>
2011-03-11RPCRDMA: Fix to XDR page base interpretation in marshalling logic.Tom Tucker
The RPCRDMA marshalling logic assumed that xdr->page_base was an offset into the first page of xdr->page_list. It is in fact an offset into the xdr->page_list itself, that is, it selects the first page in the page_list and the offset into that page. The symptom depended in part on the rpc_memreg_strategy, if it was FRMR, or some other one-shot mapping mode, the connection would get torn down on a base and bounds error. When the badly marshalled RPC was retransmitted it would reconnect, get the error, and tear down the connection again in a loop forever. This resulted in a hung-mount. For the other modes, it would result in silent data corruption. This bug is most easily reproduced by writing more data than the filesystem has space for. This fix corrects the page_base assumption and otherwise simplifies the iov mapping logic. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-11rpcrdma: Fix SQ size calculation when memreg is FRMRTom Tucker
This patch updates the computation to include the worst case situation where three FRMR are required to map a single RPC REQ. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11XPRTRDMA: correct an rpc/rdma inline send marshaling errorTom Talpey
Certain client rpc's which contain both lengthy page-contained metadata and a non-empty xdr_tail buffer require careful handling to avoid overlapped memory copying. Rearranging of existing rpcrdma marshaling code avoids it; this fixes an NFSv4 symlink creation error detected with connectathon basic/test8 to multiple servers. Signed-off-by: Tom Talpey <tmtalpey@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-15Merge branch 'next'Trond Myklebust
2008-10-10RPC/RDMA: reformat a debug printk to keep lines together.Tom Talpey
The send marshaling code split a particular dprintk across two lines, which makes it hard to extract from logfiles. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: return a consistent error, when connect fails.Tom Talpey
The xprt_connect call path does not expect such errors as ECONNREFUSED to be returned from failed transport connection attempts, otherwise it translates them to EIO and signals fatal errors. For example, mount.nfs prints simply "internal error". Translate all such errors to ENOTCONN from RPC/RDMA to match sockets behavior. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: adhere to protocol for unpadded client trailing write chunks.Tom Talpey
The RPC/RDMA protocol allows clients and servers to avoid RDMA operations for data which is purely the result of XDR padding. On the client, automatically insert the necessary padding for such server replies, and optionally don't marshal such chunks. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: suppress retransmit on RPC/RDMA clients.Tom Talpey
An RPC/RDMA client cannot retransmit on an unbroken connection, doing so violates its flow control with the server. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-09-20net: Use hton[sl]() instead of __constant_hton[sl]() where applicableArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-30SUNRPC: Fix an unnecessary implicit type cast in rpcrdma_count_chunks()Chuck Lever
Nit: rl_nchunks is an unsigned integer, so pass it into rpcrdma_count_chunks() via an unsigned integer argument. This eliminates a harmless mixed sign comparison in rpcrdma_count_chunks() Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Thomas Talpey <Thomas.Talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Prevent mixed sign comparisons in rpcrdma_convert_iovs()Chuck Lever
Keep the type of the buffer position the same during iovec conversion to reduce the likelihood of unexpected results from comparisons and length computations. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Thomas Talpey <Thomas.Talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-28[SUNRPC]: Use htonl() where appropriate.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-11SUNRPC xprtrdma: fix XDR tail buf marshalling for all opsJames Lentini
rpcrdma_convert_iovs is passed an xdr_buf representing either an RPC request or an RPC reply. In the case of a request, several calculations and tests involving pos are unnecessary. In the case of a reply, several calculations and tests involving pos are incorrect (the code tests pos against the reply xdr buf's len field, which is always 0 at the time rpcrdma_convert_iovs is executed). This change removes the incorrect/unnecessary calculations and tests involving pos. This fixes an observed problem when reading certain file sizes over NFS/RDMA. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: James Lentini <jlentini@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-30[SUNRPC] rpc_rdma: we need to cast u64 to unsigned long long for printingStephen Rothwell
as some architectures have unsigned long for u64. net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_create_chunks': net/sunrpc/xprtrdma/rpc_rdma.c:222: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' net/sunrpc/xprtrdma/rpc_rdma.c:234: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_count_chunks': net/sunrpc/xprtrdma/rpc_rdma.c:577: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64 Noticed on PowerPC pseries_defconfig build. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-29SUNRPC endianness annotationsAl Viro
rpcrdma stuff lacks endianness annotations for on-the-wire data. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-09RPCRDMA: rpc rdma protocol implementation\"Talpey, Thomas\
This implements the marshaling and unmarshaling of the rpcrdma transport headers. Connection management is also addressed. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09RPCRDMA: rpc rdma transport switch\"Talpey, Thomas\
This implements the configuration and building of the core transport switch implementation of the rpcrdma transport. Stubs are provided for the rpcrdma protocol handling, and the infiniband/iwarp verbs interface. These are provided in following patches. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>