path: root/fs
Age  Commit message  Author
2014-05-30nfsd4: convert 4.1 replay encodingJ. Bruce Fields
Limits on maxresp_sz mean that we only ever need to replay RPCs that are contained entirely in the head. The one exception is very small zero-copy reads. That's an odd corner case, as clients wouldn't normally ask for those to be cached. In any case, this seems a little more robust. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30nfsd4: allow encoding across page boundariesJ. Bruce Fields
After this we can handle for example getattr of very large ACLs. Read, readdir, readlink are still special cases with their own limits. Also we can't handle a new operation starting close to the end of a page. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30nfsd4: size-checking cleanupJ. Bruce Fields
Better variable name, some comments, etc. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30nfsd4: remove redundant encode buffer size checkingJ. Bruce Fields
Now that all op encoders can handle running out of space, we no longer need to check the remaining size for every operation; only nonidempotent operations need that check, and that can be done by nfsd4_check_resp_size. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30nfsd4: nfsd4_check_resp_size needn't recalculate lengthJ. Bruce Fields
We're keeping the length updated as we go now, so there's no need for the extra calculation here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30nfsd4: reserve space before inlining 0-copy pagesJ. Bruce Fields
Once we've included page-cache pages in the encoding it's difficult to remove them and restart encoding. (xdr_truncate_encode doesn't handle that case.) So, make sure we'll have adequate space to finish the operation first. For now COMPOUND_SLACK_SPACE checks should prevent this case happening, but we want to remove those checks. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30nfsd4: teach encoders to handle reserve_space failuresJ. Bruce Fields
We've tried to prevent running out of space with COMPOUND_SLACK_SPACE and special checking in those operations (getattr) whose result can vary enormously. However:
- COMPOUND_SLACK_SPACE may be difficult to maintain as we add more protocol.
- BUG_ON or page faulting on failure seems overly fragile.
- Especially in the 4.1 case, we prefer not to fail compounds just because the returned result came *close* to session limits. (Though perfect enforcement here may be difficult.)
- I'd prefer encoding to be uniform for all encoders instead of having special exceptions for encoders containing, for example, attributes.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
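The idea of letting every encoder fail cleanly when space runs out can be sketched in userspace C. This is only an illustration under stated assumptions: `struct reply_buf`, `reserve_space()`, `encode_u32()` and `encode_op()` are hypothetical names, not the kernel's xdr_stream API, and the rollback to a saved offset merely stands in for what a truncate helper would do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size reply buffer; NOT the kernel's struct xdr_stream. */
struct reply_buf {
    uint8_t data[64];
    size_t  len;        /* bytes encoded so far */
};

/* Reserve nbytes at the end of the buffer, or return NULL if they do not fit. */
static uint8_t *reserve_space(struct reply_buf *rb, size_t nbytes)
{
    uint8_t *p;

    if (nbytes > sizeof(rb->data) - rb->len)
        return NULL;
    p = rb->data + rb->len;
    rb->len += nbytes;
    return p;
}

/* Encode one big-endian 32-bit word; 0 on success, -1 when out of space. */
static int encode_u32(struct reply_buf *rb, uint32_t v)
{
    uint8_t *p = reserve_space(rb, 4);

    if (!p)
        return -1;      /* report the failure instead of BUG_ON or faulting */
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return 0;
}

/* Encode one "operation" of count words; on failure, roll back to its start. */
static int encode_op(struct reply_buf *rb, const uint32_t *words, size_t count)
{
    size_t start = rb->len;     /* an offset, not a pointer */
    size_t i;

    for (i = 0; i < count; i++) {
        if (encode_u32(rb, words[i]) < 0) {
            rb->len = start;    /* drop the partially encoded operation */
            return -1;
        }
    }
    return 0;
}
```

With this shape, no encoder needs a worst-case size estimate up front: each one simply propagates the failure, and the caller decides what error to return for the operation.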
2014-05-30nfsd4: "backfill" using write_bytes_to_xdr_bufJ. Bruce Fields
Normally xdr encoding proceeds in a single pass from start of a buffer to end, but sometimes we have to write a few bytes to an earlier position. Use write_bytes_to_xdr_buf for these cases rather than saving a pointer to write to. We plan to rewrite xdr_reserve_space to handle encoding across page boundaries using a scratch buffer, and don't want to risk writing to a pointer that was contained in a scratch buffer. Also it will no longer be safe to calculate lengths by subtracting two pointers, so use xdr_buf offsets instead. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
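A minimal sketch of "backfilling" by offset rather than by saved pointer, assuming a simple flat buffer. The names `struct out_buf`, `put_u32_at()`, `append_u32()` and `encode_counted()` are hypothetical and are not the kernel's write_bytes_to_xdr_buf interface; the point is only that the earlier position and the body length are both tracked as offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat output buffer used only for this illustration. */
struct out_buf {
    uint8_t data[128];
    size_t  len;        /* bytes encoded so far */
};

/* Overwrite 4 already-encoded bytes at offset off with big-endian v. */
static int put_u32_at(struct out_buf *b, size_t off, uint32_t v)
{
    if (off > b->len || b->len - off < 4)
        return -1;      /* the target must lie inside encoded data */
    b->data[off]     = (uint8_t)(v >> 24);
    b->data[off + 1] = (uint8_t)(v >> 16);
    b->data[off + 2] = (uint8_t)(v >> 8);
    b->data[off + 3] = (uint8_t)v;
    return 0;
}

/* Append one big-endian 32-bit word at the end of the buffer. */
static int append_u32(struct out_buf *b, uint32_t v)
{
    if (sizeof(b->data) - b->len < 4)
        return -1;
    b->len += 4;
    return put_u32_at(b, b->len - 4, v);
}

/* Encode a byte-length prefix followed by n words, backfilling the prefix. */
static int encode_counted(struct out_buf *b, const uint32_t *items, size_t n)
{
    size_t len_off = b->len;    /* remember an offset, not a pointer */
    size_t body_start;
    size_t i;

    if (append_u32(b, 0) < 0)   /* placeholder for the length */
        return -1;
    body_start = b->len;
    for (i = 0; i < n; i++)
        if (append_u32(b, items[i]) < 0)
            return -1;
    /* The length comes from offsets, never from subtracting two pointers. */
    return put_u32_at(b, len_off, (uint32_t)(b->len - body_start));
}
```

Because only offsets are kept across the encoding of the body, nothing breaks if the bytes are later staged through a scratch buffer before landing in their final place.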
2014-05-30nfsd4: use xdr_truncate_encodeJ. Bruce Fields
Now that lengths are reliable, we can use xdr_truncate_encode instead of open-coding it everywhere. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30dentry_kill() doesn't need the second argument nowAl Viro
it's 1 in the only remaining caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-30dealing with the rest of shrink_dentry_list() livelockAl Viro
We have the same problem with ->d_lock order in the inner loop, where we are dropping references to ancestors. Same solution, basically - instead of using dentry_kill() we use lock_parent() (introduced in the previous commit) to get that lock in a safe way, recheck ->d_count (in case lock_parent() has ended up dropping and retaking ->d_lock and somebody managed to grab a reference during that window), trylock the inode->i_lock and use __dentry_kill() to do the rest. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-30shrink_dentry_list(): take parent's ->d_lock earlierAl Viro
The cause of livelocks there is that we are taking ->d_lock on dentry and its parent in the wrong order, forcing us to use trylock on the parent's one. d_walk() takes them in the right order, and unfortunately it's not hard to create a situation where shrink_dentry_list() can't make progress since trylock keeps failing, and shrink_dcache_parent() or check_submounts_and_drop() keeps calling d_walk(), disrupting the very shrink_dentry_list() it's waiting for. Solution is straightforward - if that trylock fails, let's unlock the dentry itself and take locks in the right order. We need to stabilize ->d_parent without holding ->d_lock, but that's doable using RCU. And we'd better do that in the very beginning of the loop in shrink_dentry_list(), since the checks on refcount, etc. would need to be redone anyway. That deals with half of the problem - killing dentries on the shrink list itself. The other half (dropping their parents) is in the next commit. Locking the parent is interesting - it would be easy to do rcu_read_lock(), lock whatever we think is a parent, lock dentry itself and check if the parent is still the right one. Except that we need to check that *before* locking the dentry, or we are risking taking ->d_lock out of order. Fortunately, once D1 is locked, we can check if D2->d_parent is equal to D1 without the need to lock D2; D2->d_parent can start or stop pointing to D1 only under D1->d_lock, so taking D1->d_lock is enough. In other words, the right solution is rcu_read_lock/lock what looks like parent right now/check if it's still our parent/rcu_read_unlock/lock the child. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
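The final sequence (lock what looks like the parent, recheck, then lock the child) can be sketched in userspace C. This is a toy model under loud assumptions: `struct node`, `node_lock()` and `lock_parent_and_child()` are hypothetical names, an `atomic_flag` spinlock stands in for ->d_lock, and nodes are assumed never to be freed, which is the guarantee RCU provides in the kernel. The child is also assumed not to be its own parent.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for a dentry; NOT the kernel's struct dentry. */
struct node {
    atomic_flag lock;               /* plays the role of ->d_lock */
    _Atomic(struct node *) parent;  /* changes only under the parent's lock */
};

static void node_init(struct node *n, struct node *parent)
{
    atomic_flag_clear(&n->lock);
    atomic_init(&n->parent, parent);
}

static void node_lock(struct node *n)
{
    while (atomic_flag_test_and_set_explicit(&n->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void node_unlock(struct node *n)
{
    atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

/*
 * Take both locks in parent-then-child order. Returns the locked parent;
 * the child is locked as well on return.
 */
static struct node *lock_parent_and_child(struct node *child)
{
    for (;;) {
        /* Unlocked peek: this is only a guess at the current parent. */
        struct node *p = atomic_load(&child->parent);

        node_lock(p);
        /* With p locked, child->parent cannot start or stop pointing at p. */
        if (atomic_load(&child->parent) == p) {
            node_lock(child);   /* correct order: parent first, then child */
            return p;
        }
        node_unlock(p);         /* lost a race with a rename; retry */
    }
}
```

The recheck happens before the child's lock is taken, so the two locks are never acquired child-first, which is the ordering problem the commit message describes.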
2014-05-29Push the file layout driver into a subdirectoryTom Haynes
The object and block layouts already exist in their own subdirectories. This patch completes the set! Note that as a layout denotes nfs4 already, I stripped that prefix out of the file names. Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com> Acked-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29pNFS: Handle allocation errors correctly in objlayout_alloc_layout_hdr()Trond Myklebust
Return the NULL pointer when the allocation fails. Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29pNFS: Handle allocation errors correctly in filelayout_alloc_layout_hdr()Trond Myklebust
Return the NULL pointer when the allocation fails. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: <stable@vger.kernel.org> # 3.5.x Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29nfs: Apply NFS_MOUNT_CMP_FLAGMASK to nfs_compare_remount_data()Scott Mayhew
Those flags are obsolete and checking them can incorrectly cause remount operations to fail. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29NFSv4: Use error handler on failed GETATTR with successful OPENAndy Adamson
Place the call to resend the failed GETATTR under the error handler so that when appropriate, the GETATTR is retried more than once. The server can fail the GETATTR op in the OPEN compound with a recoverable error such as NFS4ERR_DELAY. In the case of an O_EXCL open, the server has created the file, so a retrans of the OPEN call will fail with NFS4ERR_EXIST. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29NFS: Fix a potential busy wait in nfs_page_group_lockTrond Myklebust
We cannot allow nfs_page_group_lock to use TASK_KILLABLE here, since the loop would cause a busy wait if somebody kills the task. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29NFS: Fix error handling in __nfs_pageio_add_requestTrond Myklebust
Handle the case where nfs_create_request() returns an error. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29nfs: support page groups in nfs_read_completionWeston Andros Adamson
nfs_read_completion relied on the fact that there was a 1:1 mapping of page to nfs_request, but this has now changed. Regions not covered by a request have already been zeroed elsewhere. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29pnfs: filelayout: support non page aligned layoutsWeston Andros Adamson
Use the new pg_test interface to adjust requests to fit in the current stripe / segment. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29pnfs: allow non page aligned pnfs layout segmentsWeston Andros Adamson
Remove alignment checks that would revert to MDS and change pg_test to return the max amount left in the segment (or other pg_test call) up to size of passed request, or 0 if no space is left. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
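The return convention described here (as much of the request as still fits in the segment, or 0 if nothing fits) can be illustrated with a small helper. `segment_pg_test()` and its parameters are hypothetical and do not mirror the real pg_test signature, which takes pageio descriptor and request pointers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes of a request starting at req_off
 * with length req_len fit inside the segment [seg_off, seg_off + seg_len)?
 * Returns 0 when the request starts outside the segment.
 */
static size_t segment_pg_test(uint64_t seg_off, uint64_t seg_len,
                              uint64_t req_off, size_t req_len)
{
    uint64_t left;

    if (req_off < seg_off || req_off - seg_off >= seg_len)
        return 0;                       /* no space left in this segment */
    left = seg_len - (req_off - seg_off);
    return left < req_len ? (size_t)left : req_len;
}
```

For instance, a 4096-byte request starting 512 bytes before the end of a segment would be answered with 512, so the caller can split the request instead of falling back to the MDS.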
2014-05-29pnfs: support multiple verfs per direct reqWeston Andros Adamson
Support direct requests that span multiple pnfs data servers by comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket. Continue to use dreq->verf if the MDS is used / non-pNFS. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29nfs: remove data list from pgio headerWeston Andros Adamson
Since the ability to split pages into subpage requests has been added, nfs_pgio_header->rpc_list only ever has one pgio data. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29nfs: use > 1 request to handle bsize < PAGE_SIZEWeston Andros Adamson
Use the newly added support for multiple requests per page for rsize/wsize < PAGE_SIZE, instead of having multiple read / write data structures per pageio header. This allows us to get rid of nfs_pgio_multi. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29nfs: chain calls to pg_testWeston Andros Adamson
Now that pg_test can change the size of the request (by returning a non-zero size smaller than the request), pg_test functions that call other pg_test functions must return the minimum of the result - or 0 if any fail. Also clean up the logic of some pg_test functions so that all checks are for conditions where coalescing is not possible. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
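The chaining rule (return the minimum of the results, or 0 if any test fails) might look roughly like this. `generic_test()` and `layout_test()` are hypothetical stand-ins with simplified scalar arguments, not the kernel's pg_test callbacks.

```c
#include <stddef.h>

/* Hypothetical lower-level test: limit the request to the room left in the RPC. */
static size_t generic_test(size_t req_len, size_t room_in_rpc)
{
    if (room_in_rpc == 0)
        return 0;                       /* cannot coalesce at all */
    return req_len < room_in_rpc ? req_len : room_in_rpc;
}

/* Hypothetical layout-level test that chains to generic_test(). */
static size_t layout_test(size_t req_len, size_t room_in_rpc,
                          size_t room_in_stripe)
{
    size_t size = generic_test(req_len, room_in_rpc);

    if (size == 0)
        return 0;                       /* a failure below fails the chain */
    if (room_in_stripe == 0)
        return 0;                       /* our own check fails */
    /* Otherwise the answer is the minimum of both limits. */
    return size < room_in_stripe ? size : room_in_stripe;
}
```

Propagating 0 immediately matters because 0 means "do not coalesce", which a plain minimum over non-failing results would otherwise hide.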
2014-05-29nfs: allow coalescing of subpage requestsWeston Andros Adamson
Remove check that the request covers a whole page. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29pnfs: clean up filelayout_alloc_commit_infoWeston Andros Adamson
Remove unneeded else statement and clean up how commit info dataserver buckets are replaced. Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29nfs: page group support in nfs_mark_uptodateWeston Andros Adamson
Change how nfs_mark_uptodate checks to see if writes cover a whole page. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29nfs: page group syncing in write pathWeston Andros Adamson
Operations that modify state for a whole page must be synchronized across all requests within a page group. In the write path, this is calling end_page_writeback and removing the head request from an inode. Both of these operations should not be called until all requests in a page group have reached the point where they would call them. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29nfs: page group syncing in read pathWeston Andros Adamson
Operations that modify state for a whole page must be synchronized across all requests within a page group. In the read path, this is calling unlock_page and SetPageUptodate. Both of these functions should not be called until all requests in a page group have reached the point where they would call them. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
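One way to picture "wait until every request in the group has reached this point" is a per-request flag that is checked around the group's circular list. The sketch below is an assumption-laden illustration: `struct sub_req`, `REQ_DONE` and `group_sync_on_flag()` are hypothetical names, and the locking that a real implementation would need around the group is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define REQ_DONE 0x1u   /* hypothetical "reached the sync point" flag */

/* Hypothetical request that is a member of a circular page group. */
struct sub_req {
    struct sub_req *next_in_group;  /* circular; a lone request points to itself */
    unsigned int    flags;
};

/*
 * Mark req as having reached the sync point. Returns true only for the
 * caller that finds every member of the group marked; that caller may then
 * perform the whole-page operation. The flags are cleared for reuse.
 * Assumes the caller serializes access to the group (not shown).
 */
static bool group_sync_on_flag(struct sub_req *req, unsigned int flag)
{
    struct sub_req *r;

    req->flags |= flag;
    for (r = req->next_in_group; r != req; r = r->next_in_group)
        if (!(r->flags & flag))
            return false;           /* somebody has not arrived yet */

    r = req;
    do {                            /* everyone arrived: reset the flag */
        r->flags &= ~flag;
        r = r->next_in_group;
    } while (r != req);
    return true;
}
```

A group with a single request returns true on the first call, which matches the statement that the change has no effect while every page group still has one request.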
2014-05-29nfs: add support for multiple nfs reqs per pageWeston Andros Adamson
Add "page groups" - a circular list of nfs requests (struct nfs_page) that all reference the same page. This gives nfs read and write paths the ability to account for sub-page regions independently. This somewhat follows the design of struct buffer_head's sub-page accounting. Only "head" requests are ever added/removed from the inode list in the buffered write path. "head" and "sub" requests are treated the same through the read path and the rest of the write/commit path. Requests are given an extra reference across the life of the list. Page groups are never rejoined after being split. If the read/write request fails and the client falls back to another path (i.e. revert to MDS in the pNFS case), the already split requests are pushed through the recoalescing code again, which may split them further and then coalesce them into properly sized requests on the wire. Fragmentation shouldn't be a problem with the current design, because we flush all requests in a page group when a non-contiguous request is added, so the only time resplitting should occur is on a resend of a read or write. This patch lays the groundwork for sub-page splitting, but does not actually do any splitting. For now all page groups have one request as pg_test functions don't yet split pages. There are several related patches that are needed to support multiple requests per page group. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
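The data structure described above, a circular list of requests that share one page, can be sketched as follows. `struct page_req`, `group_init()` and `group_bytes()` are hypothetical names chosen for this illustration; reference counting and the inode list are left out, and the ranges are assumed not to overlap.

```c
#include <stddef.h>

/* Hypothetical request covering a sub-range of one page. */
struct page_req {
    struct page_req *head;  /* first request created for the page */
    struct page_req *next;  /* circular list of all requests on the page */
    size_t offset;          /* byte offset within the page */
    size_t len;             /* bytes covered by this request */
};

/*
 * Start a new group (prev == NULL) or link req right after prev in an
 * existing group.
 */
static void group_init(struct page_req *req, struct page_req *prev)
{
    if (!prev) {
        req->head = req;    /* a lone request is its own head */
        req->next = req;
        return;
    }
    req->head = prev->head;
    req->next = prev->next; /* keep the ring closed */
    prev->next = req;
}

/* Total bytes covered by the group, assuming the ranges do not overlap. */
static size_t group_bytes(const struct page_req *req)
{
    const struct page_req *r = req;
    size_t total = 0;

    do {
        total += r->len;
        r = r->next;
    } while (r != req);
    return total;
}
```

A check such as "do the writes cover the whole page" can then compare `group_bytes()` against the page size, starting from any member of the ring.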
2014-05-29nfs: call nfs_can_coalesce_requests for every reqWeston Andros Adamson
Call nfs_can_coalesce_requests for every request, even the first one. This is needed for future patches to give pg_test a way to inform add_request to reduce the size of the request. Now @prev can be null in nfs_can_coalesce_requests and pg_test functions. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29nfs: modify pg_test interface to return size_tWeston Andros Adamson
This is a step toward allowing pg_test to inform the coalescing code to reduce the size of requests so they may fit in whatever scheme the pg_test callback wants to define. For now, just return the size of the request if there is space, or 0 if there is not. This shouldn't change any behavior as it acts the same as when the pg_test functions returned bool. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29nfs: remove unused arg from nfs_create_requestWeston Andros Adamson
@inode is passed but not used. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29pnfs: fix race in filelayout commit pathWeston Andros Adamson
Hold the lock while modifying commit info dataserver buckets. The following oops can be reproduced by running iozone for a while against a 2 DS pynfs filelayout server.
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache
CPU: 0 PID: 903 Comm: iozone Not tainted 3.15.0-rc1-branch-dros_testing+ #44
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference
task: ffff880078164480 ti: ffff88006e972000 task.ti: ffff88006e972000
RIP: 0010:[<ffffffffa01936e1>] [<ffffffffa01936e1>] nfs_init_commit+0x22/0x
RSP: 0018:ffff88006e973d30 EFLAGS: 00010246
RAX: ffff88006e973e00 RBX: ffff88006e828800 RCX: ffff88006e973e10
RDX: 0000000000000000 RSI: ffff88006e973e00 RDI: dead4ead00000000
RBP: ffff88006e973d38 R08: ffff88006e8289d8 R09: 0000000000000000
R10: ffff88006e8289d8 R11: 0000000000016988 R12: ffff88006e973b98
R13: ffff88007a0a6648 R14: ffff88006e973e10 R15: ffff88006e828800
FS: 00007f2ce396b740(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f03278a1000 CR3: 0000000079043000 CR4: 00000000001407f0
Stack:
ffff88006e8289d8 ffff88006e973da8 ffffffffa00f144f ffff88006e9478c0
ffff88006e973e00 ffff88006de21080 0000000100000002 ffff880079be6c48
ffff88006e973d70 ffff88006e973d70 ffff88006e973e10 ffff88006de21080
Call Trace:
[<ffffffffa00f144f>] filelayout_commit_pagelist+0x1ae/0x34a [nfs_layout_nfsv
[<ffffffffa0194f72>] nfs_generic_commit_list+0x92/0xc4 [nfs]
[<ffffffffa0195053>] nfs_commit_inode+0xaf/0x114 [nfs]
[<ffffffffa01892bd>] nfs_file_fsync_commit+0x82/0xbe [nfs]
[<ffffffffa01ceb0d>] nfs4_file_fsync+0x59/0x9b [nfsv4]
[<ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20
[<ffffffff8114ee60>] vfs_fsync+0x1c/0x1e
[<ffffffffa01891c2>] nfs_file_flush+0x7f/0x84 [nfs]
[<ffffffff81127a43>] filp_close+0x3c/0x72
[<ffffffff81140e12>] __close_fd+0x82/0x9a
[<ffffffff81127a9c>] SyS_close+0x23/0x4c
[<ffffffff814acd12>] system_call_fastpath+0x16/0x1b
Code: 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8
RIP [<ffffffffa01936e1>] nfs_init_commit+0x22/0xe1 [nfs]
RSP <ffff88006e973d30>
---[ end trace 732fe6419b235e2f ]---
Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29NFS: Create a common nfs_pageio_ops structAnna Schumaker
At this point the read and write structures look identical, so combine them into something shared by both. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29NFS: Create a common generic_pg_pgios()Anna Schumaker
What we have here is two functions that look identical. Let's share some more code! Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29NFS: Create a common multiple_pgios() functionAnna Schumaker
Once again, these two functions look identical in the read and write case. Time to combine them together! Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29NFS: Create a common initiate_pgio() functionAnna Schumaker
Most of this code is the same for both the read and write paths, so combine everything and use the rw_ops when necessary. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29expand dentry_kill(dentry, 0) in shrink_dentry_list()Al Viro
Result will be massaged to saner shape in the next commits. It is ugly, no questions - the point of that one is to be a provably equivalent transformation (and it might be worth splitting a bit more). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-29split dentry_kill()Al Viro
... into trylocks and everything else. The latter (actual killing) is __dentry_kill(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-28JFS: Check for NULL before calling posix_acl_equiv_mode()William Burrow
Check for NULL before using the acl in the access type switch statement. This seems to be consistent with what is done in the JFFS and ext4 filesystems and with the behaviour of JFS in the 3.13 kernel. The bug seemed to be introduced in commit 2cc6a5a0. The bug results in a kernel Oops, NULL dereference could not be handled when accessing a JFS filesystem. The rdiff-backup process seemed to trigger the bug. See also reported bug #75341: https://bugzilla.kernel.org/show_bug.cgi?id=75341 Signed-off-by: William Burrow <wbkernel@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2014-05-28NFS: Create a generic_pgio functionAnna Schumaker
These functions are almost identical on both the read and write side. FLUSH_COND_STABLE will never be set for the read path, so leaving it in the generic code won't hurt anything. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28NFS: Create a common pgio_error functionAnna Schumaker
At this point, the read and write versions of this function look identical so both should use the same function. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28NFS: Create a common rpcsetup function for reads and writesAnna Schumaker
Write adds a little bit of code dealing with flush flags, but since "how" will always be 0 when reading we can share the code. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28NFS: Create a common rpc_call_ops structAnna Schumaker
The read and write paths set up this struct in exactly the same way, so create a single shared struct. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28NFS: Create a common nfs_pgio_result_common functionAnna Schumaker
Combining these functions will let me make a single nfs_rw_common_ops struct (see the next patch). Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28NFS: Create a common pgio_rpc_prepare functionAnna Schumaker
The read and write paths do exactly the same thing for the rpc_prepare rpc_op. This patch combines them together into a single function. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28NFS: Create a common rw_header_alloc and rw_header_free functionAnna Schumaker
I create a new struct nfs_rw_ops to decide the differences between reads and writes. This struct will be set when initializing a new nfs_pgio_descriptor, and then passed on to the nfs_rw_header when a new header is allocated. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
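The pattern described here, an ops table chosen when the descriptor is initialized and handed on to each header it allocates, can be sketched in plain C. All names below (`struct rw_ops`, `struct rw_header`, `struct pgio_desc`, `desc_header_alloc()`, `header_free()`) are hypothetical simplifications and do not reproduce the fields of the real NFS structures.

```c
#include <stdlib.h>

struct rw_header;

/* Hypothetical table of the operations that differ between reads and writes. */
struct rw_ops {
    const char *name;
    struct rw_header *(*hdr_alloc)(void);
    void (*hdr_free)(struct rw_header *hdr);
};

/* Hypothetical header: remembers which ops table created it. */
struct rw_header {
    const struct rw_ops *ops;
    int is_write;
};

/* Hypothetical descriptor: the ops are fixed when it is initialized. */
struct pgio_desc {
    const struct rw_ops *ops;
};

static struct rw_header *read_hdr_alloc(void)
{
    return calloc(1, sizeof(struct rw_header));  /* is_write stays 0 */
}

static struct rw_header *write_hdr_alloc(void)
{
    struct rw_header *hdr = calloc(1, sizeof(*hdr));

    if (hdr)
        hdr->is_write = 1;
    return hdr;
}

static void common_hdr_free(struct rw_header *hdr)
{
    free(hdr);
}

static const struct rw_ops read_ops  = { "read",  read_hdr_alloc,  common_hdr_free };
static const struct rw_ops write_ops = { "write", write_hdr_alloc, common_hdr_free };

/* Shared code path: allocate through the descriptor's ops, pass them on. */
static struct rw_header *desc_header_alloc(const struct pgio_desc *desc)
{
    struct rw_header *hdr = desc->ops->hdr_alloc();

    if (hdr)
        hdr->ops = desc->ops;
    return hdr;
}

/* Shared code path: free through the ops stored in the header. */
static void header_free(struct rw_header *hdr)
{
    hdr->ops->hdr_free(hdr);
}
```

Once the header carries its own ops pointer, later common code can dispatch on it without knowing whether it is handling a read or a write.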