Age | Commit message (Collapse) | Author |
|
0c12eaffdf09466f36a9ffe970dda8f4aeb6efc0 "nfsd: don't break lease on
CLAIM_DELEGATE_CUR" was a temporary workaround for a problem fixed
properly in the vfs layer by 778fc546f749c588aa2f6cd50215d2715c374252
"locks: fix tracking of inprogress lease breaks", so we can revert that
change (but keeping some minor cleanup from that commit).
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
..to allow forcing of either compression scheme. This will override
compiled-in defaults. jffs2_compress is reworked a bit, as the lzo/zlib
override shares lots of code w/ the PRIORITY mode.
v2: update show_options accordingly.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
|
|
Currently jffs2 has compile-time constants (and .config options)
controlling whether or not the various compression/decompression
drivers are built in and enabled. This is fine for embedded
systems, but it clashes with distribution kernels. Distro kernels
tend to turn on everything; this causes OpenFirmware to fall
over, as it understands ZLIB-compressed inodes. Booting a kernel
that has LZO compression enabled, writing to the boot partition,
and then rebooting causes OFW to fail to read the kernel from
the filesystem. This is because LZO compression has priority
when writing new data to jffs2, if LZO is enabled.
This patch adds mount option parsing, and a single supported
option ("compr=none"). This adds the flexibility of being
able to specify which compressor overrides on a per-superblock
basis. For now, we can simply disable compression;
additional flexibility coming soon.
v2: kill some printks, and implement show_options as suggested
by Artem Bityutskiy.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
|
|
The following command sequence triggers an oops.
# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt
general protection fault: 0000 [#1] PREEMPT SMP
CPU 2
Modules linked in:
Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
RIP: 0010:[<ffffffff810d0879>] [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
Call Trace:
[<ffffffff810d2845>] lock_acquire+0x95/0x140
[<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
[<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
[<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
[<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
[<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
[<ffffffff811c40df>] blkdev_put+0x5f/0x190
[<ffffffff8118f18d>] kill_block_super+0x4d/0x80
[<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
[<ffffffff8119003a>] deactivate_super+0x4a/0x70
[<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
[<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
[<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b
This is because bdev holds on to disk but disk doesn't pin the
associated queue. If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.
Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.
Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Both LOOKUP and OPEN operations may return NFS4ERR_BADNAME if we send a
an invalid name as a filename argument. As far as the application is
concerned, it just has to know that the file doesn't exist, and so
ENOENT would be the appropriate reply. We should only return EINVAL
if the filename is being used to _create_ a new object on the
remote filesystem.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
...and also remove the associated nfs_v4_clientops entry.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
I intended to do this as part of fixing part of the conflict with
the merge with Linus' tree, but evidently it didn't get included in
the commit.
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
commit ae50c0b5 "pnfs: client stats" added additional information to
the output of /proc/self/mountstats. The new functions introduced are
only used in this file and should be marked static.
If CONFIG_NFS_V4_1 is not defined, empty stub functions are used. If
CONFIG_NFS_V4 is not defined these stub functions are not used at all.
Adding static for the functions results in compile warnings:
fs/nfs/super.c:743: warning: 'show_sessions' defined but not used
fs/nfs/super.c:756: warning: 'show_pnfs' defined but not used
Fix this by adding a #ifdef CONFIG_NFS_V4 guard around the two
show_ functions.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We should check if the sector is already initialized before
trying to grab the page from page cache. Otherwise when two
pages of the same block are written back by two threads each
calling from writepage_locked, it can cause deadlock like bellow.
[ 1080.972099] INFO: task kswapd0:25 blocked for more than 120 seconds.
[ 1080.972377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.972812] kswapd0 D ffff88000c4926c0 0 25 2 0x00000000
[ 1080.972816] ffff88000df276b0 0000000000000046 ffff88000df27640 ffffffff81013ba7
[ 1080.972821] ffff88000c492310 ffff88000df27fd8 ffff88000df27fd8 00000000001d3440
[ 1080.972824] ffff88000c378000 ffff88000c492310 ffff8800175d3d40 ffff880017fc75a8
[ 1080.972828] Call Trace:
[ 1080.972860] [<ffffffff81013ba7>] ? read_tsc+0x9/0x19
[ 1080.972877] [<ffffffff810e0b23>] ? lock_page+0x2b/0x2b
[ 1080.972899] [<ffffffff81475a1d>] io_schedule+0x63/0x7e
[ 1080.972902] [<ffffffff810e0b31>] sleep_on_page+0xe/0x12
[ 1080.972905] [<ffffffff81475fe8>] __wait_on_bit_lock+0x46/0x8f
[ 1080.972916] [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72
[ 1080.972919] [<ffffffff810e0af6>] __lock_page+0x66/0x68
[ 1080.972928] [<ffffffff81072705>] ? autoremove_wake_function+0x3d/0x3d
[ 1080.972932] [<ffffffff810e0b1f>] lock_page+0x27/0x2b
[ 1080.972934] [<ffffffff810e0bcf>] find_lock_page+0x34/0x57
[ 1080.972937] [<ffffffff810e1738>] find_or_create_page+0x34/0x8a
[ 1080.972947] [<ffffffffa034245b>] bl_write_pagelist+0x205/0x6da [blocklayoutdriver]
[ 1080.972951] [<ffffffffa034145d>] ? bl_free_lseg+0x38/0x38 [blocklayoutdriver]
[ 1080.972995] [<ffffffffa02e27b9>] ? nfs_write_rpcsetup+0x118/0x123 [nfs]
[ 1080.973033] [<ffffffffa030246b>] pnfs_generic_pg_writepages+0x10b/0x1f4 [nfs]
[ 1080.973089] [<ffffffffa02deaae>] nfs_pageio_doio+0x1a/0x43 [nfs]
[ 1080.973098] [<ffffffffa02df035>] nfs_pageio_complete+0x16/0x2d [nfs]
[ 1080.973108] [<ffffffffa02e2d8f>] nfs_writepage_locked+0xa0/0xbf [nfs]
[ 1080.973119] [<ffffffffa02e36a1>] nfs_writepage+0x16/0x2b [nfs]
[ 1080.973122] [<ffffffff810e8762>] ? clear_page_dirty_for_io+0x87/0x9a
[ 1080.973133] [<ffffffff810efc5b>] shrink_page_list+0x39b/0x6c8
[ 1080.973139] [<ffffffff810f03bb>] shrink_inactive_list+0x22c/0x39e
[ 1080.973144] [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72
[ 1080.973148] [<ffffffff810f0c33>] shrink_zone+0x445/0x588
[ 1080.973152] [<ffffffff810f1a11>] balance_pgdat+0x2c2/0x56b
[ 1080.973170] [<ffffffff81254208>] ? __bitmap_weight+0x34/0x80
[ 1080.973175] [<ffffffff810f1f78>] kswapd+0x2be/0x2fa
[ 1080.973179] [<ffffffff810726c8>] ? __init_waitqueue_head+0x4b/0x4b
[ 1080.973183] [<ffffffff810f1cba>] ? balance_pgdat+0x56b/0x56b
[ 1080.973187] [<ffffffff81071f69>] kthread+0xa8/0xb0
[ 1080.973200] [<ffffffff814806b4>] kernel_thread_helper+0x4/0x10
[ 1080.973205] [<ffffffff81071ec1>] ? __init_kthread_worker+0x5a/0x5a
[ 1080.973210] [<ffffffff814806b0>] ? gs_change+0x13/0x13
[ 1080.973213] no locks held by kswapd0/25.
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
bl_add_page_to_bio returns error pointer. bio should be reset to
NULL in failure cases as the out path always calls bl_submit_bio.
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
For pnfs pagelist read failure, we need to pg_recoalesce and resend IO to
mds.
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
For pnfs pagelist write failure, we need to pg_recoalesce and resend IO to
mds.
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
file layout and block layout both use it to set mark layout io failure
bit. So make it generic.
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The same function is used by idmap, gss and blocklayout code. Make it
generic.
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Make the status field explicitly 32 bits. "...it's unlikely that the kernel
and userspace would differ on the size of an int here, but it might be a
good idea to go ahead and make that explicitly 32 bits in case we end up
dealing with more exotic arches at some point in the future."
Suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Always return PTR_ERR, not NULL, from nfs4_blk_get_deviceinfo and
nfs4_blk_decode_device.
Check for IS_ERR, not NULL, in bl_set_layoutdriver when calling
nfs4_blk_get_deviceinfo.
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
nfs_find_and_lock_request will take a reference to the nfs_page and
will then put it if the req is already locked. It's possible though
that the reference will be the last one. That put then can kick off
a whole series of reference puts:
nfs_page
nfs_open_context
dentry
inode
If the inode ends up being deleted, then the VFS will call
truncate_inode_pages. That function will try to take the page lock, but
it was already locked when migrate_page was called. The code
deadlocks.
Fix this by simply refusing the migration request if PagePrivate is
already set, indicating that the page is already associated with an
active read or write request.
We've had a customer test a backported version of this patch and
the preliminary results seem good.
Cc: stable@kernel.org
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The result from ipv6_addr_scope() always not be a single SCOPE,
so we can't use equal to compare the result with IPV6_ADDR_SCOPE_LINKLOCAL
at nfs_sockaddr_match_ipaddr6.
This patch fixs the problem, and lets checking address before scope_id.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
commit 420e3646 allowed the kernel to reduce the number of unnecessary
commit calls by skipping the commit when there are a large number of
outstanding pages.
However, the current test in nfs_commit_unstable_pages does not handle
the edge condition properly. When ncommit == 0, then that means that the
kernel doesn't need to do anything more for the inode. The current test
though in the WB_SYNC_NONE case will return true, and the inode will end
up being marked dirty. Once that happens the inode will never be clean
until there's a WB_SYNC_ALL flush.
Fix this by immediately returning from nfs_commit_unstable_pages when
ncommit == 0.
Mike noticed this problem initially in RHEL5 (2.6.18-based kernel) which
has a backported version of 420e3646. The inode cache there was growing
very large. The inode cache was unable to be shrunk since the inodes
were all marked dirty. Calling sync() would essentially "fix" the
problem -- the WB_SYNC_ALL flush would result in the inodes all being
marked clean.
What I'm not clear on is how big a problem this is in mainline kernels
as the writeback code there is very different. Either way, it seems
incorrect to re-mark the inode dirty in this case.
Reported-by: Mike McLean <mikem@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org [2.6.34+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
syncing"
This reverts commit b80c3cb628f0ebc241b02e38dd028969fb8026a2.
The reverted commit was rendered obsolete by a VFS fix: commit
5547e8aac6f71505d621a612de2fca0dd988b439 (writeback: Update dirty flags in
two steps). We now no longer need to worry about writeback_single_inode()
missing our marking the inode for COMMIT in 'do_writepages()' call.
Reverting this patch, fixes a performance regression in which the inode
would continuously get queued to the dirty list, causing the writeback
code to unnecessarily try to send a COMMIT.
Signed-off-by: Trond Myklebust <Trond.Myklebust>
Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@kernel.org [2.6.35+]
|
|
Automounting directories are now invalidated by .d_revalidate()
so to be d_instantiate()d again with the right DCACHE_NEED_AUTOMOUNT
flag
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
|
Smatch complains that the cast to "int" in min_t() changes very large
values of current_read_size into negative values and so min_t()
could return the wrong value. I removed the const as well, as that
doesn't do anything here.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
|
The third parameter to ext4_free_blocks is a struct buffer_head *. This
parameter should be NULL not 0.
This quiets the sparse noise:
warning: Using plain integer as NULL pointer
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
This quiets the sparse noise:
warning: incorrect type in argument 2 (different address spaces)
expected void const [noderef] <asn:1>*from
got struct fstrim_range *<noident>
warning: incorrect type in argument 1 (different address spaces)
expected void [noderef] <asn:1>*to
got struct fstrim_range *<noident>
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The function declarations in ext4.h are already marked extern, so it's
not necessary to do so in the .c files.
This quiets the sparse noise:
warning: function 'ext4_flush_completed_IO' with external linkage has definition
warning: function 'ext4_init_inode_table' with external linkage has definition
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Add block plug for ext4 .writepages. Though ext4 .writepages
already handles request merge very well, block plug is still
helpful to reduce block lock contention.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
As part of startup, the MMP initialization code does this:
mmp->mmp_seq = seq = cpu_to_le32(mmp_new_seq());
Next, mmp->mmp_seq is written out to disk, a delay happens, and then
the MMP block is read back in and the sequence value is tested:
if (seq != le32_to_cpu(mmp->mmp_seq)) {
/* fail the mount */
On a LE system such as x86, the *le32* functions do nothing and this
works. Unfortunately, on a BE system such as ppc64, this comparison
becomes:
if (cpu_to_le32(new_seq) != le32_to_cpu(cpu_to_le32(new_seq)) {
/* fail the mount */
Except for a few palindromic sequence numbers, this test always causes
the mount to fail, which makes MMP filesystems generally unmountable
on ppc64. The attached patch fixes this situation.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Current logic would print an error message only once, and then
'failed_writes' would stay at 1. Rework the loop to increment
'failed_writes' and print the error message every
s_mmp_update_interval * 60 seconds, as intended according to the
comment.
Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Signed-off-by: Andrew Perepechko <andrew_perepechko@xyratex.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Andreas Dilger <adilger@dilger.ca>
|
|
sysname holds "Linux" by default, i.e. what appears when doing a "uname
-s"; nodename should be used to print the machine's hostname, i.e. what
is returned when doing a "uname -n" or "hostname", and what
gethostname(2)/sethostname(2) manipulate, in order to notify the
administrator of the node which is contending to mount the filesystem.
Acked-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Signed-off-by: Andrew Perepechko <andrew_perepechko@xyratex.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
If we create the object and then return failure to the client, we're
left with an unexpected file in the filesystem.
I'm trying to eliminate such cases but not 100% sure I have so an
assertion might be helpful for now.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
As with the nfs4_file, we'd prefer to find out about any failure before
creating a new file rather than after.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Move idr preallocation out of stateid initialization, into stateid
allocation, so that we no longer have to handle any errors from the
former.
This is a little subtle due to the way the idr code manages these
preallocated items--document that in comments.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Creating a new file is an irrevocable step--once it's visible in the
filesystem, other processes may have seen it and done something with it,
and unlinking it wouldn't simply undo the effects of the create.
Therefore, in the case where OPEN creates a new file, we shouldn't do
the create until we know that the rest of the OPEN processing will
succeed.
For example, we should preallocate a struct file in case we need it
until waiting to allocate it till process_open2(), which is already too
late.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
If process_open1() creates a new open owner, but the open later fails,
the current code will leave the open owner around. It won't be on the
close_lru list, and the client isn't expected to send a CLOSE, so it
will hang around as long as the client does.
Similarly, if process_open1() removes an existing open owner from the
close lru, anticipating that an open owner that previously had no
associated stateid's now will, but the open subsequently fails, then
we'll again be left with the same leak.
Fix both problems.
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
No change in behavior.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
There doesn't seem to be any harm to renewing the client a bit earlier,
when it is looked up. That saves us from having to sprinkle
renew_client calls over quite so many places.
Also remove a redundant comment and do a little cleanup.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Resolved conflicts:
fs/xfs/xfs_trans_priv.h:
- deleted struct xfs_ail field xa_flags
- kept field xa_log_flush in struct xfs_ail
fs/xfs/xfs_trans_ail.c:
- in xfsaild_push(), in XFS_ITEM_PUSHBUF case, replaced
"flush_log = 1" with "ailp->xa_log_flush++"
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
Add a sanity check to make sure ix hasn't gone beyond the valid bounds
of the extent block.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Now build security descriptor to change either owner or group at the
server. Initially security descriptor was built to change only
(D)ACL, that functionality has been extended.
When either an Owner or a Group of a file object at the server is changed,
rest of security descriptor remains same (DACL etc.).
To set security descriptor, it is necessary to open that file
with permission bits of either WRITE_DAC if DACL is being modified or
WRITE_OWNER (Take Ownership) if Owner or Group is being changed.
It is the server that decides whether a set security descriptor with
either owner or group change succeeds or not.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
|
This should be a bitwise negate here. It silences a Sparse warning:
fs/nfsd/nfs4xdr.c:693:16: warning: dubious: x & !y
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Current ore_check_io API receives a residual
pointer, to report partial IO. But it is actually
not used, because in a multiple devices IO there
is never a linearity in the IO failure.
On the other hand if every failing device is reported
through a received callback measures can be taken to
handle only failed devices. One at a time.
This will also be needed by the objects-layout-driver
for it's error reporting facility.
Exofs is not currently using the new information and
keeps the old behaviour of failing the complete IO in
case of an error. (No partial completion)
TODO: Use an ore_check_io callback to set_page_error only
the failing pages. And re-dirty write pages.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
All users of the ore will need to check if current code
supports the given layout. For example RAID5/6 is not
currently supported.
So move all the checks from exofs/super.c to a new
ore_verify_layout() to be used by ore users.
Note that any new layout should be passed through the
ore_verify_layout() because the ore engine will prepare
and verify some internal members of ore_layout, and
assumes it's called.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
Users like the objlayout-driver would like to only pass
a partial device table that covers the IO in question.
For example exofs divides the file into raid-group-sized
chunks and only serves group_width number of devices at
a time.
The partiality is communicated by setting
ore_componets->first_dev and the array covers all logical
devices from oc->first_dev upto (oc->first_dev + oc->numdevs)
The ore_comp_dev() API receives a logical device index
and returns the actual present device in the table.
An out-of-range dev_index will BUG.
Logical device index is the theoretical device index as if
all the devices of a file are present. .i.e:
total_devs = group_width * mirror_p1 * group_count
0 <= dev_index < total_devs
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
Memory conditions and max_bio constraints might cause us to
not comply to the full length of the requested IO. Instead of
failing the complete IO we can issue a shorter read/write and
report how much was actually executed in the ios->length
member.
All users must check ios->length at IO_done or upon return of
ore_read/write and re-issue the reminder of the bytes. Because
other wise there is no error returned like before.
This is part of the effort to support the pnfs-obj layout driver.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
If at read/write_done the actual IO was shorter then requested,
reported in returned ios->length. It is not an error. The reminder
of the pages should just be unlocked but not marked uptodate or
end_page_writeback. They will be re issued later by the VFS.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
Move the check and preparation of the ios->kern_buff case to
later inside _write_mirror().
Since read was never used with ios->kern_buff its support is removed
instead of fixed.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|