Age | Commit message (Collapse) | Author |
|
Use truncate_isize_extended() when hole is being created in a file so that
->page_mkwrite() will get called for the partial tail page if it is
mmaped (see the first patch in the series for details).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently discarding data later when writepage is called.
However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
ftruncate(fd, 0);
pwrite(fd, buf, 1024, 0);
map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
map[0] = 'a'; ----> page_mkwrite() for index 0 is called
ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
mremap(map, 1024, 10000, 0);
map[4095] = 'a'; ----> no page_mkwrite() called
At the moment ->page_mkwrite() is called, filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.
This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
XFS currently discards delalloc blocks within the target range of a
zero range request. Unaligned start and end offsets are zeroed
through the page cache and the internal, aligned blocks are
converted to unwritten extents.
If EOF is page aligned and covered by a delayed allocation extent.
The inode size is not updated until I/O completion. If a zero range
request discards a delalloc range that covers page aligned EOF as
such, the inode size update never occurs. For example:
$ rm -f /mnt/file
$ xfs_io -fc "pwrite 0 64k" -c "zero 60k 4k" /mnt/file
$ stat -c "%s" /mnt/file
65536
$ umount /mnt
$ mount <dev> /mnt
$ stat -c "%s" /mnt/file
61440
Update xfs_zero_file_space() to flush the range rather than discard
delalloc blocks to ensure that inode size updates occur
appropriately.
[dchinner: Note that this is really a workaround to avoid the
underlying problems. More work is needed (and ongoing) to fix those
issues so this fix is being added as a temporary stop-gap measure. ]
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xfs_vm_writepage() walks each buffer_head on the page, maps to the block
on disk and attaches to a running ioend structure that represents the
I/O submission. A new ioend is created when the type of I/O (unwritten,
delayed allocation or overwrite) required for a particular buffer_head
differs from the previous. If a buffer_head is a delalloc or unwritten
buffer, the associated bits are cleared by xfs_map_at_offset() once the
buffer_head is added to the ioend.
The process of mapping each buffer_head occurs in xfs_map_blocks() and
acquires the ilock in blocking or non-blocking mode, depending on the
type of writeback in progress. If the lock cannot be acquired for
non-blocking writeback, we cancel the ioend, redirty the page and
return. Writeback will revisit the page at some later point.
Note that we acquire the ilock for each buffer on the page. Therefore
during non-blocking writeback, it is possible to add an unwritten buffer
to the ioend, clear the unwritten state, fail to acquire the ilock when
mapping a subsequent buffer and cancel the ioend. If this occurs, the
unwritten status of the buffer sitting in the ioend has been lost. The
page will eventually hit writeback again, but xfs_vm_writepage() submits
overwrite I/O instead of unwritten I/O and does not perform unwritten
extent conversion at I/O completion. This leads to data corruption
because unwritten extents are treated as holes on reads and zeroes are
returned instead of reading from disk.
Modify xfs_cancel_ioend() to restore the buffer unwritten bit for ioends
of type XFS_IO_UNWRITTEN. This ensures that unwritten extent conversion
occurs once the page is eventually written back.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Coverity spotted this.
Granted, we *just* checked xfs_inod_dquot() in the caller (by
calling xfs_quota_need_throttle). However, this is the only place we
don't check the return value but the check is cheap and future-proof
so add it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
I discovered this in userspace, but the same change applies
to the kernel.
If we xfs_mdrestore an image from a non-crc filesystem, lo
and behold the restored image has gained a CRC:
# db/xfs_metadump.sh -o /dev/sdc1 - | xfs_mdrestore - test.img
# xfs_db -c "sb 0" -c "p crc" /dev/sdc1
crc = 0 (correct)
# xfs_db -c "sb 0" -c "p crc" test.img
crc = 0xb6f8d6a0 (correct)
This is because xfs_sb_from_disk doesn't fill in sb_crc,
but xfs_sb_to_disk(XFS_SB_ALL_BITS) does write the in-memory
CRC to disk - so we get uninitialized memory on disk.
Fix this by always initializing sb_crc to 0 when we read
the superblock, and masking out the CRC bit from ALL_BITS
when we write it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
In this case, if bp is NULL, error is set, and we send a
NULL bp to xfs_trans_brelse, which will try to dereference it.
Test whether we actually have a buffer before we try to
free it.
Coverity spotted this.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
If we write to the maximum file offset (2^63-2), XFS fails to log the
inode size update when the page is flushed. For example:
$ xfs_io -fc "pwrite `echo "2^63-1-1" | bc` 1" /mnt/file
wrote 1/1 bytes at offset 9223372036854775806
1.000000 bytes, 1 ops; 0.0000 sec (22.711 KiB/sec and 23255.8140 ops/sec)
$ stat -c %s /mnt/file
9223372036854775807
$ umount /mnt ; mount <dev> /mnt/
$ stat -c %s /mnt/file
0
This occurs because XFS calculates the new file size as io_offset +
io_size, I/O occurs in block sized requests, and the maximum supported
file size is not block aligned. Therefore, a write to the max allowable
offset on a 4k blocksize fs results in a write of size 4k to offset
2^63-4096 (e.g., equivalent to round_down(2^63-1, 4096), or IOW the
offset of the block that contains the max file size). The offset plus
size calculation (2^63 - 4096 + 4096 == 2^63) overflows the signed
64-bit variable which goes negative and causes the > comparison to the
on-disk inode size to fail. This returns 0 from xfs_new_eof() and
results in no change to the inode on-disk.
Update xfs_new_eof() to explicitly detect overflow of the local
calculation and use the VFS inode size in this scenario. The VFS inode
size is capped to the maximum and thus XFS writes the correct inode size
to disk.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Currently the extent size hint is set unconditionally in
xfs_ioctl_setattr() when the FSX_EXTSIZE flag is set. Hence we can
set hints when the inode flags indicating the hint should be used
are not set. Hence only set the extent size hint from userspace
when the inode has the XFS_DIFLAG_EXTSIZE flag set to indicate that
we should have an extent size hint set on the inode.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xfs_set_diflags() allows it to be set on non-directory inodes, and
this flags errors in xfs_repair. Further, inode allocation allows
the same directory-only flag to be inherited to non-directories.
Make sure directory inode flags don't appear on other types of
inodes.
This fixes several xfstests scratch fileystem corruption reports
(e.g. xfs/050) now that xfstests checks scratch filesystems after
test completion.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The typedef for timespecs and nanotime() are completely unnecessary,
and delay() can be moved to fs/xfs/linux.h, which means this file
can go away.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
struct compat_xfs_bstat is missing the di_forkoff field and so does
not fully translate the structure correctly. Fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
|
|
xfs_zero_remaining_bytes() open codes a log of buffer manupulations
to do a read forllowed by a write. It can simply be replaced by an
uncached read followed by a xfs_bwrite() call.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xfs_buf_read_uncached() has two failure modes. If can either return
NULL or bp->b_error != 0 depending on the type of failure, and not
all callers check for both. Fix it so that xfs_buf_read_uncached()
always returns the error status, and the buffer is returned as a
function parameter. The buffer will only be returned on success.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
There is a lot of cookie-cutter code that looks like:
if (shutdown)
handle buffer error
xfs_buf_iorequest(bp)
error = xfs_buf_iowait(bp)
if (error)
handle buffer error
spread through XFS. There's significant complexity now in
xfs_buf_iorequest() to specifically handle this sort of synchronous
IO pattern, but there's all sorts of nasty surprises in different
error handling code dependent on who owns the buffer references and
the locks.
Pull this pattern into a single helper, where we can hide all the
synchronous IO warts and hence make the error handling for all the
callers much saner. This removes the need for a special extra
reference to protect IO completion processing, as we can now hold a
single reference across dispatch and waiting, simplifying the sync
IO smeantics and error handling.
In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
make it explicitly handle on asynchronous IO. This forces all users
to be switched specifically to one interface or the other and
removes any ambiguity between how the interfaces are to be used. It
also means that xfs_buf_iowait() goes away.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
There is only one caller now - xfs_trans_read_buf_map() - and it has
very well defined call semantics - read, synchronous, and b_iodone
is NULL. Hence it's pretty clear what error handling is necessary
for this case. The bigger problem of untangling
xfs_trans_read_buf_map error handling is left to a future patch.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Internal buffer write error handling is a mess due to the unnatural
split between xfs_bioerror and xfs_bioerror_relse().
xfs_bwrite() only does sync IO and determines the handler to
call based on b_iodone, so for this caller the only difference
between xfs_bioerror() and xfs_bioerror_release() is the XBF_DONE
flag. We don't care what the XBF_DONE flag state is because we stale
the buffer in both paths - the next buffer lookup will clear
XBF_DONE because XBF_STALE is set. Hence we can use common
error handling for xfs_bwrite().
__xfs_buf_delwri_submit() is a similar - it's only ever called
on writes - all sync or async - and again there's no reason to
handle them any differently at all.
Clean up the nasty error handling and remove xfs_bioerror().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Only has two callers, and is just a shutdown check and error handler
around xfs_buf_iorequest. However, the error handling is a mess of
read and write semantics, and both internal callers only call it for
writes. Hence kill the wrapper, and follow up with a patch to
sanitise the error handling.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Currently the report of a bio error from completion
immediately marks the buffer with an error. The issue is that this
is racy w.r.t. synchronous IO - the submitter can see b_error being
set before the IO is complete, and hence we cannot differentiate
between submission failures and completion failures.
Add an internal b_io_error field protected by the b_lock to catch IO
completion errors, and only propagate that to the buffer during
final IO completion handling. Hence we can tell in xfs_buf_iorequest
if we've had a submission failure bey checking bp->b_error before
dropping our b_io_remaining reference - that reference will prevent
b_io_error values from being propagated to b_error in the event that
completion races with submission.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
We do some work in xfs_buf_ioend, and some work in
xfs_buf_iodone_work, but much of that functionality is the same.
This work can all be done in a single function, leaving
xfs_buf_iodone just a wrapper to determine if we should execute it
by workqueue or directly. hence rename xfs_buf_iodone_work to
xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that
need async processing.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
When synchronous IO runs IO completion work, it does so without an
IO reference or a hold reference on the buffer. The IO "hold
reference" is owned by the submitter, and released when the
submission is complete. The IO reference is released when both the
submitter and the bio end_io processing is run, and so if the io
completion work is run from IO completion context, it is run without
an IO reference.
Hence we can get the situation where the submitter can submit the
IO, see an error on the buffer and unlock and free the buffer while
there is still IO in progress. This leads to use-after-free and
memory corruption.
Fix this by taking a "sync IO hold" reference that is owned by the
IO and not released until after the buffer completion calls are run
to wake up synchronous waiters. This means that the buffer will not
be freed in any circumstance until all IO processing is completed.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
For the special case of delwri buffer submission and waiting, we
don't need to issue IO synchronously at all. The second pass to call
xfs_buf_iowait() can be replaced with blocking on xfs_buf_lock() -
the buffer will be unlocked when the async IO is complete.
This formalises a sane the method of waiting for async IO - take an
extra reference, submit the IO, call xfs_buf_lock() when you want to
wait for IO completion. i.e.:
bp = xfs_buf_find();
xfs_buf_hold(bp);
bp->b_flags |= XBF_ASYNC;
xfs_buf_iosubmit(bp);
xfs_buf_lock(bp)
error = bp->b_error;
....
xfs_buf_relse(bp);
While this is somewhat racy for gathering IO errors, none of the
code that calls xfs_buf_delwri_submit() will race against other
users of the buffers being submitted. Even if they do, we don't
really care if the error is detected by the delwri code or the user
we raced against. Either way, the error will be detected and
handled.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
When we have marked the filesystem for shutdown, we want to prevent
any further buffer IO from being submitted. However, we currently
force the log after marking the filesystem as shut down, hence
allowing IO to the log *after* we have marked both the filesystem
and the log as in an error state.
Clean this up by forcing the log before we mark the filesytem with
an error. This replaces the pure CIL flush that we currently have
which works around this same issue (i.e the CIL can't be flushed
once the shutdown flags are set) and hence enables us to clean up
the logic substantially.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Signed-off-by: David Sterba <dsterba@suse.cz>
|
|
Signed type mismatches the ioctl structure, all extent calculations are
done on unsigned types.
Signed-off-by: David Sterba <dsterba@suse.cz>
|
|
We now have cb_to_delegation and to_delegation, which do the same thing
and are defined separately in different .c files. Move the
cb_to_delegation definition into a header file and eliminate the
redundant to_delegation definition.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
|
|
This patch fixes a regression in the patch "GFS2: Remember directory
insert point", commit 2b47dad866d04f14c328f888ba5406057b8c7d33.
The problem had to do with the rename function: The function found
space for the new dirent, and remembered that location. But then the
old dirent was removed, which often moved the eligible location for
the renamed dirent. Putting the new dirent at the saved location
caused file system corruption.
This patch adds a new "save_loc" variable to struct gfs2_diradd.
If 1, the dirent location is saved. If 0, the dirent location is not
saved and the buffer_head is released as per previous behavior.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This patch cleans up f2fs_ioctl functions for better readability.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
My static checker complains that segment is a u64 but only the lower 31
bits can be used before we hit a shift wrapping bug.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch relocates f2fs_unlock_op in every directory operations to be called
after any error was processed.
Otherwise, the checkpoint can be entered with valid node ids without its
dentry when -ENOSPC is occurred.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch cleans up the existing and new macros for readability.
Rule is like this.
,-----------------------------------------> MAX_BLKADDR -,
| ,------------- TOTAL_BLKS ----------------------------,
| | |
| ,- seg0_blkaddr ,----- sit/nat/ssa/main blkaddress |
block | | (SEG0_BLKADDR) | | | | (e.g., MAIN_BLKADDR) |
address 0..x................ a b c d .............................
| |
global seg# 0...................... m .............................
| | |
| `------- MAIN_SEGS -----------'
`-------------- TOTAL_SEGS ---------------------------'
| |
seg# 0..........xx..................
= Note =
o GET_SEGNO_FROM_SEG0 : blk address -> global segno
o GET_SEGNO : blk address -> segno
o START_BLOCK : segno -> starting block address
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Previously, f2fs tries to reorganize the dirty nat entries into multiple sets
according to its nid ranges. This can improve the flushing nat pages, however,
if there are a lot of cached nat entries, it becomes a bottleneck.
This patch introduces a new set management flow by removing dirty nat list and
adding a series of set operations when the nat entry becomes dirty.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch introduces FITRIM in f2fs_ioctl.
In this case, f2fs will issue small discards and prefree discards as many as
possible for the given area.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch add a new data structure to control checkpoint parameters.
Currently, it presents the reason of checkpoint such as is_umount and normal
sync.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Merge NFSv4.2 client SEEK implementation from Anna
* client-4.2: (55 commits)
NFS: Implement SEEK
NFSD: Implement SEEK
NFSD: Add generic v4.2 infrastructure
svcrdma: advertise the correct max payload
nfsd: introduce nfsd4_callback_ops
nfsd: split nfsd4_callback initialization and use
nfsd: introduce a generic nfsd4_cb
nfsd: remove nfsd4_callback.cb_op
nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence
nfsd: fix nfsd4_cb_recall_done error handling
nfsd4: clarify how grace period ends
nfsd4: stop grace_time update at end of grace period
nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients
nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls
nfsd: serialize nfsdcltrack upcalls for a particular client
nfsd: pass extra info in env vars to upcalls to allow for early grace period end
nfsd: add a v4_end_grace file to /proc/fs/nfsd
lockd: add a /proc/fs/lockd/nlm_end_grace file
nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE
nfsd: remove redundant boot_time parm from grace_done client tracking op
...
|
|
* bugfixes:
NFSv4.1: Fix an NFSv4.1 state renewal regression
NFSv4: fix open/lock state recovery error handling
NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
NFS: Fabricate fscache server index key correctly
SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
nfs: fix duplicate proc entries
|
|
Commit 2f60ea6b8ced ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does
not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat
call, on the wire to renew the NFSv4.1 state if the flag was not set.
The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal
(cl_last_renewal) plus the lease time divided by 3. This is arbitrary and
sometimes does the following:
In normal operation, the only way a future state renewal call is put on the
wire is via a call to nfs4_schedule_state_renewal, which schedules a
nfs4_renew_state workqueue task. nfs4_renew_state determines if the
NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence,
which only gets sent if the NFS4_RENEW_TIMEOUT flag is set.
Then the nfs41_proc_async_sequence rpc_release function schedules
another state remewal via nfs4_schedule_state_renewal.
Without this change we can get into a state where an application stops
accessing the NFSv4.1 share, state renewal calls stop due to the
NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover
from this situation is with a clientid re-establishment, once the application
resumes and the server has timed out the lease and so returns
NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation.
An example application:
open, lock, write a file.
sleep for 6 * lease (could be less)
ulock, close.
In the above example with NFSv4.1 delegations enabled, without this change,
there are no OP_SEQUENCE state renewal calls during the sleep, and the
clientid is recovered due to lease expiration on the close.
This issue does not occur with NFSv4.1 delegations disabled, nor with
NFSv4.0, with or without delegations enabled.
Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com
Fixes: 2f60ea6b8ced (NFSv4: The NFSv4.0 client must send RENEW calls...)
Cc: stable@vger.kernel.org # 3.2.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The SEEK operation is used when an application makes an lseek call with
either the SEEK_HOLE or SEEK_DATA flags set. I fall back on
nfs_file_llseek() if the server does not have SEEK support.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
- Pull in patch 'NFSD: Implement SEEK' from Bruce's nfsd-next tree
for dependencies.
|
|
The calculation of page_ptr here is wrong in the case the read doesn't
start at an offset that is a multiple of a page.
The result is that nfs4svc_encode_compoundres sets rq_next_page to a
value one too small, and then the loop in svc_free_res_pages may
incorrectly fail to clear a page pointer in rq_respages[].
Pages left in rq_respages[] are available for the next rpc request to
use, so xdr data may be written to that page, which may hold data still
waiting to be transmitted to the client or data in the page cache.
The observed result was silent data corruption seen on an NFSv4 client.
We tag this as "fixing" 05638dc73af2 because that commit exposed this
bug, though the incorrect calculation predates it.
Particular thanks to Andrea Arcangeli and David Gilbert for analysis and
testing.
Fixes: 05638dc73af2 "nfsd4: simplify server xdr->next_page use"
Cc: stable@vger.kernel.org
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
ocfs2_setattr() actually needs to really use MAXQUOTAS and not
OCFS2_MAXQUOTAS since it will pass the array over to VFS. Currently
this isn't a problem since MAXQUOTAS == OCFS2_MAXQUOTAS but it would
be once we introduce project quotas.
CC: Mark Fasheh <mfasheh@suse.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: ocfs2-devel@oss.oracle.com
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
|
|
If a cache object gets killed whilst in the process of being set up - for
instance if the netfs relinquishes the cookie that the object is associated
with - then the object's state machine will transit to the DROP_OBJECT state
without necessarily going through the LOOKUP_OBJECT or CREATE_OBJECT states.
This is a problem for CacheFiles because cachefiles_drop_object() assumes that
object->dentry will be set upon reaching the DROP_OBJECT state and has an
ASSERT() to that effect (see the oops below) - but object->dentry doesn't get
set until the LOOKUP_OBJECT or CREATE_OBJECT states (and not always then if
they fail).
To fix this, just make the dentry cleanup in cachefiles_drop_object()
conditional on the dentry actually being set and remove the assertion.
CacheFiles: Assertion failed
------------[ cut here ]------------
kernel BUG at .../fs/cachefiles/namei.c:425!
...
Workqueue: fscache_object fscache_object_work_func [fscache]
...
RIP: ... cachefiles_delete_object+0xcd/0x110 [cachefiles]
...
Call Trace:
[<ffffffffa043280f>] ? cachefiles_drop_object+0xff/0x130 [cachefiles]
[<ffffffffa02ac511>] ? fscache_drop_object+0xd1/0x1d0 [fscache]
[<ffffffffa02ac697>] ? fscache_object_work_func+0x87/0x210 [fscache]
[<ffffffff81080635>] ? process_one_work+0x155/0x450
[<ffffffff81081c44>] ? worker_thread+0x114/0x370
[<ffffffff81081b30>] ? manage_workers.isra.21+0x2c0/0x2c0
[<ffffffff81087fcc>] ? kthread+0xbc/0xe0
[<ffffffff81087f10>] ? flush_kthread_worker+0xa0/0xa0
[<ffffffff8150638c>] ? ret_from_fork+0x7c/0xb0
[<ffffffff81087f10>] ? flush_kthread_worker+0xa0/0xa0
Reported-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
|
|
s/withing/within/
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
|
|
AFAICS, prepend_name() is broken on SMP alpha. Disclaimer: I don't have
SMP alpha boxen to reproduce it on. However, it really looks like the race
is real.
CPU1: d_path() on /mnt/ramfs/<255-character>/foo
CPU2: mv /mnt/ramfs/<255-character> /mnt/ramfs/<63-character>
CPU2 does d_alloc(), which allocates an external name, stores the name there
including terminating NUL, does smp_wmb() and stores its address in
dentry->d_name.name. It proceeds to d_add(dentry, NULL) and d_move()
old dentry over to that. ->d_name.name value ends up in that dentry.
In the meanwhile, CPU1 gets to prepend_name() for that dentry. It fetches
->d_name.name and ->d_name.len; the former ends up pointing to new name
(64-byte kmalloc'ed array), the latter - 255 (length of the old name).
Nothing to force the ordering there, and normally that would be OK, since we'd
run into the terminating NUL and stop. Except that it's alpha, and we'd need
a data dependency barrier to guarantee that we see that store of NUL
__d_alloc() has done. In a similar situation dentry_cmp() would survive; it
does explicit smp_read_barrier_depends() after fetching ->d_name.name.
prepend_name() doesn't and it risks walking past the end of kmalloc'ed object
and possibly oops due to taking a page fault in kernel mode.
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This patch adds server support for the NFS v4.2 operation SEEK, which
returns the position of the next hole or data segment in a file.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
It's cleaner to introduce everything at once and have the server reply
with "not supported" than it would be to introduce extra operations when
implementing a specific one in the middle of the list.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
sys_tz is already declared in include/linux/time.h
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
|