Age | Commit message (Collapse) | Author |
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
If this loses any backchannel, make sure we have a chance to notice that
and set the sequence flags.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
This makes sure we set the sequence flag when necessary.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Implement the SEQ4_STATUS_CB_PATH_DOWN flag.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Distinguish between when the callback channel is known to be down, and
when it is not yet confirmed. This will be useful in the 4.1 case.
Also, we don't seem to be using the fact that this field is atomic.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Now that we have a list of connections to choose from, we can teach the
callback code to just pick a suitable connection and use that, instead
of insisting on forever using the connection that the first
create_session was sent with.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Basic xdr and processing for BIND_CONN_TO_SESSION. This adds a
connection to the list of connections associated with a session.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We want to traverse this from the callback code.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
I made a slight mess of Documentation/filesystems/Locking; resolve
conflicts with upstream before fixing it up.
|
|
This reverts commit 4f531501e44: ext4: fix possible overflow in
ext4_trim_fs()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
* 'for-linus-merged' of git://oss.sgi.com/xfs/xfs: (47 commits)
xfs: convert grant head manipulations to lockless algorithm
xfs: introduce new locks for the log grant ticket wait queues
xfs: convert log grant heads to atomic variables
xfs: convert l_tail_lsn to an atomic variable.
xfs: convert l_last_sync_lsn to an atomic variable
xfs: make AIL tail pushing independent of the grant lock
xfs: use wait queues directly for the log wait queues
xfs: combine grant heads into a single 64 bit integer
xfs: rework log grant space calculations
xfs: fact out common grant head/log tail verification code
xfs: convert log grant ticket queues to list heads
xfs: use AIL bulk delete function to implement single delete
xfs: use AIL bulk update function to implement single updates
xfs: remove all the inodes on a buffer from the AIL in bulk
xfs: consume iodone callback items on buffers as they are processed
xfs: reduce the number of AIL push wakeups
xfs: bulk AIL insertion during transaction commit
xfs: clean up xfs_ail_delete()
xfs: Pull EFI/EFD handling out from under the AIL lock
xfs: fix EFI transaction cancellation.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (22 commits)
MAINTAINERS: Update Joel Becker's email address
ocfs2: Remove unused truncate function from alloc.c
ocfs2/cluster: dereferencing before checking in nst_seq_show()
ocfs2: fix build for OCFS2_FS_STATS not enabled
ocfs2/cluster: Show o2net timing statistics
ocfs2/cluster: Track process message timing stats for each socket
ocfs2/cluster: Track send message timing stats for each socket
ocfs2/cluster: Use ktime instead of timeval in struct o2net_sock_container
ocfs2/cluster: Replace timeval with ktime in struct o2net_send_tracking
ocfs2: Add DEBUG_FS dependency
ocfs2/dlm: Hard code the values for enums
ocfs2/dlm: Minor cleanup
ocfs2/dlm: Cleanup dlmdebug.c
ocfs2: Release buffer_head in case of error in ocfs2_double_lock.
ocfs2/cluster: Pin the local node when o2hb thread starts
ocfs2/cluster: Show pin state for each o2hb region
ocfs2/cluster: Pin/unpin o2hb regions
ocfs2/cluster: Remove dropped region from o2hb quorum region bitmap
ocfs2/cluster: Pin the remote node item in configfs
ocfs2/dlm: make existing convertion precedent over new lock
...
|
|
Indicate support for referrals. Do not set any PNFS roles. Check the flags
returned by the server for validity. Do not use exchange flags from an old
client ID instance when recovering a client ID.
Update the EXCHID4_FLAG_XXX set to RFC 5661.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We do set dentry->d_op in lookup even in case of EOENT entries.
That implies we should have dentry->d_op already set when
create/mkdir/mknod/link/symlink routines are called
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
introduced a typo somehow during a hand merge
Reported by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Remove v9fs_vfs_readlink_dotl function and use generic_readlink. Update
v9fs_vfs_follow_link_dotl function to accommodate this change
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Source Code Reorganization
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Make the 9P_FS kconfig options subordinate to the 9P_FS kconfig symbol
in the menu presentation instead of them all being at the same level.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
If we don't have default ACL, then trying to remove
default acl on a file should return 0.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
|
|
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
This merge pulls the XFS master branch into the latest Linus master.
This results in a merge conflict whose best fix is not obvious.
I manually fixed the conflict, in "fs/xfs/xfs_iget.c".
Dave Chinner had done work that resulted in RCU freeing of inodes
separate from what Nick Piggin had done, and their results differed
slightly in xfs_inode_free(). The fix updates Nick's call_rcu()
with the use of VFS_I(), while incorporating needed updates to some
XFS inode fields implemented in Dave's series. Dave's RCU callback
function has also been removed.
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
driver core: Document that device_rename() is only for networking
sysfs: remove useless test from sysfs_merge_group
driver-core: merge private parts of class and bus
driver core: fix whitespace in class_attr_string
|
|
When two concurrent unaligned, non-overlapping direct IOs are issued
to the same block, the direct Io layer will race to zero the block.
The result is that one of the concurrent IOs will overwrite data
written by the other IO with zeros. This is demonstrated by the
xfsqa test 240.
To avoid this problem, serialise all unaligned direct IOs to an
inode with a big hammer. We need a big hammer approach as we need to
serialise AIO as well, so we can't just block writes on locks.
Hence, the big hammer is calling xfs_ioend_wait() while holding out
other unaligned direct IOs from starting.
We don't bother trying to serialised aligned vs unaligned IOs as
they are overlapping IO and the result of concurrent overlapping IOs
is undefined - the result of either IO is a valid result so we let
them race. Hence we only penalise unaligned IO, which already has a
major overhead compared to aligned IO so this isn't a major problem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The buffered IO and direct IO write paths share a common set of
checks and limiting code prior to issuing the write. Factor that
into a common helper function.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Complete the split of the different write IO paths by splitting the
buffered IO write path out of xfs_file_aio_write(). This makes the
different mechanisms of the write patchs easier to follow.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The current xfs_file_aio_write code is a mess of locking shenanigans
to handle the different locking requirements of buffered and direct
IO. Start to clean this up by disentangling the direct IO path from
the mess.
This also removes the failed direct IO fallback path to buffered IO.
XFS handles all direct IO cases without needing to fall back to
buffered IO, so we can safely remove this unused path. This greatly
simplifies the logic and locking needed in the write path.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
We need to obtain the i_mutex, i_iolock and i_ilock during the read
and write paths. Add a set of wrapper functions to neatly
encapsulate the lock ordering and shared/exclusive semantics to make
the locking easier to follow and get right.
Note that this changes some of the exclusive locking serialisation in
that serialisation will occur against the i_mutex instead of the
XFS_IOLOCK_EXCL. This does not change any behaviour, and it is
arguably more efficient to use the mutex for such serialisation than
the rw_sem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
xfs_file_aio_write() only returns the error from synchronous
flushing of the data and inode if error == 0. At the point where
error is being checked, it is guaranteed to be > 0. Therefore any
errors returned by the data or fsync flush will never be returned.
Fix the checks so we overwrite the current error once and only if an
error really occurred.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Conflicts:
fs/nfs/nfs2xdr.c
fs/nfs/nfs3xdr.c
fs/nfs/nfs4xdr.c
|
|
vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherrent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.
The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org [2.6.37]
|
|
EXT2_XATTR_DEBUG
When I enable EXT2_XATTR_DEBUG in fs/ext2/xattr.c I get a build error stating
the following:
CC fs/ext2/xattr.o
fs/ext2/xattr.c: In function 'ext2_xattr_cache_insert':
fs/ext2/xattr.c:841: error: dereferencing pointer to incomplete type
fs/ext2/xattr.c:846: error: dereferencing pointer to incomplete type
make[2]: *** [fs/ext2/xattr.o] Error 1
make[1]: *** [fs/ext2] Error 2
make: *** [fs] Error 2
These lines reference ext2_xattr_cache->c_entry_count which is defined
in struct mb_cache. struct mb_cache is currently only defined in fs/mbcache.c.
Moving struct mb_cache definition to include/linux/mbcache.h to resolve the
issue.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
IS_ERR() already implies unlikely(), so it can be omitted here.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
IS_ERR() already implies unlikely(), so it can be omitted here.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
The addition of 64k block capability in the rec_len_from_disk
and rec_len_to_disk functions added a bit of math overhead which
slows down file create workloads needlessly when the architecture
cannot even support 64k blocks, thanks to page size limits.
Similar changes already exist in the ext4 codebase.
The directory entry checking can also be optimized a bit
by sprinkling in some unlikely() conditions to move the
error handling out of line.
bonnie++ sequential file creates on a 512MB ramdisk speeds up
from about 77,000/s to about 82,000/s, about a 6% improvement.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
The addition of 64k block capability in the rec_len_from_disk
and rec_len_to_disk functions added a bit of math overhead which
slows down file create workloads needlessly when the architecture
cannot even support 64k blocks, thanks to page size limits.
The directory entry checking can also be optimized a bit
by sprinkling in some unlikely() conditions to move the
error handling out of line.
bonnie++ sequential file creates on a 512MB ramdisk speeds up
from about 2200/s to about 2500/s, about a 14% improvement.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Check return value of ext3_journal_get_write_acccess() and
ext3_journal_dirty_metadata().
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Check return value of ext3_journal_get_write_access() and
ext3_journal_dirty_metadata().
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
fallout
Use %pV in __quota_error so a single printk can not be
interleaved with other logging messages.
Add __attribute__((format (printf, 3, 4))) so format
and arguments can be verified by compiler.
Make sure printk formats and arguments match.
Block # needed a pointer dereference.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
The ioctl takes fstrim_range structure (defined in include/linux/fs.h) as an
argument specifying a range of filesystem to trim and the minimum size of an
continguous extent to trim. After the FITRIM is done, the number of bytes
passed from the filesystem down the block stack to the device for potential
discard is stored in fstrim_range.len. This number is a maximum discard amount
from the storage device's perspective, because FITRIM called repeatedly will
keep sending the same sectors for discard. fstrim_range.len will report the
same potential discard bytes each time, but only sectors which had been written
to between the discards would actually be discarded by the storage device.
Further, the kernel block layer reserves the right to adjust the discard ranges
to fit raid stripe geometry, non-trim capable devices in a LVM setup, etc.
These reductions would not be reflected in fstrim_range.len.
Thus fstrim_range.len can give the user better insight on how much storage
space has potentially been released for wear-leveling, but it needs to be one
of only one criteria the userspace tools take into account when trying to
optimize calls to FITRIM.
Thanks to Greg Freemyer <greg.freemyer@gmail.com> for better commit message.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Walk through allocation groups and trim all free extents. It can be
invoked through FITRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSD's
may suffer from performance loss after the whole device was filled (it
does not mean that fs is full!).
It search for free extents in allocation groups specified by Byte range
start -> start+len. When the free extent is within this range, blocks are
marked as used and then trimmed. Afterwards these blocks are marked as
free in per-group bitmap.
[JK: Fixed up error handling and trimming of a single group]
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Since check_eofblocks_fl() only uses the m_lblk portion of the map
structure, we may as well pass that directly, rather than passing the
entire map, which IMHO obfuscates what parameters check_eofblocks_fl()
cares about. Not a big deal, but seems tidier and less confusing, to
me.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Commit 40389687 moved a call to ext4_forget() out of
ext4_free_branches and let ext4_free_blocks() handle calling
bforget(). But that change unfortunately did not replace the call to
ext4_forget() with brelse(), which was needed to drop the in-use count
of the indirect block's buffer head, which lead to a memory leak when
deleting files that used indirect blocks. Fix this.
Thanks to Hugh Dickins for pointing this out.
Cc: stable@kernel.org
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
This function was never implemented, except for a BUG_ON which was
tripping when ext4 is run without a journal. The problem is that
although the comment asserts that "truncate (which is the only way to
free block) discards all preallocations", ext4_free_blocks() is also
called in various error recovery paths when blocks have been
allocated, but for various reasons, we were not able to use those data
blocks (for example, because we ran out of memory while trying to
manipulate the extent tree, or some other similar situation).
In addition to the fact that this function isn't implemented except
for the incorrect BUG_ON, the single caller of this function,
ext4_free_blocks(), doesn't use it all if the journal is enabled.
So remove the (stub) function entirely for now. If we decide it's
better to add it back, it's only going to be useful with a relatively
large number of code changes anyway.
Google-Bug-Id: 3236408
Cc: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Ted first found the bug when running 2.6.36 kernel with dioread_nolock
mount option that xfstests #13 complained about wrong file size during fsck.
However, the bug exists in the older kernels as well although it is
somehow harder to trigger.
The problem is that ext4_end_io_work() can happen after we have truncated an
inode to a smaller size. Then when ext4_end_io_work() calls
ext4_convert_unwritten_extents(), we may reallocate some blocks that have
been truncated, so the inode size becomes inconsistent with the allocated
blocks.
The following patch flushes the i_completed_io_list during truncate to reduce
the risk that some pending end_io requests are executed later and convert
already truncated blocks to initialized.
Note that although the fix helps reduce the problem a lot there may still
be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
held, it can race with an ongoing write request so that the io_end request is
processed later when the corresponding blocks have been truncated.
Ted and I have discussed the problem offline and we saw a few ways to fix
the race completely:
a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold
whenever vmtruncate() is called. The i_mutex lock prevents any new write
requests from entering writeback and the i_alloc_sem prevents the race
from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
is called from do_truncate(), which is probably the most common case.
However, there are places where we may call vmtruncate() without holding
either i_mutex or i_alloc_sem. I would like to ask for other people's
opinions on what locks are expected to be held before calling vmtruncate().
There seems a disagreement among the callers of that function.
b) We change the ext4 write path so that we change the extent tree to contain
the newly allocated blocks and update i_size both at the same time --- when
the write of the data blocks is completed.
c) We add some additional locking to synchronize vmtruncate() and
ext4_end_io_work(). This approach may have performance implications so we
need to be careful.
All of the above proposals may require more substantial changes, so
we may consider to take the following patch as a bandaid.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Call ext4_std_error() in various places when we can't bail out
cleanly, so the file system can be marked as in error.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
When ext4_trim_fs() is called to trim a part of a single group, the
logic will wrongly set last block of the interval to 'len' instead
of 'first_block + len'. Thus a shorter interval is possibly trimmed.
Fix it.
CC: Lukas Czerner <lczerner@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|