Age | Commit message (Collapse) | Author |
|
Memory allocated using kmem_cache_zalloc should be freed using
kmem_cache_free, not kfree.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,e,e1,e2;
@@
x = kmem_cache_zalloc(e1,e2)
... when != x = e
?-kfree(x)
+kmem_cache_free(e1,x)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
|
|
takes vfsmount and relative path, does lookup within that vfsmount
(possibly triggering automounts) and returns the result as root
of subtree suitable for return by ->mount() (i.e. a reference to
dentry and an active reference to its superblock grabbed, superblock
locked exclusive).
btrfs and nfs switched to it instead of open-coding the sucker.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Life is much saner if create_mnt_ns(mnt) drops mnt in case of error...
Switch it to such calling conventions, switch callers, fix double mntput() in
fs/nfs/super.c one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This is just a cleanup patch to silence a static checker warning.
The problem is that we cap "nr_iovecs" so it can't be larger than
"UIO_MAXIOV" but we don't check for negative values. It turns out this is
prevented at other layers, but logically it doesn't make sense to have
negative nr_iovecs so making it unsigned is nicer.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Address the possible performance regression mentioned in "nfsd4: hash
lockowners to simplify RELEASE_LOCKOWNER" by providing a separate
(lockowner, inode) hash.
Really, I doubt this matters much, but I think it's likely we'll change
these data structures here and I'd rather that the need for (owner,
inode) lookups be well-documented.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The doalloc arg in xfs_qm_dqattach_one() is a flag that indicates
whether a new area to handle quota information will be allocated
if needed. Originally, it was passed to xfs_qm_dqget(), but has
been removed by the following commit (probably by mistake):
commit 8e9b6e7fa4544ea8a0e030c8987b918509c8ff47
Author: Christoph Hellwig <hch@lst.de>
Date: Sun Feb 8 21:51:42 2009 +0100
xfs: remove the unused XFS_QMOPT_DQLOCK flag
As the result, xfs_qm_dqget() called from xfs_qm_dqattach_one()
never allocates the new area even if it is needed.
This patch gives the doalloc arg to xfs_qm_dqget() in
xfs_qm_dqattach_one() to fix this problem.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
On a corrupted file system the ->len field could be wrong leading to
a buffer overflow.
Reported-and-acked-by: Clement LECIGNE <clement.lecigne@netasq.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically
the same things, with a few exceptions. This patch combines
the two functions into a slightly more generic gfs2_alloc_block.
Having one centralized block allocation function will reduce
code redundancy and make it easier to implement multi-block
reservations to reduce file fragmentation in the future.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This upstream patch had what I believe is an unintended consequence:
http://git.kernel.org/?p=linux/kernel/git/steve/gfs2-3.0-nmw.git;a=commitdiff;h=beca42486749c1538a5ed58fe9dcc9f26d428c93
The patch changed function get_local_rgrp such that it ONLY
used TRY locks for RGRP searches. Prior to that patch, the code
used TRY locks during the first loop, and if that was unsuccessful,
it used normal blocking locks on subsequent searches. This patch
changes it back to the old way.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
The btrfs snapshotting code requires that once a root has been
snapshotted, we don't change it during a commit.
But there are two cases to lead to tree corruptions:
1) multi-thread snapshots can commit serveral snapshots in a transaction,
and this may change the src root when processing the following pending
snapshots, which lead to the former snapshots corruptions;
2) the free inode cache was changing the roots when it root the cache,
which lead to corruptions.
This fixes things by making sure we force COW the block after we create a
snapshot during commiting a transaction, then any changes to the roots
will result in COW, and we get all the fs roots and snapshot roots to be
consistent.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Sync with Linus tree to have 157550ff ("mtd: add GPMI-NAND driver
in the config and Makefile") as I have patch depending on that one.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: rename the option to nospace_cache
Btrfs: handle bio_add_page failure gracefully in scrub
Btrfs: fix deadlock caused by the race between relocation
Btrfs: only map pages if we know we need them when reading the space cache
Btrfs: fix orphan backref nodes
Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}
Btrfs: fix unreleased path in btrfs_orphan_cleanup()
Btrfs: fix no reserved space for writing out inode cache
Btrfs: fix nocow when deleting the item
Btrfs: tweak the delayed inode reservations again
Btrfs: rework error handling in btrfs_mount()
Btrfs: close devices on all error paths in open_ctree()
Btrfs: avoid null dereference and leaks when bailing from open_ctree()
Btrfs: fix subvol_name leak on error in btrfs_mount()
Btrfs: fix memory leak in btrfs_parse_early_options()
Btrfs: fix our reservations for updating an inode when completing io
Btrfs: fix oops on NULL trans handle in btrfs_truncate
btrfs: fix double-free 'tree_root' in 'btrfs_mount()'
|
|
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix force shutdown handling in xfs_end_io
xfs: constify xfs_item_ops
xfs: Fix possible memory corruption in xfs_readlink
|
|
Set up d_fsdata on the root dentry. This fixes a NULL pointer dereference
in ceph_d_prune on umount. It also means we can eventually strip out all
of the conditional checks on d_fsdata because it is now set unconditionally
(prior to setting up the d_ops).
Fix the ceph_d_prune debug print while we're here.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Rename no_space_cache option to nospace_cache to be more consistent with
the rest, where the simple prefix 'no' is used to negate an option.
The option has been introduced during the -rc1 cycle and there are has not been
widely used, so it's safe.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Currently scrub fails with ENOMEM when bio_add_page fails. Unfortunately
dm based targets accept only one page per bio, thus making scrub always
fails. This patch just submits the current bio when an error is encountered
and starts a new one.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
We can not do flushable reservation for the relocation when we create snapshot,
because it may make the transaction commit task and the flush task wait for
each other and the deadlock happens.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
People have been running into a warning when loading space cache because the
page is already mapped when trying to read in a bitmap. The way we read in
entries and pages is kind of convoluted, so fix it so that io_ctl_read_entry
maps the entries if it needs to, and if it hits the end of the page it simply
unmaps the page. That way we can unconditionally unmap the io_ctl before
reading in the bitmap and we should stop hitting these warnings. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
If the root node of a fs/file tree is in the block group that is
being relocated, but the others are not in the other block groups.
when we create a snapshot for this tree between the relocation tree
creation ends and ->create_reloc_tree is set to 0, Btrfs will create
some backref nodes that are the lowest nodes of the backrefs cache.
But we forget to add them into ->leaves list of the backref cache
and deal with them, and at last, they will triggered BUG_ON().
kernel BUG at fs/btrfs/relocation.c:239!
This patch fixes it by adding them into ->leaves list of backref cache.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
btrfs_block_rsv_add{, _noflush}() have similar code, so abstract that code.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
When we did stress test for the space relocation, the deadlock happened.
By debugging, We found it was caused by the carelessness that we forgot
to unlock the read lock of the extent buffers in btrfs_orphan_cleanup()
before we end the transaction handle, so the transaction commit task waited
the task, which called btrfs_orphan_cleanup(), to unlock the extent buffer,
but that task waited the commit task to end the transaction commit, and
the deadlock happened. Fix it.
Signed-ff-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
I-node cache forgets to reserve the space when writing out it. And when
we do some stress test, such as synctest, it will trigger WARN_ON() in
use_block_rsv().
WARNING: at fs/btrfs/extent-tree.c:5718 btrfs_alloc_free_block+0xbf/0x281 [btrfs]()
...
Call Trace:
[<ffffffff8104df86>] warn_slowpath_common+0x80/0x98
[<ffffffff8104dfb3>] warn_slowpath_null+0x15/0x17
[<ffffffffa0369c60>] btrfs_alloc_free_block+0xbf/0x281 [btrfs]
[<ffffffff810cbcb8>] ? __set_page_dirty_nobuffers+0xfe/0x108
[<ffffffffa035c040>] __btrfs_cow_block+0x118/0x3b5 [btrfs]
[<ffffffffa035c7ba>] btrfs_cow_block+0x103/0x14e [btrfs]
[<ffffffffa035e4c4>] btrfs_search_slot+0x249/0x6a4 [btrfs]
[<ffffffffa036d086>] btrfs_lookup_inode+0x2a/0x8a [btrfs]
[<ffffffffa03788b7>] btrfs_update_inode+0xaa/0x141 [btrfs]
[<ffffffffa036d7ec>] btrfs_save_ino_cache+0xea/0x202 [btrfs]
[<ffffffffa03a761e>] ? btrfs_update_reloc_root+0x17e/0x197 [btrfs]
[<ffffffffa0373867>] commit_fs_roots+0xaa/0x158 [btrfs]
[<ffffffffa03746a6>] btrfs_commit_transaction+0x405/0x731 [btrfs]
[<ffffffff810690df>] ? wake_up_bit+0x25/0x25
[<ffffffffa039d652>] ? btrfs_log_dentry_safe+0x43/0x51 [btrfs]
[<ffffffffa0381c5f>] btrfs_sync_file+0x16a/0x198 [btrfs]
[<ffffffff81122806>] ? mntput+0x21/0x23
[<ffffffff8112d150>] vfs_fsync_range+0x18/0x21
[<ffffffff8112d170>] vfs_fsync+0x17/0x19
[<ffffffff8112d316>] do_fsync+0x29/0x3e
[<ffffffff8112d348>] sys_fsync+0xb/0xf
[<ffffffff81468352>] system_call_fastpath+0x16/0x1b
Sometimes it causes BUG_ON() in the reservation code of the delayed inode
is triggered.
So we must reserve enough space for inode cache.
Note: If we can not reserve the enough space for inode cache, we will
give up writing out it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
btrfs_previous_item() just search the b+ tree, do not COW the nodes or leaves,
if we modify the result of it, the meta-data will be broken. fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
integration
|
|
Josef sent along an incremental to the inode reservation
code to make sure we try and fall back to directly updating
the inode item if things go horribly wrong.
This reworks that patch slightly, adding a fallback function
that will always try to update the inode item directly without
going through the delayed_inode code.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
pNFS-specific code belongs in the pnfs layer. It should not be
hijacking generic NFS read or write code paths.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This reverts commit aa6afca5bcaba8101f3ea09d5c3e4100b2b9f0e5.
It escalates of some of the google-chrome SELinux problems with ptrace
("Check failed: pid_ > 0. Did not find zygote process"), and Andrew
says that it is also causing mystery lockdep reports.
Reported-by: Alex Villacís Lasso <a_villacis@palosanto.com>
Requested-by: James Morris <jmorris@namei.org>
Requested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commits 6c41761f and 45ea6095 introduced the possibility of NULL pointer
dereference on error paths, also we would leave all devices busy and
leak fs_info with all sub-structures on error when trying to mount an
already mounted fs to a different directory.
Fix this by doing all allocations before trying to open any of the
devices, adjust error path for mount-already-mounted-fs case.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Fix a bug introduced by 7e662854 where we would leave devices busy on
certain error paths in open_ctree(). fs_info is guaranteed to be
non-NULL now so it's safe to dereference it on all error paths.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Fix bugs introduced by 6c41761f. Firstly, after failing to allocate any
of the tree roots (first 'goto fail' in open_ctree()) we would
dereference a NULL fs_info pointer in free_fs_info(). Secondly, after
failures from init_srcu_struct(), setup_bdi() and new_inode() we would
leak all earlier allocated roots: fs_info fields haven't been
initialized yet so free_fs_info() is rendered useless.
Fix this by initializing fs_info pointer and fs_info fields before any
allocations happen.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
btrfs_parse_early_options() can fail due to error while scanning devices
(-o device= option), but still strdup() subvol_name string:
mount -o subvol=SUBV,device=BAD_DEVICE <dev> <mnt>
So free subvol_name string on error.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Don't leak subvol_name string in case multiple subvol= options are
given. "The lastest option is effective" behavior (consistent with
subvolid= and subvolrootid= options) is preserved.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
As a result, we don't need to test it each time.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
|
|
This was spotted by automated code analysis. In case reading
an ACL xattr failed (only likely to happen if there is an I/O
error for example, and even then only with unstuffed xattrs,
so pretty difficult to trigger) a small amount of memory could
potentially be leaked.
This patch adds a kfree to the error path, and also removes a
test which is no longer required (gfs2_ea_get_copy always
returns either a negative error, or a length)
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Call ext3_mark_recovery_complete() in ext3_fill_super() only if
needs_recovery is non-zero.
Besides that, print out "recovery complete" message after calling
ext3_mark_recovery_complete().
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
People have been reporting ENOSPC crashes in finish_ordered_io. This is because
we try to steal from the delalloc block rsv to satisfy a reservation to update
the inode. The problem with this is we don't explicitly save space for updating
the inode when doing delalloc. This is kind of a problem and we've gotten away
with this because way back when we just stole from the delalloc reserve without
any questions, and this worked out fine because generally speaking the leaf had
been modified either by the mtime update when we did the original write or
because we just updated the leaf when we inserted the file extent item, only on
rare occasions had the leaf not actually been modified, and that was still ok
because we'd just use a block or two out of the over-reservation that is
delalloc.
Then came the delayed inode stuff. This is amazing, except it wants a full
reservation for updating the inode since it may do it at some point down the
road after we've written the blocks and we have to recow everything again. This
worked out because the delayed inode stuff just stole from the global reserve,
that is until recently when I changed that because it caused other problems.
So here we are, we're doing everything right and being screwed for it. So take
an extra reservation for the inode at delalloc reservation time and carry it
through the life of the delalloc reservation. If we need it we can steal it in
the delayed inode stuff. If we have already stolen it try and do a normal
metadata reservation. If that fails try to steal from the delalloc reservation.
If _that_ fails we'll get a WARN_ON() so I can start thinking of a better way to
solve this and in the meantime we'll steal from the global reserve.
With this patch I ran xfstests 13 in a loop for a couple of hours and didn't see
any problems.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
If we fail to reserve space in the transaction during truncate, we can
error out with a NULL trans handle. The cleanup code needs an extra
check to make sure we aren't trying to use the bad handle.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Ensure ioend->io_error gets propagated back to e.g. AIO completions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
|
|
The log item ops aren't nessecarily the biggest exploit vector, but marking
them const is easy enough. Also remove the unused xfs_item_ops_t typedef
while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
|
|
Fixes a possible memory corruption when the link is larger than
MAXPATHLEN and XFS_DEBUG is not enabled. This also remove the
S_ISLNK assert, since the inode mode is checked previously in
xfs_readlink_by_handle() and via VFS.
Updated to address concerns raised by Ben Hutchings about the loose
attention paid to 32- vs 64-bit values, and the lack of handling a
potentially negative pathlen value:
- Changed type of "pathlen" to be xfs_fsize_t, to match that of
ip->i_d.di_size
- Added checking for a negative pathlen to the too-long pathlen
test, and generalized the message that gets reported in that case
to reflect the change
As a result, if a negative pathlen were encountered, this function
would return EFSCORRUPTED (and would fail an assertion for a debug
build)--just as would a too-long pathlen.
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Now that they're used in the same way, it's a little simpler to put open
and lock owners in the same hash table, and I can't see a reason not to.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
A potentially uninitialised variable, some unreachable code,
and the main part of this, fixing the error path in the
unlink function.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This patch adds read-ahead capability to GFS2's
directory hash table management. It greatly improves
performance for some directory operations. For example:
In one of my file systems that has 1000 directories, each
of which has 1000 files, time to execute a recursive
ls (time ls -fR /mnt/gfs2 > /dev/null) was reduced
from 2m2.814s on a stock kernel to 0m45.938s.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Christoph has split up REQ_PRIO from REQ_META. That means that
we can drop REQ_PRIO from places where is it not needed. I'm
not at all sure that the combination WRITE_FLUSH_FUA | REQ_PRIO
makes any kind of sense, anyway.
In addition, I've added REQ_META to one place in the code where
it was missing. REQ_PRIO has been left for read/writes triggered
by glock acquisition and writeback only. We can adjust it again
if required, but these are the most important points from a
performance perspective.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
|
|
Hash lockowners on just the owner string rather than on (owner, inode).
This makes the owner-string lookup needed for RELEASE_LOCKOWNER simpler
(currently it's doing at a linear search through the entire hash
table!). That may come at the expense of making (owner, inode) lookups
more expensive if a client reuses the same lockowner across multiple
files. We might add a separate lookup for that.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The close parenthesis was hard to find with it spaced so far over.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[bfields@redhat.com: get all these lines under 80 chars while we're here]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
init_nfsd() was calling free_slabs() during cleanup code, but the call
to init_slabs() was hidden in nfsd4_state_init(). This could be
confusing to people unfamiliar with the code.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Fault injection on the NFS server makes it easier to test the client's
state manager and recovery threads. Simulating errors on the server is
easier than finding the right conditions that cause them naturally.
This patch uses debugfs to add a simple framework for fault injection to
the server. This framework is a config option, and can be enabled
through CONFIG_NFSD_FAULT_INJECTION. Assuming you have debugfs mounted
to /sys/debug, a set of files will be created in /sys/debug/nfsd/.
Writing to any of these files will cause the corresponding action and
write a log entry to dmesg.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|