Age | Commit message (Collapse) | Author |
|
First, this was racy anyway: d_release isn't called until well after the
dentry is unhashed. Second, this runs afoul of the recent dcache change
that clears d_parent prior to calling d_release (949854d0), causing a NULL
pointer dereference.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Do not set the I_COMPLETE flag on directories until we resolve races with
dcache pruning.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
This reverts commit 97d79b403ef03f729883246208ef5d8a2ebc4d68.
This fails to account for d_parent changes due to rename or disconnected
dentries due to submounts or NFS reexports.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
merge hfs_unlink() and hfs_rmdir(), while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
(256 << sizeof(x)) - 1 is not the maximal possible value of x...
In reality, the maximal allowed value for UDF FileLinkCount is
65535.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
if directory has so many subdirectories that its link count is set
to 1 (i.e. "can't tell accurately") and reiserfs_new_inode() fails,
we shouldn't decrement the parent's link count in cleanup path;
that's what DEC_DIR_INODE_NLINK() is for. As it is, we end up
with parent suddenly getting zero i_nlink, with very unpleasant
effects.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
of/promtree: allow DT device matching by fixing 'name' brokenness (v5)
x86: OLPC: have prom_early_alloc BUG rather than return NULL
of/flattree: Drop an uninteresting message to pr_debug level
of: Add missing of_address.h to xilinx ehci driver
|
|
This introduces a new per-superblock mutex in UFS to replace
the big kernel lock. I have been careful to avoid nested
calls to lock_ufs and to get the lock order right with
respect to other mutexes, in particular lock_super.
I did not make any attempt to prove that the big kernel
lock is not needed in a particular place in the code,
which is very possible.
The mutex has a significant performance impact, so it is only
used on SMP or PREEMPT configurations.
As Nick Piggin noticed, any allocation inside of the lock
may end up deadlocking when we get to ufs_getfrag_block
in the reclaim task, so we now use GFP_NOFS.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Nick Bowler <nbowler@elliptictech.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Nick Piggin <npiggin@gmail.com>
|
|
This removes the BKL in hpfs in a rather awful
way, by making the code only work on uniprocessor
systems without kernel preemption, as suggested
by Andi Kleen.
The HPFS code probably has close to zero remaining
users on current kernels, all archeological uses of
the file system can probably be done with the significant
restrictions.
The hpfs_lock/hpfs_unlock functions are left in the
code, sincen Mikulas has indicated that he is still
interested in fixing it in a better way.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: linux-fsdevel@vger.kernel.org
|
|
This message looks like an error (which it isn't) when booting with a
flattened device tree. Remove the message from normal kernel builds.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
|
|
vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing
i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt
it as reported and analyzed by Josh.
In fact, there is no good reason to mess with i_nlink of the moved file.
We did it presumably to simulate linking into the new directory and unlinking
from an old one. But the practical effect of this is disputable because fsck
can possibly treat file as being properly linked into both directories without
writing any error which is confusing. So we just stop increment-decrement
games with i_nlink which also fixes the corruption.
CC: stable@kernel.org
CC: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Commit 493f3358cb289ccf716c5a14fa5bb52ab75943e5 added this call to
xfs_fs_geometry() in order to avoid passing kernel stack data back
to user space:
+ memset(geo, 0, sizeof(*geo));
Unfortunately, one of the callers of that function passes the
address of a smaller data type, cast to fit the type that
xfs_fs_geometry() requires. As a result, this can happen:
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: f87aca93
Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358cb2+ #1
Call Trace:
[<c12991ac>] ? panic+0x50/0x150
[<c102ed71>] ? __stack_chk_fail+0x10/0x18
[<f87aca93>] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs]
Fix this by fixing that one caller to pass the right type and then
copy out the subset it is interested in.
Note: This patch is an alternative to one originally proposed by
Eric Sandeen.
Reported-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
|
|
Most of the logging infrastructure in XFS is unneccessary and
designed around the infrastructure supplied by Irix rather than
Linux. To rationalise the logging interfaces, start by introducing
simple printk wrappers similar to the dev_printk() infrastructure.
Later patches will convert code to use this new interface.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Commit 493f3358cb289ccf716c5a14fa5bb52ab75943e5 added this call to
xfs_fs_geometry() in order to avoid passing kernel stack data back
to user space:
+ memset(geo, 0, sizeof(*geo));
Unfortunately, one of the callers of that function passes the
address of a smaller data type, cast to fit the type that
xfs_fs_geometry() requires. As a result, this can happen:
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: f87aca93
Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358cb2+ #1
Call Trace:
[<c12991ac>] ? panic+0x50/0x150
[<c102ed71>] ? __stack_chk_fail+0x10/0x18
[<f87aca93>] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs]
Fix this by fixing that one caller to pass the right type and then
copy out the subset it is interested in.
Note: This patch is an alternative to one originally proposed by
Eric Sandeen.
Reported-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
|
|
According to the report from Jiro SEKIBA titled "regression in
2.6.37?" (Message-Id: <8739n8vs1f.wl%jir@sekiba.com>), on 2.6.37 and
later kernels, lscp command no longer displays "i" flag on checkpoints
that snapshot operations or garbage collection created.
This is a regression of nilfs2 checkpointing function, and it's
critical since it broke behavior of a part of nilfs2 applications.
For instance, snapshot manager of TimeBrowse gets to create
meaningless snapshots continuously; snapshot creation triggers another
checkpoint, but applications cannot distinguish whether the new
checkpoint contains meaningful changes or not without the i-flag.
This patch fixes the regression and brings that application behavior
back to normal.
Reported-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Jiro SEKIBA <jir@unicus.jp>
Cc: stable <stable@kernel.org> [2.6.37]
|
|
According to Russell King, adfs was written to not require the big
kernel lock, and all inode updates are done under adfs_dir_lock.
All other metadata in adfs is read-only and does not require locking.
The use of the BKL is the result of various pushdowns from the VFS
operations.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Stuart Swales <stuart.swales.croftnuisk@gmail.com>
|
|
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Fix new kernel-doc warning in fs/block_dev.c:
Warning(fs/block_dev.c:937): No description found for parameter 'kill_dirty'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix truncate after open
fuse: fix hang of single threaded fuseblk filesystem
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Check heartbeat mode for kernel stacks only
Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a right number.
ocfs2: Fix estimate of necessary credits for mkdir
|
|
The Patch below removes one to many "n's" in a word..
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: linux-ext4@vger.kernel.org
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Orphan cleanup is currently executed even if the file system has some
number of unknown ROCOMPAT features, which deletes inodes and frees
blocks, which could be very bad for some RO_COMPAT features.
This patch skips the orphan cleanup if it contains readonly compatible
features not known by this ext3 implementation, which would prevent
the fs from being mounted (or remounted) readwrite.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing
i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt
it as reported and analyzed by Josh.
In fact, there is no good reason to mess with i_nlink of the moved file.
We did it presumably to simulate linking into the new directory and unlinking
from an old one. But the practical effect of this is disputable because fsck
can possibly treat file as being properly linked into both directories without
writing any error which is confusing. So we just stop increment-decrement
games with i_nlink which also fixes the corruption.
CC: stable@kernel.org
CC: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Squashfs_get_sb() to squashfs_mount() conversion (commit 152a0836)
results in line over 80 characters.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
|
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
|
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
|
Pass the dictionary size used to compress datablocks. Using a
dictionary size less than the block size saves memory overhead, in many
cases without adversely affecting compression ratio.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
|
Extend decompressor framework to handle compression options stored in
the filesystem. These options can be used by the relevant decompressor
at initialisation time to over-ride defaults.
The presence of compression options in the filesystem is indicated by
the COMP_OPT filesystem flag. If present the data is read from the
filesystem and passed to the decompressor init function. The decompressor
init function signature has been extended to take this data.
Also update the init function signature in the glib, lzo and xz
decompressor wrappers.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
|
If no extent conversion is required, wake up any processes waiting for
the page's writeback to be complete and free the ext4_io_end structure
directly in ext4_end_bio() instead of dropping it on the linked list
(which requires taking a spinlock to queue and dequeue the io_end
structure), and waiting for the workqueue to do this work.
This removes an extra scheduling delay before process waiting for an
fsync() to complete gets woken up, and it also reduces the CPU
overhead for a random write workload.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Orphan cleanup is currently executed even if the file system has some
number of unknown ROCOMPAT features, which deletes inodes and frees
blocks, which could be very bad for some RO_COMPAT features,
especially the SNAPSHOT feature.
This patch skips the orphan cleanup if it contains readonly compatible
features not known by this ext4 implementation, which would prevent
the fs from being mounted (or remounted) readwrite.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
nblocks is passed into ext4_truncate_restart_trans() from
ext4_ext_truncate_extend_restart() with a value different from the default
blocks_for_truncate(), but is being ignored.
The two other calls to ext4_truncate_restart_trans() already pass the
default value, which is then being recalculated inside the function.
Fix the problem by using the passed argument.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
|
|
This assures that the root inode is not leaked, and that sb->s_root is
NULL, which will prevent generic_shutdown_super() from doing extra
work, including call sync_filesystem, which ultimately results in
ext4_sync_fs() getting called with an uninitialized struct super,
which is the cause of the crash noted in Kernel Bugzilla #26752.
https://bugzilla.kernel.org/show_bug.cgi?id=26752
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Fix the FIEMAP ioctl so that it returns all of the page ranges which
are still subject to delayed allocation. We were missing some cases
if the file was sparse.
Reported by Chris Mason <chris.mason@oracle.com>:
>We've had reports on btrfs that cp is giving us files full of zeros
>instead of actually copying them. It was tracked down to a bug with
>the btrfs fiemap implementation where it was returning holes for
>delalloc ranges.
>
>Newer versions of cp are trusting fiemap to tell it where the holes
>are, which does seem like a pretty neat trick.
>
>I decided to give xfs and ext4 a shot with a few tests cases too, xfs
>passed with all the ones btrfs was getting wrong, and ext4 got the basic
>delalloc case right.
>$ mkfs.ext4 /dev/xxx
>$ mount /dev/xxx /mnt
>$ dd if=/dev/zero of=/mnt/foo bs=1M count=1
>$ fiemap-test foo
>ext: 0 logical: [ 0.. 255] phys: 0.. 255
>flags: 0x007 tot: 256
>
>Horray! But once we throw a hole in, things go bad:
>$ mkfs.ext4 /dev/xxx
>$ mount /dev/xxx /mnt
>$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
>$ fiemap-test foo
>< no output >
>
>We've got a delalloc extent after the hole and ext4 fiemap didn't find
>it. If I run sync to kick the delalloc out:
>$sync
>$ fiemap-test foo
>ext: 0 logical: [ 256.. 511] phys: 34048.. 34303
>flags: 0x001 tot: 256
>
>fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
>got there. It's full of pretty comments so I know it isn't mine, but
>you can grab it here:
>
>http://oss.oracle.com/~mason/fiemap-test.c
>
>xfsqa has a fiemap program too.
After Fix, test results are as follows:
ext: 0 logical: [ 256.. 511] phys: 0.. 255
flags: 0x007 tot: 256
ext: 0 logical: [ 256.. 511] phys: 33280.. 33535
flags: 0x001 tot: 256
$ mkfs.ext4 /dev/xxx
$ mount /dev/xxx /mnt
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
$ sync
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=3
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=5
$ fiemap-test foo
ext: 0 logical: [ 256.. 511] phys: 33280.. 33535
flags: 0x000 tot: 256
ext: 1 logical: [ 768.. 1023] phys: 0.. 255
flags: 0x006 tot: 256
ext: 2 logical: [ 1280.. 1535] phys: 0.. 255
flags: 0x007 tot: 256
Tested-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
If CONFIG_EXT4_DEBUG is enabled, then if a block allocation fails due
to disk being full, a verbose debugging message is printed, even if
the malloc-debug switch has not been enabled. Suppress the debugging
message so that nothing is printed unless malloc-debug has been turned
on.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
In ext4_bio_write_page(), if the memory allocation for the struct
ext4_io_page fails, it returns with the page's PageWriteback flag set.
This will end up causing the page not to skip writeback in
WB_SYNC_NONE mode, and in WB_SYNC_ALL mode (i.e., on a sync, fsync, or
umount) the writeback daemon will get stuck forever on the
wait_on_page_writeback() function in write_cache_pages_da().
Or, if journalling is enabled and the file gets deleted, it the
journal thread can get stuck in journal_finish_inode_data_buffers()
call to filemap_fdatawait().
Another place where things can get hung up is in
truncate_inode_pages(), called out of ext4_evict_inode().
Fix this by not setting PageWriteback until after we have successfully
allocated the struct ext4_io_page.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Move the initialization of all of the fields of the mpd structure to
write_cache_pages_da(). This simplifies the code considerably.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
If we have accumulated a contiguous region of memory to be written
out, and the next page can added to this region, don't bother locking
(and then unlocking the page) before writing out the memory. In the
unlikely event that the next page was being written back by some other
CPU, we can also skip waiting that page to finish writeback.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Because the ext4 page writeback codepath had been prematurely calling
clear_page_dirty_for_io(), if it turned out that a particular page
couldn't be written out during a particular pass of
write_cache_pages_da(), the page would have to get redirtied by
calling redirty_pages_for_writeback(). Not only was this wasted work,
but redirty_page_for_writeback() would increment wbc->pages_skipped to
signal to writeback_sb_inodes() that buffers were locked, and that it
should skip this inode until later.
Since this signal was incorrect in ext4's case --- which was caused by
ext4's historically incorrect use of write_cache_pages() ---
ext4_da_writepages() saved and restored wbc->skipped_pages to avoid
confusing writeback_sb_inodes().
Now that we've fixed ext4 to call clear_page_dirty_for_io() right
before initiating the page I/O, we can nuke the page_skipped
save/restore hackery, and breathe a sigh of relief.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Move when we call clear_page_dirty_for_io() to just before we actually
write the page. This simplifies the code somewhat, and avoids marking
pages as clean and then needing to remark them as dirty later.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Eliminate duplicate code, unneeded variables, etc., to make it easier
to understand the code. No behavioral changes were made in this patch.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Fold the __mpage_da_writepage() function into write_cache_pages_da().
This will give us opportunities to clean up and simplify the resulting
code.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Now that we've fixed the file corruption bug in commit d50bdd5aa55,
it's time to enable mblk_io_submit by default.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
If ext4_da_block_invalidatepages() is called because of a
failure from ext4_map_blocks() in mpage_da_map_and_submit(),
it's supposed to clean up -- including unlock -- all the
pages in the mpd structure. But these values may not match
up, even on a system in which block size == page size:
mpd->b_blocknr != mpd->first_page
mpd->b_size != (mpd->next_page - mpd->first_page)
ext4_da_block_invalidatepages() has been using b_blocknr and
b_size; this patch changes it to use first_page and
next_page.
Tested: I injected a small number (5%) of failures in
ext4_map_blocks() in the case that the flags contain
EXT4_GET_BLOCKS_DELALLOC_RESERVE, and ran fsstress on this
kernel. Without this patch, I got hung tasks every time.
With this patch, I see no hangs in many runs of fsstress.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
In mpage_da_map_and_submit(), if we have a delayed block
allocation failure from ext4_map_blocks(), we need to mark
the IO as complete, by setting
mpd->io_done = 1;
Otherwise, we could end up submitting the pages in an outer
loop; since they are unlocked on mapping failure in
ext4_da_block_invalidatepages(), this will cause a bug check
in mpage_da_submit_io().
I tested this by injected failures into ext4_map_blocks().
Without this patch, a simple fsstress run will bug check;
with the patch, it works fine.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
A race can occur when io_submit() races with io_destroy():
CPU1 CPU2
io_submit()
do_io_submit()
...
ctx = lookup_ioctx(ctx_id);
io_destroy()
Now do_io_submit() holds the last reference to ctx.
...
queue new AIO
put_ioctx(ctx) - frees ctx with active AIOs
We solve this issue by checking whether ctx is being destroyed in AIO
submission path after adding new AIO to ctx. Then we are guaranteed that
either io_destroy() waits for new AIO or we see that ctx is being
destroyed and bail out.
Cc: Nick Piggin <npiggin@kernel.dk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|