Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: try to send partial cap release on cap message on missing inode
ceph: release cap on import if we don't have the inode
ceph: fix misleading/incorrect debug message
ceph: fix atomic64_t initialization on ia64
ceph: fix lease revocation when seq doesn't match
ceph: fix f_namelen reported by statfs
ceph: fix memory leak in statfs
ceph: fix d_subdirs ordering problem
|
|
If we have enough memory to allocate a new cap release message, do so, so
that we can send a partial release message immediately. This keeps us from
making the MDS wait when the cap release it needs is in a partially full
release message.
If we fail because of ENOMEM, oh well, they'll just have to wait a bit
longer.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
If we get an IMPORT that give us a cap, but we don't have the inode, queue
a release (and try to send it immediately) so that the MDS doesn't get
stuck waiting for us.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Nothing is released here: the caps message is simply ignored in this case.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
bdi_seq is an atomic_long_t but we're using ATOMIC_INIT, which causes
build failures on ia64. This patch fixes it to use ATOMIC_LONG_INIT.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
As it stands this check compares the number of pages to the page size.
This makes no sense and makes the fcntl fail in almost any sane case.
Fix it by checking if nr_pages is not zero (it can become zero only if
arg is too big and round_pipe_size() overflows).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
pipe_set_size() needs to copy pipe bufs from the old circular buffer
to the new.
The current code gets this wrong in multiple ways, resulting in oops.
Test program is available here:
http://www.kernel.org/pub/linux/kernel/people/mszeredi/piperesize/
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
We do the same BUG_ON() just a line later when calling into
__bd_abort_claiming().
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
I don't like the subtle multi-context code in bd_claim (ie. detects where it
has been called based on bd_claiming). It seems clearer to just require a new
function to finish a 2-part claim.
Also improve commentary in bd_start_claiming as to how it should
be used.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
bd_start_claiming has an unbalanced module_put introduced in 6b4517a79.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
* 'for-2.6.35' of git://linux-nfs.org/~bfields/linux:
nfsd4: shut down callback queue outside state lock
nfsd: nfsd_setattr needs to call commit_metadata
|
|
Now that the background flush code has been fixed, we shouldn't need to
silently multiply the wbc->nr_to_write to get good writeback. Remove
that code.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
This reportedly causes a lockdep warning on nfsd shutdown. That looks
like a false positive to me, but there's no reason why this needs the
state lock anyway.
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
* git://git.infradead.org/~dwmw2/mtd-2.6.35:
jffs2: update ctime when changing the file's permission by setfacl
jffs2: Fix NFS race by using insert_inode_locked()
jffs2: Fix in-core inode leaks on error paths
mtd: Fix NAND submenu
mtd/r852: update card detect early.
mtd/r852: Fixes in case of DMA timeout
mtd/r852: register IRQ as last step
drivers/mtd: Use memdup_user
docbook: make mtd nand module init static
|
|
jffs2 didn't update the ctime of the file when its permission was changed.
Steps to reproduce:
# touch aaa
# stat -c %Z aaa
1275289822
# setfacl -m 'u::x,g::x,o::x' aaa
# stat -c %Z aaa
1275289822 <- unchanged
But, according to the spec of the ctime, jffs2 must update it.
Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix remaining racy updates of EXT4_I(inode)->i_flags
ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only files
|
|
A few functions were still modifying i_flags in a racy manner.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: improve xfs_isilocked
xfs: skip writeback from reclaim context
xfs: remove done roadmap item from xfs-delayed-logging-design.txt
xfs: fix race in inode cluster freeing failing to stale inodes
xfs: fix access to upper inodes without inode64
xfs: fix might_sleep() warning when initialising per-ag tree
fs/xfs/quota: Add missing mutex_unlock
xfs: remove duplicated #include
xfs: convert more trace events to DEFINE_EVENT
xfs: xfs_trace.c: remove duplicated #include
xfs: Check new inode size is OK before preallocating
xfs: clean up xlog_align
xfs: cleanup log reservation calculactions
xfs: be more explicit if RT mount fails due to config
xfs: replace E2BIG with EFBIG where appropriate
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: remove obsolete declarations of cache constructor and destructor
nilfs2: fix style issue in nilfs_destroy_cachep
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
Minix: Clean up left over label
fix truncate inode time modification breakage
fix setattr error handling in sysfs, configfs
fcntl: return -EFAULT if copy_to_user fails
wrong type for 'magic' argument in simple_fill_super()
fix the deadlock in qib_fs
mqueue doesn't need make_bad_inode()
|
|
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits)
block: make blk_init_free_list and elevator_init idempotent
block: avoid unconditionally freeing previously allocated request_queue
pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface
pipe: change the privilege required for growing a pipe beyond system max
pipe: adjust minimum pipe size to 1 page
block: disable preemption before using sched_clock()
cciss: call BUG() earlier
Preparing 8.3.8rc2
drbd: Reduce verbosity
drbd: use drbd specific ratelimit instead of global printk_ratelimit
drbd: fix hang on local read errors while disconnected
drbd: Removed the now empty w_io_error() function
drbd: removed duplicated #includes
drbd: improve usage of MSG_MORE
drbd: need to set socket bufsize early to take effect
drbd: improve network latency, TCP_QUICKACK
drbd: Revert "drbd: Create new current UUID as late as possible"
brd: support discard
Revert "writeback: fix WB_SYNC_NONE writeback from umount"
Revert "writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync"
...
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
fix setattr error handling in sysfs, configfs
kobject: free memory if netlink_kernel_create() fails
lib/kobject_uevent.c: fix CONIG_NET=n warning
|
|
The data chunk is mmaped with 'len' which remains unchanged, so use that
when unmapping in the error path rather than trying to recalculate (and
incorrectly so) the value used originally.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: David McCullough <davidm@snapgear.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The stack and data have different alignment requirements, so don't force
them to wear the same shoe. Increase the data alignment to match that
which the elf2flt linker script has always been using: 0x20 bytes. Not
only does this bring the kernel loader in line with the toolchain, but it
also fixes a swath of gcc tests which try to force larger alignment values
but randomly fail when the FLAT loader fails to deliver.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: David McCullough <davidm@snapgear.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jie Zhang <jie@codesourcery.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A call to access_ok is missing a compat_ptr conversion. Introduced with
b83733639a494d5f42fa00a2506563fbd2d3015d "compat: factor out
compat_rw_copy_check_uvector from compat_do_readv_writev"
fs/compat.c: In function 'compat_rw_copy_check_uvector':
fs/compat.c:629: warning: passing argument 1 of '__access_ok' makes pointer from integer without a cast
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove a left over fail label.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
mtime and ctime should be changed only if the file size has actually
changed. Patches changing ext2 and tmpfs from vmtruncate to new truncate
sequence has caused regressions where they always update timestamps.
There is some strange cases in POSIX where truncate(2) must not update
times unless the size has acutally changed, see 6e656be89.
This area is all still rather buggy in different ways in a lot of
filesystems and needs a cleanup and audit (ideally the vfs will provide
a simple attribute or call to direct all filesystems exactly which
attributes to change). But coming up with the best solution will take a
while and is not appropriate for rc anyway.
So fix recent regression for now.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
sysfs and configfs setattr functions have error cases after the generic inode's
attributes have been changed. Fix consistency by changing the generic inode
attributes only when it is guaranteed to succeed.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
copy_to_user() returns the number of bytes remaining, but we want to
return -EFAULT.
ret = fcntl(fd, F_SETOWN_EX, NULL);
With the original code ret would be 8 here.
V2: Takuya Yoshikawa pointed out a similar issue in f_getown_ex()
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
It's used to superblock ->s_magic, which is unsigned long.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Reviewed-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
CC: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
sysfs and configfs setattr functions have error cases after the generic inode's
attributes have been changed. Fix consistency by changing the generic inode
attributes only when it is guaranteed to succeed.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
If the client revokes a lease with a higher seq than what we have, keep
the mds's seq, so that it honors our release. Otherwise, we can hang
indefinitely.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: fix page refcount leak
|
|
This changes the interface to be based on bytes instead. The API
matches that of F_SETPIPE_SZ in that it rounds up the passed in
size so that the resulting page array is a power-of-2 in size.
The proc file is renamed to /proc/sys/fs/pipe-max-size to
reflect this change.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Change it to CAP_SYS_RESOURCE, as that more accurately models what
we want to control.
Suggested-by: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
We don't need to pages to guarantee the POSIX requirement
that upto a page size write must be atomic to an empty
pipe.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
New inodes need to be locked as we're creating them, so they don't get used
by other things (like NFSd) before they're ready.
Pointed out by Al Viro.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Pointed out by Al Viro.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Use rwsem_is_locked to make the assertations for shared locks work.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Allowing writeback from reclaim context causes massive problems with stack
overflows as we can call into the writeback code which tends to be a heavy
stack user both in the generic code and XFS from random contexts that
perform memory allocations.
Follow the example of btrfs (and in slightly different form ext4) and refuse
to write out data from reclaim context. This issue should really be handled
by the VM so that we can tune better for this case, but until we get it
sorted out there we have to hack around this in each filesystem with a
complex writeback path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
When an inode cluster is freed, it needs to mark all inodes in memory as
XFS_ISTALE before marking the buffer as stale. This is eeded because the inodes
have a different life cycle to the buffer, and once the buffer is torn down
during transaction completion, we must ensure none of the inodes get written
back (which is what XFS_ISTALE does).
Unfortunately, xfs_ifree_cluster() has some bugs that lead to inodes not being
marked with XFS_ISTALE. This shows up when xfs_iflush() is called on these
inodes either during inode reclaim or tail pushing on the AIL. The buffer is
read back, but no longer contains inodes and so triggers assert failures and
shutdowns. This was reproducable with at run.dbench10 invocation from xfstests.
There are two main causes of xfs_ifree_cluster() failing. The first is simple -
it checks in-memory inodes it finds in the per-ag icache to see if they are
clean without holding the flush lock. if they are clean it skips them
completely. However, If an inode is flushed delwri, it will
appear clean, but is not guaranteed to be written back until the flush lock has
been dropped. Hence we may have raced on the clean check and the inode may
actually be dirty. Hence always mark inodes found in memory stale before we
check properly if they are clean.
The second is more complex, and makes the first problem easier to hit.
Basically the in-memory inode scan is done with full knowledge it can be racing
with inode flushing and AIl tail pushing, which means that inodes that it can't
get the flush lock on might not be attached to the buffer after then in-memory
inode scan due to IO completion occurring. This is actually documented in the
code as "needs better interlocking". i.e. this is a zero-day bug.
Effectively, the in-memory scan must be done while the inode buffer is locked
and Io cannot be issued on it while we do the in-memory inode scan. This
ensures that inodes we couldn't get the flush lock on are guaranteed to be
attached to the cluster buffer, so we can then catch all in-memory inodes and
mark them stale.
Now that the inode cluster buffer is locked before the in-memory scan is done,
there is no need for the two-phase update of the in-memory inodes, so simplify
the code into two loops and remove the allocation of the temporary buffer used
to hold locked inodes across the phases.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Dan Roseberg has reported a problem with the MOVE_EXT ioctl. If the
donor file is an append-only file, we should not allow the operation
to proceed, lest we end up overwriting the contents of an append-only
file.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com>
|
|
We were setting f_namelen in kstatfs to PATH_MAX instead of NAME_MAX.
That disagrees with ceph_lookup behavior (which checks against NAME_MAX),
and also makes the pjd posix test suite spit out ugly errors because with
can't clean up its temporary files.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Freeing the statfs request structure when required.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
We misused list_move_tail() to order the dentry in d_subdirs.
This will screw up the d_subdirs order.
This bug can be reliably reproduced by:
1. mount ceph fs.
2. on ceph fs, git clone git://ceph.newdream.net/git/ceph.git
3. Run autogen.sh in ceph directory.
(Note: Errors only occur at the first time you run autogen.sh.)
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
The conversion of write_inode_now calls to commit_metadata in commit
f501912a35c02eadc55ca9396ece55fe36f785d0 missed out the call in nfsd_setattr.
But without this conversion we can't guarantee that a SETATTR request
has actually been commited to disk with XFS, which causes a regression
from 2.6.32 (only for NFSv2, but anyway).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
fscache_write_op() makes unnecessary checks of the page variable to see if it
is NULL. It can't be NULL at those points as the kernel would already have
crashed a little higher up where we examined page->index.
Furthermore, unless radix_tree_gang_lookup_tag() can return 1 but no page, a
NULL pointer crash should not be encountered there as we can only get there if
r_t_g_l_t() returned 1.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 315e995c63a15cb4d4efdbfd70fe2db191917f7a is causing OOM kills
when stress-testing a CIFS filesystem. The VFS readpages operation takes
a page reference. The older code just handed this reference off to the
page cache, but the new code takes an extra one. The simplest fix is to
put the new reference after add_to_page_cache_lru.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
|