Age | Commit message (Collapse) | Author |
|
xfs_{compat_,}attrmulti_by_handle could return an errno with incorrect
sign in some cases. While at it, make sure ENOMEM is returned instead of
E2BIG if kmalloc fails.
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
group and project quota hints are currently stored on the user
dquot. If we are attaching quotas to the inode, then the group and
project dquots are stored as hints on the user dquot to save having
to look them up again later.
The thing is, the hints are not used for that inode for the rest of
the life of the inode - the dquots are attached directly to the
inode itself - so the only time the hints are used is when an inode
first has dquots attached.
When the hints on the user dquot don't match the dquots being
attache dto the inode, they are then removed and replaced with the
new hints. If a user is concurrently modifying files in different
group and/or project contexts, then this leads to thrashing of the
hints attached to user dquot.
If user quotas are not enabled, then hints are never even used.
So, if the hints are used to avoid the cost of the lookup, is the
cost of the lookup significant enough to justify the hint
infrstructure? Maybe it was once, when there was a global quota
manager shared between all XFS filesystems and was hash table based.
However, lookups are now much simpler, requiring only a single lock and
radix tree lookup local to the filesystem and no hash or LRU
manipulations to be made. Hence the cost of lookup is much lower
than when hints were implemented. Turns out that benchmarks show
that, too, with thir being no differnce in performance when doing
file creation workloads as a single user with user, group and
project quotas enabled - the hints do not make the code go any
faster. In fact, removing the hints shows a 2-3% reduction in the
time it takes to create 50 million inodes....
So, let's just get rid of the hints and the complexity around them.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Coverity noticed that if we sent junk into
xfs_qm_scall_trunc_qfiles(), we could get back an
uninitialized error value. So sanitize the flags we
will accept, and initialize error anyway for good measure.
(This bug may have been introduced via c61a9e39).
Should resolve Coverity CID 1163872.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The Q_XQUOTARM quotactl was not working properly, because
we weren't passing around proper flags. The xfs_fs_set_xstate()
ioctl handler used the same flags for Q_XQUOTAON/OFF as
well as for Q_XQUOTARM, but Q_XQUOTAON/OFF look for
XFS_UQUOTA_ACCT, XFS_UQUOTA_ENFD, XFS_GQUOTA_ACCT etc,
i.e. quota type + state, while Q_XQUOTARM looks only for
the type of quota, i.e. XFS_DQ_USER, XFS_DQ_GROUP etc.
Unfortunately these flag spaces overlap a bit, so we
got semi-random results for Q_XQUOTARM; i.e. the value
for XFS_DQ_USER == XFS_UQUOTA_ACCT, etc. yeargh.
Add a new quotactl op vector specifically for the QUOTARM
operation, since it operates with a different flag space.
This has been broken more or less forever, AFAICT.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Dan's "smatch" checker found out that there was a bug in the error path of the
'ubifs_remount_rw()' function. Instead of jumping to the "out" label which
cleans-things up, we just returned.
This patch fixes the problem.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
|
|
We have had this code in the kernel for over a year now and have
shaken all the known issues out of the code over the past few
releases. It's now time to remove the experimental warnings during
mount and fully support the new filesystem format in production
systems.
Remove the experimental warning, and add a version number to the
initial "mounting filesystem" message to tell use what type of
filesystem is being mounted. Also, remove the temporary inode
cluster size output at mount time now we know that this code works
fine.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Since now the shrink list is private and nobody can free the dentry while
it is on the shrink list, we can remove RCU protection from this.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Start with shrink_dcache_parent(), then scan what remains.
First of all, BUG() is very much an overkill here; we are holding
->s_umount, and hitting BUG() means that a lot of interesting stuff
will be hanging after that point (sync(2), for example). Moreover,
in cases when there had been more than one leak, we'll be better
off reporting all of them. And more than just the last component
of pathname - %pd is there for just such uses...
That was the last user of dentry_lru_del(), so kill it off...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If we find something already on a shrink list, just increment
data->found and do nothing else. Loops in shrink_dcache_parent() and
check_submounts_and_drop() will do the right thing - everything we
did put into our list will be evicted and if there had been nothing,
but data->found got non-zero, well, we have somebody else shrinking
those guys; just try again.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull aio fixes from Ben LaHaise:
"The first change from Anatol fixes a regression where io_destroy() no
longer waits for outstanding aios to complete. The second corrects a
memory leak in an error path for vectored aio operations.
Both of these bug fixes should be queued up for stable as well"
* git://git.kvack.org/~bcrl/aio-fixes:
aio: fix potential leak in aio_run_iocb().
aio: block io_destroy() until all context requests are completed
|
|
If the victim in on the shrink list, don't remove it from there.
If shrink_dentry_list() manages to remove it from the list before
we are done - fine, we'll just free it as usual. If not - mark
it with new flag (DCACHE_MAY_FREE) and leave it there.
Eventually, shrink_dentry_list() will get to it, remove the sucker
from shrink list and call dentry_kill(dentry, 0). Which is where
we'll deal with freeing.
Since now dentry_kill(dentry, 0) may happen after or during
dentry_kill(dentry, 1), we need to recognize that (by seeing
DCACHE_DENTRY_KILLED already set), unlock everything
and either free the sucker (in case DCACHE_MAY_FREE has been
set) or leave it for ongoing dentry_kill(dentry, 1) to deal with.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
iovec should be reclaimed whenever caller of rw_copy_check_uvector() returns,
but it doesn't hold when failure happens right after aio_setup_vectored_rw().
Fix that in a such way to avoid hairy goto.
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The part of old d_free() that dealt with actual freeing of dentry.
Taken out of dentry_kill() into a separate function.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
There is no need to have most of the code in kill_ioctx() indented. Flatten
it.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
|
|
As reported by Anatol Pomozov, io_destroy() fails to report an error when
it loses the race to destroy a given ioctx. Since there is a difference in
behaviour between the thread that wins the race (which blocks on outstanding
io requests) versus lthe thread that loses (which returns immediately), wire
up a return code from kill_ioctx() to the io_destroy() syscall.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Anatol Pomozov <anatol.pomozov@gmail.com>
|
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
To avoid releasing caps that are being used, encode_inode_release()
should send implemented caps to MDS.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
When creating a file, ceph_set_dentry_offset() puts the new dentry
at the end of directory's d_subdirs, then set the dentry's offset
based on directory's max offset. The offset does not reflect the
real postion of the dentry in directory. Later readdir reply from
MDS may change the dentry's position/offset. This inconsistency
can cause missing/duplicate entries in readdir result if readdir
is partly satisfied by dcache_readdir().
The fix is clear directory's completeness after creating/renaming
file. It prevents later readdir from using dcache_readdir().
Fixes: http://tracker.ceph.com/issues/8025
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Support RENAME_EXCHANGE and RENAME_NOREPLACE flags on the userspace ABI.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Fuse doesn't support i_version (yet).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
The patch addresses two use-cases when the flag may be safely cleared:
1. fuse_do_setattr() is called with ATTR_CTIME flag set in attr->ia_valid.
In this case attr->ia_ctime bears actual value. In-kernel fuse must send it
to the userspace server and then assign the value to inode->i_ctime.
2. fuse_do_setattr() is called with ATTR_SIZE flag set in attr->ia_valid,
whereas ATTR_CTIME is not set (truncate(2)).
In this case in-kernel fuse must sent "now" to the userspace server and then
assign the value to inode->i_ctime.
In both cases we could clear I_DIRTY_SYNC, but that needs more thought.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Let the kernel maintain i_ctime locally: update i_ctime explicitly on
truncate, fallocate, open(O_TRUNC), setxattr, removexattr, link, rename,
unlink.
The inode flag I_DIRTY_SYNC serves as indication that local i_ctime should
be flushed to the server eventually. The patch sets the flag and updates
i_ctime in course of operations listed above.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
This implements updating ctime as well as mtime on file_update_time().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
The patch extends fuse_setattr_in, and extends the flush procedure
(fuse_flush_times()) called on ->write_inode() to send the ctime as well as
mtime.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Allow userspace fs to specify time granularity.
This is needed because with writeback_cache mode the kernel is responsible
for generating mtime and ctime, but if the underlying filesystem doesn't
support nanosecond granularity then the cache will contain a different
value from the one stored on the filesystem resulting in a change of times
after a cache flush.
Make the default granularity 1s.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
...and flush mtime from this. This allows us to use the kernel
infrastructure for writing out dirty metadata (mtime at this point, but
ctime in the next patches and also maybe atime).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Don't need to start I/O twice (once without i_mutex and one within).
Also make sure that even if the userspace filesystem doesn't support FSYNC
we do all the steps other than sending the message.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
in preparation for getting rid of FUSE_I_MTIME_DIRTY.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
In case of fc->atomic_o_trunc is set, fuse does nothing in
fuse_do_setattr() while handling open(O_TRUNC). Hence, i_mtime must be
updated explicitly in fuse_finish_open(). The patch also adds extra locking
encompassing open(O_TRUNC) operation to avoid races between the truncation
and updating i_mtime.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Handling truncate(2), VFS doesn't set ATTR_MTIME bit in iattr structure;
only ATTR_SIZE bit is set. In-kernel fuse must handle the case by setting
mtime fields of struct fuse_setattr_in to "now" and set FATTR_MTIME bit
even though ATTR_MTIME was not set.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
When inode is in I_NEW state, inode->i_mode is not initialized yet. Do not
use it before fuse_init_inode() is called.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Bad case of shadowing.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Don't allow new fallocate modes until we figure out what (if anything) that
takes.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
fuse_ctl_cleanup is only called by __exit fuse_exit
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Sparse warning: fs/gfs2/lops.c:78:29:
"warning: Using plain integer as NULL pointer"
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: limit the path size in send to PATH_MAX
Btrfs: correctly set profile flags on seqlock retry
Btrfs: use correct key when repeating search for extent item
Btrfs: fix inode caching vs tree log
Btrfs: fix possible memory leaks in open_ctree()
Btrfs: avoid triggering bug_on() when we fail to start inode caching task
Btrfs: move btrfs_{set,clear}_and_info() to ctree.h
btrfs: replace error code from btrfs_drop_extents
btrfs: Change the hole range to a more accurate value.
btrfs: fix use-after-free in mount_subvol()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some kernfs fixes for 3.15-rc3 that resolve some reported
problems. Nothing huge, but all needed"
* tag 'driver-core-3.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
s390/ccwgroup: Fix memory corruption
kernfs: add back missing error check in kernfs_fop_mmap()
kernfs: fix a subdir count leak
|
|
fs_path_ensure_buf is used to make sure our path buffers for
send are big enough for the path names as we construct them.
The buffer size is limited to 32K by the length field in
the struct.
But bugs in the path construction can end up trying to build
a huge buffer, and we'll do invalid memmmoves when the
buffer length field wraps.
This patch is step one, preventing the overflows.
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Pull file locking fixes from Jeff Layton:
"File locking related bugfixes for v3.15 (pile #2)
- fix for a long-standing bug in __break_lease that can cause soft
lockups
- renaming of file-private locks to "open file description" locks,
and the command macros to more visually distinct names
The fix for __break_lease is also in the pile of patches for which
Bruce sent a pull request, but I assume that your merge procedure will
handle that correctly.
For the other patches, I don't like the fact that we need to rename
this stuff at this late stage, but it should be settled now
(hopefully)"
* tag 'locks-v3.15-2' of git://git.samba.org/jlayton/linux:
locks: rename FL_FILE_PVT and IS_FILE_PVT to use "*_OFDLCK" instead
locks: rename file-private locks to "open file description locks"
locks: allow __break_lease to sleep even when break_time is 0
|
|
Pull nfsd bugfixes from Bruce Fields:
"Three small nfsd bugfixes (including one locks.c fix for a bug
triggered only from nfsd).
Jeff's patches are for long-existing problems that became easier to
trigger since the addition of vfs delegation support"
* 'for-3.15' of git://linux-nfs.org/~bfields/linux:
Revert "nfsd4: fix nfs4err_resource in 4.1 case"
nfsd: set timeparms.to_maxval in setup_callback_client
locks: allow __break_lease to sleep even when break_time is 0
|
|
While updating how mmap enabled kernfs files are handled by lockdep,
9b2db6e18945 ("sysfs: bail early from kernfs_file_mmap() to avoid
spurious lockdep warning") inadvertently dropped error return check
from kernfs_file_mmap(). The intention was just dropping "if
(ops->mmap)" check as the control won't reach the point if the mmap
callback isn't implemented, but I mistakenly removed the error return
check together with it.
This led to Xorg crash on i810 which was reported and bisected to the
commit and then to the specific change by Tobias.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-bisected-by: Tobias Powalowski <tobias.powalowski@googlemail.com>
Tested-by: Tobias Powalowski <tobias.powalowski@googlemail.com>
References: http://lkml.kernel.org/g/533D01BD.1010200@googlemail.com
Cc: stable <stable@vger.kernel.org> # 3.14
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently kernfs_link_sibling() increates parent->dir.subdirs before
adding the node into parent's chidren rb tree.
Because it is possible that kernfs_link_sibling() couldn't find
a suitable slot and bail out, this leads to a mismatch between
elevated subdir count with actual children node numbers.
This patches fix this problem, by moving the subdir accouting
after the actual addtion happening.
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
kernfs_notify() is used to indicate either new data is available or
the content of a file has changed. It currently only triggers poll
which may not be the most convenient to monitor especially when there
are a lot to monitor. Let's hook it up to fsnotify too so that the
events can be monitored via inotify too.
fsnotify_modify() requires file * but kernfs_notify() doesn't have any
specific file associated; however, we can walk all super_blocks
associated with a kernfs_root and as kernfs always associate one ino
with inode and one dentry with an inode, it's trivial to look up the
dentry associated with a given kernfs_node. As any active monitor
would pin dentry, just looking up existing dentry is enough. This
patch looks up the dentry associated with the specified kernfs_node
and generates events equivalent to fsnotify_modify().
Note that as fsnotify doesn't provide fsnotify_modify() equivalent
which can be called with dentry, kernfs_notify() directly calls
fsnotify_parent() and fsnotify(). It might be better to add a wrapper
in fsnotify.h instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently, there's no way to find out which super_blocks are
associated with a given kernfs_root. Let's implement it - the planned
inotify extension to kernfs_notify() needs it.
Make kernfs_super_info point back to the super_block and chain it at
kernfs_root->supers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|