Age | Commit message (Collapse) | Author |
|
Now that link_path_walk() is called without LOOKUP_PARENT
only from do_follow_link(), we can simplify the checks in
last component handling. First of all, checking if we'd
arrived to a directory is not needed - the caller will check
it anyway. And LOOKUP_FOLLOW is guaranteed to be there,
since we only get to that place with nd->depth > 0.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Now the only caller of link_path_walk() that does *not* pass
LOOKUP_PARENT is do_follow_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and note that we only need to do it for LAST_BIND symlinks
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
new helper: walk_component(). Handles everything except symlinks;
returns negative on error, 0 on success and 1 on symlinks we decided
to follow. Drops out of RCU mode on such symlinks.
link_path_walk() and do_last() switched to using that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We don't want to allow creation of private hardlinks by different application
using the fd passed to them via SCM_RIGHTS. So limit the null relative name
usage in linkat syscall to CAP_DAC_READ_SEARCH
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
|
|
d_move puts the renamed dentry at the end of d_subdirs, screwing with our
cached dentry directory offsets. We were just clearing I_COMPLETE to avoid
any possibility of trouble. However, assigning the renamed dentry an
offset at the end of the directory (to match it's new d_subdirs position)
is sufficient to maintain correct behavior and hold onto I_COMPLETE.
This is especially important for workloads like rsync, which renames files
into place. Before, we would lose I_COMPLETE and do MDS lookups for each
file. With this patch we only talk to the MDS on create and rename.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Changes to make sure writeback fid is owned by root
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
change file attribute can result in making the file readonly.
So flush the dirty pages before that.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We need to call vmtruncate before 9p setattr operation, otherwise we
could write back some dirty pages between setattr with ATTR_SIZE and vmtruncate
causing some truncated pages to be written back to server
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
With caching enabled, we need to make sure we don't
update inode->i_size via stat2inode because we could
have dirty data which is not yet written to the server
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Add necessary #ifndef #endif blocks to avoid mulitple inclusion of same headers
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
This is similar to what ceph, ocfs2 and nfs does
http://kerneltrap.org/mailarchive/linux-fsdevel/2008/4/18/1498534
May be we should get vfs fixed
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
One successfull directory operation we would have changed directory
inode attribute. So mark them invalid
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We need to revalidate . and .. entries also
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
rename, unlink and setattr can result in update of inode attribute.
So mark the cached copy invalid
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
With cached mode some of the file system operation result
in updating inode attributes (ctime). Add support for
marking inode attribute invalid in such cases so that
we fetch the updated inode attribute on dentry revalidation.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We want to immediately drop the inode in non cached mode
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Only update inode i_size when we write towards end of file.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We want to enable readahead in cached mode
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Switch to the fscache code to v9fs_inode. We will later use
v9fs_inode in cache=loose mode to track the inode cache
validity timeout. Ie if we find an inode in cache older
that a specific jiffie range we will consider it stale
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
simple_getattr does set stat.st_blocks to a value
derived from nrpages. That is not correct with 9p
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We didn't add the inode to inode hash in 9p. We need to do that
to get sync to work, otherwise __mark_inode_dirty will not
add the inode to super block's dirty list.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
FIXME!! what about dotu ?
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We should not mark file system synchronous if mounted cache=* option
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Update the comment to indicate that we don't want to cache
negative dentries.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We can now support writeable mmaps.
Based on the original patch from Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
The fid attached to inode will be opened O_RDWR mode and is used
for dirty page writeback only.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We add read write helper function here which will
be used later by the mmap patch
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We need to call fscache_wait_on_page_write in launder_page
for fscache
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We need to ihold even in cached mode
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We need to call v9fs_cache_inode_set_cookie in create
path also
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
With the old code we were not setting the file->f_op
with cached file operations during creat.
(format correction by jvrao@linux.vnet.ibm.com)
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Current code sets access=user as default for all protocol versions.
This patch chagnes it to "client" only for dotl.
User can always specify particular access mode with -o access= option.
No change there.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
The mount option access=client is overloaded as it assumes acl too.
Adding posixacl option to enable POSIX ACLs makes it explicit and clear.
Also it is convenient in the future to add other types of acls like richacls.
Ideally, the access mode 'client' should be just like V9FS_ACCESS_USER
except it underscores the location of access check.
Traditional 9P protocol lets the server perform access checks but with
this mode, all the access checks will be performed on the client itself.
Server just follows the client's directive.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
If the kernel is not compiled with CONFIG_9P_FS_POSIX_ACL and the
mount option is specified to enable ACLs current code fails the mount.
This patch brings the behavior inline with other filesystems like ext3
by proceeding with the mount and log a warning to syslog.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
With create/mkdir/mknod in non cached mode we initialize the inode using
v9fs_get_inode. v9fs_get_inode doesn't initialize the cache inode value
to NULL. This is causing to trip on BUG_ON in v9fs_get_cached_acl.
Fix is to initialize acls to NULL and not to leave them in ACL_NOT_CACHED
state.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
|
|
In v9fs_get_acl() if __v9fs_get_acl() gets only one of the
dacl/pacl we are not releasing it.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
|
|
One leftover from the days of IBM's original code, is an SB counter
that counts in-flight asynchronous commands. And a piece of code that
waits for the counter to reach zero at unmount. I guess it might have
been needed then, cause of some reference missing or something.
I'm not removing it yet but am putting a warning message if ever this
counter triggers at unmount. If I'll never see it triggers or reported
I'll remove the counter for good.
(I had this print as a debug output for a long time and never had it
trigger)
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
Before when creating a new inode, we'd set the sb->s_dirt flag,
and sometime later the system would write out s_nextid as part
of the sb_info. Also on inode sync we would force the sb sync
as well.
Define the s_nextid as a new partition attribute and set it
every time we create a new object.
At mount we read it from it's new place.
We now never set sb->s_dirt anywhere in exofs. write_super
is actually never called. The call to exofs_write_super from
exofs_put_super is also removed because the VFS always calls
->sync_fs before calling ->put_super twice.
To stay backward-and-forward compatible we also write the old
s_nextid in the super_block object at unmount, and support zero
length attribute on mount.
This also fixes a BUG where in layouts when group_width was not
a divisor of EXOFS_SUPER_ID (0x10000) the s_nextid was not read
from the device it was written to. Because of the sliding window
layout trick, and because the read was always done from the 0
device but the write was done via the raid engine that might slide
the device view. Now we read and write through the raid engine.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
If /dev/osd* devices are shuffled because more devices
where added, and/or login order has changed. It is hard to
mount the FS you want.
Add an option to mount by osdname. osdname is any osd-device's
osdname as specified to the mkfs.exofs command when formatting
the osd-devices.
The new mount format is:
OPT="osdname=$UUID0,pid=$PID,_netdev"
mount -t exofs -o $OPT $DEV_OSD0 $MOUNTDIR
if "osdname=" is specified in options above $DEV_OSD0 is
ignored and can be empty.
Also while at it: Removed some old unused Opt_* enums.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
* Set all inode->i_mapping->backing_dev_info to point to
the per super-block sb->s_bdi.
* Calculating a read_ahead that is:
- preferable 2 stripes long
(Future patch will add a mount option to override this)
- Minimum 128K aligned up to stripe-size
- Caped to maximum-IO-sizes round down to stripe_size.
(Max sizes are governed by max bio-size that fits in a page
times number-of-devices)
CC: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
It is incorrect to test inode dirty bits without participating in the inode
writeback protocol. Inode writeback sets I_SYNC and clears I_DIRTY_?, then
writes out the particular bits, then clears I_SYNC when it is done. BTW. it
may not completely write all pages out, so I_DIRTY_PAGES would get set
again.
This is a standard pattern used throughout the kernel's writeback caches
(I_SYNC ~= I_WRITEBACK, if that makes it clearer).
And so it is not possible to determine an inode's dirty status just by
checking I_DIRTY bits. Especially not for the purpose of data integrity
syncs.
Missing the check for these bits means that fsync can complete while
writeback to the inode is underway. Inode writeback functions get this
right, so call into them rather than try to shortcut things by testing
dirty state improperly.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
Don't attempt a read passed i_size, just zero the page and be
done with it.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
I stumbled on some of these prints in log files so, might
just submit the fixes.
* All i_ino prints in exofs should be hex
* All OSD_ERR prints should end with a "\n"
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
|
|
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|