From 51eb01e73599efb88c6c20b1c226d20309a75450 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Sun, 25 Jun 2006 05:48:50 -0700 Subject: [PATCH] fuse: no backgrounding on interrupt Don't put requests into the background when a fatal interrupt occurs while the request is in userspace. This removes a major wart from the implementation. Backgrounding of requests was introduced to allow breaking of deadlocks. However now the same can be achieved by aborting the filesystem through the 'abort' sysfs attribute. This is a change in the interface, but should not cause problems, since these kinds of deadlocks never happen during normal operation. Signed-off-by: Miklos Szeredi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/fuse.txt | 40 ++++++-------------------------------- 1 file changed, 6 insertions(+), 34 deletions(-) (limited to 'Documentation/filesystems/fuse.txt') diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt index 33f74310d16..e7747774ceb 100644 --- a/Documentation/filesystems/fuse.txt +++ b/Documentation/filesystems/fuse.txt @@ -304,25 +304,7 @@ Scenario 1 - Simple deadlock | | for "file"] | | *DEADLOCK* -The solution for this is to allow requests to be interrupted while -they are in userspace: - - | [interrupted by signal] | - | fuse_unlink() - | | [queue req on fc->pending] - | | [wake up fc->waitq] - | | [sleep on req->waitq] - -If the filesystem daemon was single threaded, this will stop here, -since there's no other thread to dequeue and execute the request. -In this case the solution is to kill the FUSE daemon as well. If -there are multiple serving threads, you just have to kill them as -long as any remain. - -Moral: a filesystem which deadlocks, can soon find itself dead. +The solution for this is to allow the filesystem to be aborted. Scenario 2 - Tricky deadlock ---------------------------- @@ -355,24 +337,14 @@ but is caused by a pagefault. | | [lock page] | | * DEADLOCK * -Solution is again to let the the request be interrupted (not -elaborated further). +Solution is basically the same as above. An additional problem is that while the write buffer is being copied to the request, the request must not be interrupted. This is because the destination address of the copy may not be valid after the request is interrupted. -This is solved with doing the copy atomically, and allowing -interruption while the page(s) belonging to the write buffer are -faulted with get_user_pages(). The 'req->locked' flag indicates -when the copy is taking place, and interruption is delayed until -this flag is unset. - -Scenario 3 - Tricky deadlock with asynchronous read ---------------------------------------------------- - -The same situation as above, except thread-1 will wait on page lock -and hence it will be uninterruptible as well. The solution is to -abort the connection with forced umount (if mount is attached) or -through the abort attribute in sysfs. +This is solved with doing the copy atomically, and allowing abort +while the page(s) belonging to the write buffer are faulted with +get_user_pages(). The 'req->locked' flag indicates when the copy is +taking place, and abort is delayed until this flag is unset. -- cgit v1.2.3-70-g09d2 From bafa96541b250a7051e3fbc5de6e8369daf8ffec Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Sun, 25 Jun 2006 05:48:51 -0700 Subject: [PATCH] fuse: add control filesystem Add a control filesystem to fuse, replacing the attributes currently exported through sysfs. An empty directory '/sys/fs/fuse/connections' is still created in sysfs, and mounting the control filesystem here provides backward compatibility. Advantages of the control filesystem over the previous solution: - allows the object directory and the attributes to be owned by the filesystem owner, hence letting unpriviled users abort the filesystem connection - does not suffer from module unload race [akpm@osdl.org: fix this fs for recent dhowells depredations] [akpm@osdl.org: fix 64-bit printk warnings] Signed-off-by: Miklos Szeredi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/fuse.txt | 30 +++-- fs/fuse/Makefile | 2 +- fs/fuse/control.c | 218 +++++++++++++++++++++++++++++++++++++ fs/fuse/dev.c | 2 +- fs/fuse/fuse_i.h | 53 +++++++-- fs/fuse/inode.c | 141 ++++++++---------------- 6 files changed, 331 insertions(+), 115 deletions(-) create mode 100644 fs/fuse/control.c (limited to 'Documentation/filesystems/fuse.txt') diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt index e7747774ceb..324df27704c 100644 --- a/Documentation/filesystems/fuse.txt +++ b/Documentation/filesystems/fuse.txt @@ -18,6 +18,14 @@ Non-privileged mount (or user mount): user. NOTE: this is not the same as mounts allowed with the "user" option in /etc/fstab, which is not discussed here. +Filesystem connection: + + A connection between the filesystem daemon and the kernel. The + connection exists until either the daemon dies, or the filesystem is + umounted. Note that detaching (or lazy umounting) the filesystem + does _not_ break the connection, in this case it will exist until + the last reference to the filesystem is released. + Mount owner: The user who does the mounting. @@ -86,16 +94,20 @@ Mount options The default is infinite. Note that the size of read requests is limited anyway to 32 pages (which is 128kbyte on i386). -Sysfs -~~~~~ +Control filesystem +~~~~~~~~~~~~~~~~~~ + +There's a control filesystem for FUSE, which can be mounted by: -FUSE sets up the following hierarchy in sysfs: + mount -t fusectl none /sys/fs/fuse/connections - /sys/fs/fuse/connections/N/ +Mounting it under the '/sys/fs/fuse/connections' directory makes it +backwards compatible with earlier versions. -where N is an increasing number allocated to each new connection. +Under the fuse control filesystem each connection has a directory +named by a unique number. -For each connection the following attributes are defined: +For each connection the following files exist within this directory: 'waiting' @@ -110,7 +122,7 @@ For each connection the following attributes are defined: connection. This means that all waiting requests will be aborted an error returned for all aborted and new requests. -Only a privileged user may read or write these attributes. +Only the owner of the mount may read or write these files. Aborting a filesystem connection ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -139,8 +151,8 @@ the filesystem. There are several ways to do this: - Use forced umount (umount -f). Works in all cases but only if filesystem is still attached (it hasn't been lazy unmounted) - - Abort filesystem through the sysfs interface. Most powerful - method, always works. + - Abort filesystem through the FUSE control filesystem. Most + powerful method, always works. How do non-privileged mounts work? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile index c3e1f760cac..72437065f6a 100644 --- a/fs/fuse/Makefile +++ b/fs/fuse/Makefile @@ -4,4 +4,4 @@ obj-$(CONFIG_FUSE_FS) += fuse.o -fuse-objs := dev.o dir.o file.o inode.o +fuse-objs := dev.o dir.o file.o inode.o control.o diff --git a/fs/fuse/control.c b/fs/fuse/control.c new file mode 100644 index 00000000000..a3bce3a7725 --- /dev/null +++ b/fs/fuse/control.c @@ -0,0 +1,218 @@ +/* + FUSE: Filesystem in Userspace + Copyright (C) 2001-2006 Miklos Szeredi + + This program can be distributed under the terms of the GNU GPL. + See the file COPYING. +*/ + +#include "fuse_i.h" + +#include +#include + +#define FUSE_CTL_SUPER_MAGIC 0x65735543 + +/* + * This is non-NULL when the single instance of the control filesystem + * exists. Protected by fuse_mutex + */ +static struct super_block *fuse_control_sb; + +static struct fuse_conn *fuse_ctl_file_conn_get(struct file *file) +{ + struct fuse_conn *fc; + mutex_lock(&fuse_mutex); + fc = file->f_dentry->d_inode->u.generic_ip; + if (fc) + fc = fuse_conn_get(fc); + mutex_unlock(&fuse_mutex); + return fc; +} + +static ssize_t fuse_conn_abort_write(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) +{ + struct fuse_conn *fc = fuse_ctl_file_conn_get(file); + if (fc) { + fuse_abort_conn(fc); + fuse_conn_put(fc); + } + return count; +} + +static ssize_t fuse_conn_waiting_read(struct file *file, char __user *buf, + size_t len, loff_t *ppos) +{ + char tmp[32]; + size_t size; + + if (!*ppos) { + struct fuse_conn *fc = fuse_ctl_file_conn_get(file); + if (!fc) + return 0; + + file->private_data=(void *)(long)atomic_read(&fc->num_waiting); + fuse_conn_put(fc); + } + size = sprintf(tmp, "%ld\n", (long)file->private_data); + return simple_read_from_buffer(buf, len, ppos, tmp, size); +} + +static const struct file_operations fuse_ctl_abort_ops = { + .open = nonseekable_open, + .write = fuse_conn_abort_write, +}; + +static const struct file_operations fuse_ctl_waiting_ops = { + .open = nonseekable_open, + .read = fuse_conn_waiting_read, +}; + +static struct dentry *fuse_ctl_add_dentry(struct dentry *parent, + struct fuse_conn *fc, + const char *name, + int mode, int nlink, + struct inode_operations *iop, + const struct file_operations *fop) +{ + struct dentry *dentry; + struct inode *inode; + + BUG_ON(fc->ctl_ndents >= FUSE_CTL_NUM_DENTRIES); + dentry = d_alloc_name(parent, name); + if (!dentry) + return NULL; + + fc->ctl_dentry[fc->ctl_ndents++] = dentry; + inode = new_inode(fuse_control_sb); + if (!inode) + return NULL; + + inode->i_mode = mode; + inode->i_uid = fc->user_id; + inode->i_gid = fc->group_id; + inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; + /* setting ->i_op to NULL is not allowed */ + if (iop) + inode->i_op = iop; + inode->i_fop = fop; + inode->i_nlink = nlink; + inode->u.generic_ip = fc; + d_add(dentry, inode); + return dentry; +} + +/* + * Add a connection to the control filesystem (if it exists). Caller + * must host fuse_mutex + */ +int fuse_ctl_add_conn(struct fuse_conn *fc) +{ + struct dentry *parent; + char name[32]; + + if (!fuse_control_sb) + return 0; + + parent = fuse_control_sb->s_root; + parent->d_inode->i_nlink++; + sprintf(name, "%llu", (unsigned long long) fc->id); + parent = fuse_ctl_add_dentry(parent, fc, name, S_IFDIR | 0500, 2, + &simple_dir_inode_operations, + &simple_dir_operations); + if (!parent) + goto err; + + if (!fuse_ctl_add_dentry(parent, fc, "waiting", S_IFREG | 0400, 1, + NULL, &fuse_ctl_waiting_ops) || + !fuse_ctl_add_dentry(parent, fc, "abort", S_IFREG | 0200, 1, + NULL, &fuse_ctl_abort_ops)) + goto err; + + return 0; + + err: + fuse_ctl_remove_conn(fc); + return -ENOMEM; +} + +/* + * Remove a connection from the control filesystem (if it exists). + * Caller must host fuse_mutex + */ +void fuse_ctl_remove_conn(struct fuse_conn *fc) +{ + int i; + + if (!fuse_control_sb) + return; + + for (i = fc->ctl_ndents - 1; i >= 0; i--) { + struct dentry *dentry = fc->ctl_dentry[i]; + dentry->d_inode->u.generic_ip = NULL; + d_drop(dentry); + dput(dentry); + } + fuse_control_sb->s_root->d_inode->i_nlink--; +} + +static int fuse_ctl_fill_super(struct super_block *sb, void *data, int silent) +{ + struct tree_descr empty_descr = {""}; + struct fuse_conn *fc; + int err; + + err = simple_fill_super(sb, FUSE_CTL_SUPER_MAGIC, &empty_descr); + if (err) + return err; + + mutex_lock(&fuse_mutex); + BUG_ON(fuse_control_sb); + fuse_control_sb = sb; + list_for_each_entry(fc, &fuse_conn_list, entry) { + err = fuse_ctl_add_conn(fc); + if (err) { + fuse_control_sb = NULL; + mutex_unlock(&fuse_mutex); + return err; + } + } + mutex_unlock(&fuse_mutex); + + return 0; +} + +static int fuse_ctl_get_sb(struct file_system_type *fs_type, int flags, + const char *dev_name, void *raw_data, + struct vfsmount *mnt) +{ + return get_sb_single(fs_type, flags, raw_data, + fuse_ctl_fill_super, mnt); +} + +static void fuse_ctl_kill_sb(struct super_block *sb) +{ + mutex_lock(&fuse_mutex); + fuse_control_sb = NULL; + mutex_unlock(&fuse_mutex); + + kill_litter_super(sb); +} + +static struct file_system_type fuse_ctl_fs_type = { + .owner = THIS_MODULE, + .name = "fusectl", + .get_sb = fuse_ctl_get_sb, + .kill_sb = fuse_ctl_kill_sb, +}; + +int __init fuse_ctl_init(void) +{ + return register_filesystem(&fuse_ctl_fs_type); +} + +void fuse_ctl_cleanup(void) +{ + unregister_filesystem(&fuse_ctl_fs_type); +} diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index fec4779e2b5..fe3adf58917 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -833,7 +833,7 @@ static int fuse_dev_release(struct inode *inode, struct file *file) end_requests(fc, &fc->processing); spin_unlock(&fc->lock); fasync_helper(-1, file, 0, &fc->fasync); - kobject_put(&fc->kobj); + fuse_conn_put(fc); } return 0; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 25f8581a770..ac12b01f444 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -14,6 +14,7 @@ #include #include #include +#include /** Max number of pages that can be used in a single read request */ #define FUSE_MAX_PAGES_PER_REQ 32 @@ -24,6 +25,9 @@ /** It could be as large as PATH_MAX, but would that have any uses? */ #define FUSE_NAME_MAX 1024 +/** Number of dentries for each connection in the control filesystem */ +#define FUSE_CTL_NUM_DENTRIES 3 + /** If the FUSE_DEFAULT_PERMISSIONS flag is given, the filesystem module will check permissions based on the file mode. Otherwise no permission checking is done in the kernel */ @@ -33,6 +37,11 @@ doing the mount will be allowed to access the filesystem */ #define FUSE_ALLOW_OTHER (1 << 1) +/** List of active connections */ +extern struct list_head fuse_conn_list; + +/** Global mutex protecting fuse_conn_list and the control filesystem */ +extern struct mutex fuse_mutex; /** FUSE inode */ struct fuse_inode { @@ -216,6 +225,9 @@ struct fuse_conn { /** Lock protecting accessess to members of this structure */ spinlock_t lock; + /** Refcount */ + atomic_t count; + /** The user id for this mount */ uid_t user_id; @@ -310,8 +322,17 @@ struct fuse_conn { /** Backing dev info */ struct backing_dev_info bdi; - /** kobject */ - struct kobject kobj; + /** Entry on the fuse_conn_list */ + struct list_head entry; + + /** Unique ID */ + u64 id; + + /** Dentries in the control filesystem */ + struct dentry *ctl_dentry[FUSE_CTL_NUM_DENTRIES]; + + /** number of dentries used in the above array */ + int ctl_ndents; /** O_ASYNC requests */ struct fasync_struct *fasync; @@ -327,11 +348,6 @@ static inline struct fuse_conn *get_fuse_conn(struct inode *inode) return get_fuse_conn_super(inode->i_sb); } -static inline struct fuse_conn *get_fuse_conn_kobj(struct kobject *obj) -{ - return container_of(obj, struct fuse_conn, kobj); -} - static inline struct fuse_inode *get_fuse_inode(struct inode *inode) { return container_of(inode, struct fuse_inode, inode); @@ -422,6 +438,9 @@ int fuse_dev_init(void); */ void fuse_dev_cleanup(void); +int fuse_ctl_init(void); +void fuse_ctl_cleanup(void); + /** * Allocate a request */ @@ -470,3 +489,23 @@ int fuse_do_getattr(struct inode *inode); * Invalidate inode attributes */ void fuse_invalidate_attr(struct inode *inode); + +/** + * Acquire reference to fuse_conn + */ +struct fuse_conn *fuse_conn_get(struct fuse_conn *fc); + +/** + * Release reference to fuse_conn + */ +void fuse_conn_put(struct fuse_conn *fc); + +/** + * Add connection to control filesystem + */ +int fuse_ctl_add_conn(struct fuse_conn *fc); + +/** + * Remove connection from control filesystem + */ +void fuse_ctl_remove_conn(struct fuse_conn *fc); diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 0225729977c..13a7e8ab7a7 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -22,13 +22,8 @@ MODULE_DESCRIPTION("Filesystem in Userspace"); MODULE_LICENSE("GPL"); static kmem_cache_t *fuse_inode_cachep; -static struct subsystem connections_subsys; - -struct fuse_conn_attr { - struct attribute attr; - ssize_t (*show)(struct fuse_conn *, char *); - ssize_t (*store)(struct fuse_conn *, const char *, size_t); -}; +struct list_head fuse_conn_list; +DEFINE_MUTEX(fuse_mutex); #define FUSE_SUPER_MAGIC 0x65735546 @@ -211,8 +206,11 @@ static void fuse_put_super(struct super_block *sb) kill_fasync(&fc->fasync, SIGIO, POLL_IN); wake_up_all(&fc->waitq); wake_up_all(&fc->blocked_waitq); - kobject_del(&fc->kobj); - kobject_put(&fc->kobj); + mutex_lock(&fuse_mutex); + list_del(&fc->entry); + fuse_ctl_remove_conn(fc); + mutex_unlock(&fuse_mutex); + fuse_conn_put(fc); } static void convert_fuse_statfs(struct kstatfs *stbuf, struct fuse_kstatfs *attr) @@ -362,11 +360,6 @@ static int fuse_show_options(struct seq_file *m, struct vfsmount *mnt) return 0; } -static void fuse_conn_release(struct kobject *kobj) -{ - kfree(get_fuse_conn_kobj(kobj)); -} - static struct fuse_conn *new_conn(void) { struct fuse_conn *fc; @@ -374,13 +367,12 @@ static struct fuse_conn *new_conn(void) fc = kzalloc(sizeof(*fc), GFP_KERNEL); if (fc) { spin_lock_init(&fc->lock); + atomic_set(&fc->count, 1); init_waitqueue_head(&fc->waitq); init_waitqueue_head(&fc->blocked_waitq); INIT_LIST_HEAD(&fc->pending); INIT_LIST_HEAD(&fc->processing); INIT_LIST_HEAD(&fc->io); - kobj_set_kset_s(fc, connections_subsys); - kobject_init(&fc->kobj); atomic_set(&fc->num_waiting, 0); fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; fc->bdi.unplug_io_fn = default_unplug_io_fn; @@ -390,6 +382,18 @@ static struct fuse_conn *new_conn(void) return fc; } +void fuse_conn_put(struct fuse_conn *fc) +{ + if (atomic_dec_and_test(&fc->count)) + kfree(fc); +} + +struct fuse_conn *fuse_conn_get(struct fuse_conn *fc) +{ + atomic_inc(&fc->count); + return fc; +} + static struct inode *get_root_inode(struct super_block *sb, unsigned mode) { struct fuse_attr attr; @@ -459,10 +463,9 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) request_send_background(fc, req); } -static unsigned long long conn_id(void) +static u64 conn_id(void) { - /* BKL is held for ->get_sb() */ - static unsigned long long ctr = 1; + static u64 ctr = 1; return ctr++; } @@ -519,24 +522,21 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) if (!init_req) goto err_put_root; - err = kobject_set_name(&fc->kobj, "%llu", conn_id()); - if (err) - goto err_free_req; - - err = kobject_add(&fc->kobj); - if (err) - goto err_free_req; - - /* Setting file->private_data can't race with other mount() - instances, since BKL is held for ->get_sb() */ + mutex_lock(&fuse_mutex); err = -EINVAL; if (file->private_data) - goto err_kobject_del; + goto err_unlock; + fc->id = conn_id(); + err = fuse_ctl_add_conn(fc); + if (err) + goto err_unlock; + + list_add_tail(&fc->entry, &fuse_conn_list); sb->s_root = root_dentry; fc->connected = 1; - kobject_get(&fc->kobj); - file->private_data = fc; + file->private_data = fuse_conn_get(fc); + mutex_unlock(&fuse_mutex); /* * atomic_dec_and_test() in fput() provides the necessary * memory barrier for file->private_data to be visible on all @@ -548,15 +548,14 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) return 0; - err_kobject_del: - kobject_del(&fc->kobj); - err_free_req: + err_unlock: + mutex_unlock(&fuse_mutex); fuse_request_free(init_req); err_put_root: dput(root_dentry); err: fput(file); - kobject_put(&fc->kobj); + fuse_conn_put(fc); return err; } @@ -574,68 +573,8 @@ static struct file_system_type fuse_fs_type = { .kill_sb = kill_anon_super, }; -static ssize_t fuse_conn_waiting_show(struct fuse_conn *fc, char *page) -{ - return sprintf(page, "%i\n", atomic_read(&fc->num_waiting)); -} - -static ssize_t fuse_conn_abort_store(struct fuse_conn *fc, const char *page, - size_t count) -{ - fuse_abort_conn(fc); - return count; -} - -static struct fuse_conn_attr fuse_conn_waiting = - __ATTR(waiting, 0400, fuse_conn_waiting_show, NULL); -static struct fuse_conn_attr fuse_conn_abort = - __ATTR(abort, 0600, NULL, fuse_conn_abort_store); - -static struct attribute *fuse_conn_attrs[] = { - &fuse_conn_waiting.attr, - &fuse_conn_abort.attr, - NULL, -}; - -static ssize_t fuse_conn_attr_show(struct kobject *kobj, - struct attribute *attr, - char *page) -{ - struct fuse_conn_attr *fca = - container_of(attr, struct fuse_conn_attr, attr); - - if (fca->show) - return fca->show(get_fuse_conn_kobj(kobj), page); - else - return -EACCES; -} - -static ssize_t fuse_conn_attr_store(struct kobject *kobj, - struct attribute *attr, - const char *page, size_t count) -{ - struct fuse_conn_attr *fca = - container_of(attr, struct fuse_conn_attr, attr); - - if (fca->store) - return fca->store(get_fuse_conn_kobj(kobj), page, count); - else - return -EACCES; -} - -static struct sysfs_ops fuse_conn_sysfs_ops = { - .show = &fuse_conn_attr_show, - .store = &fuse_conn_attr_store, -}; - -static struct kobj_type ktype_fuse_conn = { - .release = fuse_conn_release, - .sysfs_ops = &fuse_conn_sysfs_ops, - .default_attrs = fuse_conn_attrs, -}; - static decl_subsys(fuse, NULL, NULL); -static decl_subsys(connections, &ktype_fuse_conn, NULL); +static decl_subsys(connections, NULL, NULL); static void fuse_inode_init_once(void *foo, kmem_cache_t *cachep, unsigned long flags) @@ -709,6 +648,7 @@ static int __init fuse_init(void) printk("fuse init (API version %i.%i)\n", FUSE_KERNEL_VERSION, FUSE_KERNEL_MINOR_VERSION); + INIT_LIST_HEAD(&fuse_conn_list); res = fuse_fs_init(); if (res) goto err; @@ -721,8 +661,14 @@ static int __init fuse_init(void) if (res) goto err_dev_cleanup; + res = fuse_ctl_init(); + if (res) + goto err_sysfs_cleanup; + return 0; + err_sysfs_cleanup: + fuse_sysfs_cleanup(); err_dev_cleanup: fuse_dev_cleanup(); err_fs_cleanup: @@ -735,6 +681,7 @@ static void __exit fuse_exit(void) { printk(KERN_DEBUG "fuse exit\n"); + fuse_ctl_cleanup(); fuse_sysfs_cleanup(); fuse_fs_cleanup(); fuse_dev_cleanup(); -- cgit v1.2.3-70-g09d2 From a4d27e75ffb7b8ecb7eed0c7db0df975525f3fd7 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Sun, 25 Jun 2006 05:48:54 -0700 Subject: [PATCH] fuse: add request interruption Add synchronous request interruption. This is needed for file locking operations which have to be interruptible. However filesystem may implement interruptibility of other operations (e.g. like NFS 'intr' mount option). Signed-off-by: Miklos Szeredi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/fuse.txt | 48 ++++++++++- fs/fuse/dev.c | 162 ++++++++++++++++++++++++++++++------- fs/fuse/file.c | 3 + fs/fuse/fuse_i.h | 16 ++++ fs/fuse/inode.c | 1 + include/linux/fuse.h | 7 +- 6 files changed, 205 insertions(+), 32 deletions(-) (limited to 'Documentation/filesystems/fuse.txt') diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt index 324df27704c..a584f05403a 100644 --- a/Documentation/filesystems/fuse.txt +++ b/Documentation/filesystems/fuse.txt @@ -124,6 +124,46 @@ For each connection the following files exist within this directory: Only the owner of the mount may read or write these files. +Interrupting filesystem operations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If a process issuing a FUSE filesystem request is interrupted, the +following will happen: + + 1) If the request is not yet sent to userspace AND the signal is + fatal (SIGKILL or unhandled fatal signal), then the request is + dequeued and returns immediately. + + 2) If the request is not yet sent to userspace AND the signal is not + fatal, then an 'interrupted' flag is set for the request. When + the request has been successfully transfered to userspace and + this flag is set, an INTERRUPT request is queued. + + 3) If the request is already sent to userspace, then an INTERRUPT + request is queued. + +INTERRUPT requests take precedence over other requests, so the +userspace filesystem will receive queued INTERRUPTs before any others. + +The userspace filesystem may ignore the INTERRUPT requests entirely, +or may honor them by sending a reply to the _original_ request, with +the error set to EINTR. + +It is also possible that there's a race between processing the +original request and it's INTERRUPT request. There are two possibilities: + + 1) The INTERRUPT request is processed before the original request is + processed + + 2) The INTERRUPT request is processed after the original request has + been answered + +If the filesystem cannot find the original request, it should wait for +some timeout and/or a number of new requests to arrive, after which it +should reply to the INTERRUPT request with an EAGAIN error. In case +1) the INTERRUPT request will be requeued. In case 2) the INTERRUPT +reply will be ignored. + Aborting a filesystem connection ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -351,10 +391,10 @@ but is caused by a pagefault. Solution is basically the same as above. -An additional problem is that while the write buffer is being -copied to the request, the request must not be interrupted. This -is because the destination address of the copy may not be valid -after the request is interrupted. +An additional problem is that while the write buffer is being copied +to the request, the request must not be interrupted/aborted. This is +because the destination address of the copy may not be valid after the +request has returned. This is solved with doing the copy atomically, and allowing abort while the page(s) belonging to the write buffer are faulted with diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 6b5f74cb7b5..1e2006caf15 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -34,6 +34,7 @@ static void fuse_request_init(struct fuse_req *req) { memset(req, 0, sizeof(*req)); INIT_LIST_HEAD(&req->list); + INIT_LIST_HEAD(&req->intr_entry); init_waitqueue_head(&req->waitq); atomic_set(&req->count, 1); } @@ -215,6 +216,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req) void (*end) (struct fuse_conn *, struct fuse_req *) = req->end; req->end = NULL; list_del(&req->list); + list_del(&req->intr_entry); req->state = FUSE_REQ_FINISHED; if (req->background) { if (fc->num_background == FUSE_MAX_BACKGROUND) { @@ -235,28 +237,63 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req) fuse_put_request(fc, req); } +static void wait_answer_interruptible(struct fuse_conn *fc, + struct fuse_req *req) +{ + if (signal_pending(current)) + return; + + spin_unlock(&fc->lock); + wait_event_interruptible(req->waitq, req->state == FUSE_REQ_FINISHED); + spin_lock(&fc->lock); +} + +static void queue_interrupt(struct fuse_conn *fc, struct fuse_req *req) +{ + list_add_tail(&req->intr_entry, &fc->interrupts); + wake_up(&fc->waitq); + kill_fasync(&fc->fasync, SIGIO, POLL_IN); +} + /* Called with fc->lock held. Releases, and then reacquires it. */ static void request_wait_answer(struct fuse_conn *fc, struct fuse_req *req) { - sigset_t oldset; + if (!fc->no_interrupt) { + /* Any signal may interrupt this */ + wait_answer_interruptible(fc, req); - spin_unlock(&fc->lock); - if (req->force) + if (req->aborted) + goto aborted; + if (req->state == FUSE_REQ_FINISHED) + return; + + req->interrupted = 1; + if (req->state == FUSE_REQ_SENT) + queue_interrupt(fc, req); + } + + if (req->force) { + spin_unlock(&fc->lock); wait_event(req->waitq, req->state == FUSE_REQ_FINISHED); - else { + spin_lock(&fc->lock); + } else { + sigset_t oldset; + + /* Only fatal signals may interrupt this */ block_sigs(&oldset); - wait_event_interruptible(req->waitq, - req->state == FUSE_REQ_FINISHED); + wait_answer_interruptible(fc, req); restore_sigs(&oldset); } - spin_lock(&fc->lock); - if (req->state == FUSE_REQ_FINISHED && !req->aborted) - return; - if (!req->aborted) { - req->out.h.error = -EINTR; - req->aborted = 1; - } + if (req->aborted) + goto aborted; + if (req->state == FUSE_REQ_FINISHED) + return; + + req->out.h.error = -EINTR; + req->aborted = 1; + + aborted: if (req->locked) { /* This is uninterruptible sleep, because data is being copied to/from the buffers of req. During @@ -288,13 +325,19 @@ static unsigned len_args(unsigned numargs, struct fuse_arg *args) return nbytes; } +static u64 fuse_get_unique(struct fuse_conn *fc) + { + fc->reqctr++; + /* zero is special */ + if (fc->reqctr == 0) + fc->reqctr = 1; + + return fc->reqctr; +} + static void queue_request(struct fuse_conn *fc, struct fuse_req *req) { - fc->reqctr++; - /* zero is special */ - if (fc->reqctr == 0) - fc->reqctr = 1; - req->in.h.unique = fc->reqctr; + req->in.h.unique = fuse_get_unique(fc); req->in.h.len = sizeof(struct fuse_in_header) + len_args(req->in.numargs, (struct fuse_arg *) req->in.args); list_add_tail(&req->list, &fc->pending); @@ -307,9 +350,6 @@ static void queue_request(struct fuse_conn *fc, struct fuse_req *req) kill_fasync(&fc->fasync, SIGIO, POLL_IN); } -/* - * This can only be interrupted by a SIGKILL - */ void request_send(struct fuse_conn *fc, struct fuse_req *req) { req->isreply = 1; @@ -566,13 +606,18 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs, return err; } +static int request_pending(struct fuse_conn *fc) +{ + return !list_empty(&fc->pending) || !list_empty(&fc->interrupts); +} + /* Wait until a request is available on the pending list */ static void request_wait(struct fuse_conn *fc) { DECLARE_WAITQUEUE(wait, current); add_wait_queue_exclusive(&fc->waitq, &wait); - while (fc->connected && list_empty(&fc->pending)) { + while (fc->connected && !request_pending(fc)) { set_current_state(TASK_INTERRUPTIBLE); if (signal_pending(current)) break; @@ -585,6 +630,45 @@ static void request_wait(struct fuse_conn *fc) remove_wait_queue(&fc->waitq, &wait); } +/* + * Transfer an interrupt request to userspace + * + * Unlike other requests this is assembled on demand, without a need + * to allocate a separate fuse_req structure. + * + * Called with fc->lock held, releases it + */ +static int fuse_read_interrupt(struct fuse_conn *fc, struct fuse_req *req, + const struct iovec *iov, unsigned long nr_segs) +{ + struct fuse_copy_state cs; + struct fuse_in_header ih; + struct fuse_interrupt_in arg; + unsigned reqsize = sizeof(ih) + sizeof(arg); + int err; + + list_del_init(&req->intr_entry); + req->intr_unique = fuse_get_unique(fc); + memset(&ih, 0, sizeof(ih)); + memset(&arg, 0, sizeof(arg)); + ih.len = reqsize; + ih.opcode = FUSE_INTERRUPT; + ih.unique = req->intr_unique; + arg.unique = req->in.h.unique; + + spin_unlock(&fc->lock); + if (iov_length(iov, nr_segs) < reqsize) + return -EINVAL; + + fuse_copy_init(&cs, fc, 1, NULL, iov, nr_segs); + err = fuse_copy_one(&cs, &ih, sizeof(ih)); + if (!err) + err = fuse_copy_one(&cs, &arg, sizeof(arg)); + fuse_copy_finish(&cs); + + return err ? err : reqsize; +} + /* * Read a single request into the userspace filesystem's buffer. This * function waits until a request is available, then removes it from @@ -610,7 +694,7 @@ static ssize_t fuse_dev_readv(struct file *file, const struct iovec *iov, spin_lock(&fc->lock); err = -EAGAIN; if ((file->f_flags & O_NONBLOCK) && fc->connected && - list_empty(&fc->pending)) + !request_pending(fc)) goto err_unlock; request_wait(fc); @@ -618,9 +702,15 @@ static ssize_t fuse_dev_readv(struct file *file, const struct iovec *iov, if (!fc->connected) goto err_unlock; err = -ERESTARTSYS; - if (list_empty(&fc->pending)) + if (!request_pending(fc)) goto err_unlock; + if (!list_empty(&fc->interrupts)) { + req = list_entry(fc->interrupts.next, struct fuse_req, + intr_entry); + return fuse_read_interrupt(fc, req, iov, nr_segs); + } + req = list_entry(fc->pending.next, struct fuse_req, list); req->state = FUSE_REQ_READING; list_move(&req->list, &fc->io); @@ -658,6 +748,8 @@ static ssize_t fuse_dev_readv(struct file *file, const struct iovec *iov, else { req->state = FUSE_REQ_SENT; list_move_tail(&req->list, &fc->processing); + if (req->interrupted) + queue_interrupt(fc, req); spin_unlock(&fc->lock); } return reqsize; @@ -684,7 +776,7 @@ static struct fuse_req *request_find(struct fuse_conn *fc, u64 unique) list_for_each(entry, &fc->processing) { struct fuse_req *req; req = list_entry(entry, struct fuse_req, list); - if (req->in.h.unique == unique) + if (req->in.h.unique == unique || req->intr_unique == unique) return req; } return NULL; @@ -750,7 +842,6 @@ static ssize_t fuse_dev_writev(struct file *file, const struct iovec *iov, goto err_unlock; req = request_find(fc, oh.unique); - err = -EINVAL; if (!req) goto err_unlock; @@ -761,6 +852,23 @@ static ssize_t fuse_dev_writev(struct file *file, const struct iovec *iov, request_end(fc, req); return -ENOENT; } + /* Is it an interrupt reply? */ + if (req->intr_unique == oh.unique) { + err = -EINVAL; + if (nbytes != sizeof(struct fuse_out_header)) + goto err_unlock; + + if (oh.error == -ENOSYS) + fc->no_interrupt = 1; + else if (oh.error == -EAGAIN) + queue_interrupt(fc, req); + + spin_unlock(&fc->lock); + fuse_copy_finish(&cs); + return nbytes; + } + + req->state = FUSE_REQ_WRITING; list_move(&req->list, &fc->io); req->out.h = oh; req->locked = 1; @@ -809,7 +917,7 @@ static unsigned fuse_dev_poll(struct file *file, poll_table *wait) spin_lock(&fc->lock); if (!fc->connected) mask = POLLERR; - else if (!list_empty(&fc->pending)) + else if (request_pending(fc)) mask |= POLLIN | POLLRDNORM; spin_unlock(&fc->lock); diff --git a/fs/fuse/file.c b/fs/fuse/file.c index ce759414cff..36f92f181d2 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -705,6 +705,9 @@ static int fuse_setlk(struct file *file, struct file_lock *fl) fuse_lk_fill(req, file, fl, opcode, pid); request_send(fc, req); err = req->out.h.error; + /* locking is restartable */ + if (err == -EINTR) + err = -ERESTARTSYS; fuse_put_request(fc, req); return err; } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index fd65e75e162..c862df58da9 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -131,6 +131,7 @@ enum fuse_req_state { FUSE_REQ_PENDING, FUSE_REQ_READING, FUSE_REQ_SENT, + FUSE_REQ_WRITING, FUSE_REQ_FINISHED }; @@ -144,9 +145,15 @@ struct fuse_req { fuse_conn */ struct list_head list; + /** Entry on the interrupts list */ + struct list_head intr_entry; + /** refcount */ atomic_t count; + /** Unique ID for the interrupt request */ + u64 intr_unique; + /* * The following bitfields are either set once before the * request is queued or setting/clearing them is protected by @@ -165,6 +172,9 @@ struct fuse_req { /** Request is sent in the background */ unsigned background:1; + /** The request has been interrupted */ + unsigned interrupted:1; + /** Data is being copied to/from the request */ unsigned locked:1; @@ -262,6 +272,9 @@ struct fuse_conn { /** Number of requests currently in the background */ unsigned num_background; + /** Pending interrupts */ + struct list_head interrupts; + /** Flag indicating if connection is blocked. This will be the case before the INIT reply is received, and if there are too many outstading backgrounds requests */ @@ -320,6 +333,9 @@ struct fuse_conn { /** Is create not implemented by fs? */ unsigned no_create : 1; + /** Is interrupt not implemented by fs? */ + unsigned no_interrupt : 1; + /** The number of requests waiting for completion */ atomic_t num_waiting; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 41289290583..e21ef8a3ad3 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -381,6 +381,7 @@ static struct fuse_conn *new_conn(void) INIT_LIST_HEAD(&fc->pending); INIT_LIST_HEAD(&fc->processing); INIT_LIST_HEAD(&fc->io); + INIT_LIST_HEAD(&fc->interrupts); atomic_set(&fc->num_waiting, 0); fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; fc->bdi.unplug_io_fn = default_unplug_io_fn; diff --git a/include/linux/fuse.h b/include/linux/fuse.h index e7a76ec0f05..9fc48a674b8 100644 --- a/include/linux/fuse.h +++ b/include/linux/fuse.h @@ -125,7 +125,8 @@ enum fuse_opcode { FUSE_SETLK = 32, FUSE_SETLKW = 33, FUSE_ACCESS = 34, - FUSE_CREATE = 35 + FUSE_CREATE = 35, + FUSE_INTERRUPT = 36, }; /* The read buffer is required to be at least 8k, but may be much larger */ @@ -291,6 +292,10 @@ struct fuse_init_out { __u32 max_write; }; +struct fuse_interrupt_in { + __u64 unique; +}; + struct fuse_in_header { __u32 len; __u32 opcode; -- cgit v1.2.3-70-g09d2