summaryrefslogtreecommitdiffstats
path: root/fs/inotify.c
AgeCommit message (Collapse)Author
2008-11-15Fix inotify watch removal/umount racesAl Viro
Inotify watch removals suck violently. To kick the watch out we need (in this order) inode->inotify_mutex and ih->mutex. That's fine if we have a hold on inode; however, for all other cases we need to make damn sure we don't race with umount. We can *NOT* just grab a reference to a watch - inotify_unmount_inodes() will happily sail past it and we'll end with reference to inode potentially outliving its superblock. Ideally we just want to grab an active reference to superblock if we can; that will make sure we won't go into inotify_umount_inodes() until we are done. Cleanup is just deactivate_super(). However, that leaves a messy case - what if we *are* racing with umount() and active references to superblock can't be acquired anymore? We can bump ->s_count, grab ->s_umount, which will almost certainly wait until the superblock is shut down and the watch in question is pining for fjords. That's fine, but there is a problem - we might have hit the window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e. the moment when superblock is past the point of no return and is heading for shutdown) and the moment when deactivate_super() acquires ->s_umount. We could just do drop_super() yield() and retry, but that's rather antisocial and this stuff is luser-triggerable. OTOH, having grabbed ->s_umount and having found that we'd got there first (i.e. that ->s_root is non-NULL) we know that we won't race with inotify_umount_inodes(). So we could grab a reference to watch and do the rest as above, just with drop_super() instead of deactivate_super(), right? Wrong. We had to drop ih->mutex before we could grab ->s_umount. So the watch could've been gone already. That still can be dealt with - we need to save watch->wd, do idr_find() and compare its result with our pointer. If they match, we either have the damn thing still alive or we'd lost not one but two races at once, the watch had been killed and a new one got created with the same ->wd at the same address. That couldn't have happened in inotify_destroy(), but inotify_rm_wd() could run into that. Still, "new one got created" is not a problem - we have every right to kill it or leave it alone, whatever's more convenient. So we can use idr_find(...) == watch && watch->inode->i_sb == sb as "grab it and kill it" check. If it's been our original watch, we are fine, if it's a newcomer - nevermind, just pretend that we'd won the race and kill the fscker anyway; we are safe since we know that its superblock won't be going away. And yes, this is far beyond mere "not very pretty"; so's the entire concept of inotify to start with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Greg KH <greg@kroah.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06inotify: remove debug codeNick Piggin
The inotify debugging code is supposed to verify that the DCACHE_INOTIFY_PARENT_WATCHED scalability optimisation does not result in notifications getting lost nor extra needless locking generated. Unfortunately there are also some races in the debugging code. And it isn't very good at finding problems anyway. So remove it for now. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Jan Kara <jack@ucw.cz> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06inotify: fix raceNick Piggin
There is a race between setting an inode's children's "parent watched" flag when placing the first watch on a parent, and instantiating new children of that parent: a child could miss having its flags set by set_dentry_child_flags, but then inotify_d_instantiate might still see !inotify_inode_watched. The solution is to set_dentry_child_flags after adding the watch. Locking is taken care of, because both set_dentry_child_flags and inotify_d_instantiate hold dcache_lock and child->d_locks. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Jan Kara <jack@ucw.cz> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-21[PATCH] new helper - inotify_evict_watch()Al Viro
Kicks the watch out without dropping it. Called under ->inotify_mutex Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2007-10-21[PATCH] new helper - inotify_clone_watch()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2007-05-08Introduce a handy list_first_entry macroPavel Emelianov
There are many places in the kernel where the construction like foo = list_entry(head->next, struct foo_struct, list); are used. The code might look more descriptive and neat if using the macro list_first_entry(head, type, member) \ list_entry((head)->next, type, member) Here is the macro itself and the examples of its usage in the generic code. If it will turn out to be useful, I can prepare the set of patches to inject in into arch-specific code, drivers, networking, etc. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-12-04[PATCH] severing fs.h, radix-tree.h -> sched.hAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20[PATCH] inotify (4/5): allow watch removal from event handlerAmy Griffis
Allow callers to remove watches from their event handler via inotify_remove_watch_locked(). This functionality can be used to achieve IN_ONESHOT-like functionality for a subset of events in the mask. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Acked-by: Robert Love <rml@novell.com> Acked-by: John McCutchan <john@johnmccutchan.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20[PATCH] inotify (3/5): add interfaces to kernel APIAmy Griffis
Add inotify_init_watch() so caller can use inotify_watch refcounts before calling inotify_add_watch(). Add inotify_find_watch() to find an existing watch for an (ih,inode) pair. This is similar to inotify_find_update_watch(), but does not update the watch's mask if one is found. Add inotify_rm_watch() to remove a watch via the watch pointer instead of the watch descriptor. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Acked-by: Robert Love <rml@novell.com> Acked-by: John McCutchan <john@johnmccutchan.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20[PATCH] inotify (2/5): add name's inode to event handlerAmy Griffis
When an inotify event includes a dentry name, also include the inode associated with that name. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Acked-by: Robert Love <rml@novell.com> Acked-by: John McCutchan <john@johnmccutchan.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20[PATCH] inotify (1/5): split kernel API from userspace supportAmy Griffis
The following series of patches introduces a kernel API for inotify, making it possible for kernel modules to benefit from inotify's mechanism for watching inodes. With these patches, inotify will maintain for each caller a list of watches (via an embedded struct inotify_watch), where each inotify_watch is associated with a corresponding struct inode. The caller registers an event handler and specifies for which filesystem events their event handler should be called per inotify_watch. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Acked-by: Robert Love <rml@novell.com> Acked-by: John McCutchan <john@johnmccutchan.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-05-21[PATCH] fix NULL dereference in inotify_ignoreAmy Griffis
Don't reassign to watch. If idr_find() returns NULL, then put_inotify_watch() will choke. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-21[PATCH] fix race in inotify_releaseAmy Griffis
While doing some inotify stress testing, I hit the following race. In inotify_release(), it's possible for a watch to be removed from the lists in between dropping dev->mutex and taking inode->inotify_mutex. The reference we hold prevents the watch from being freed, but not from being removed. Checking the dev's idr mapping will prevent a double list_del of the same watch. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Acked-by: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rml@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11[PATCH] inotify: check for NULL inode in inotify_d_instantiateArnd Bergmann
The spufs file system creates files in a directory before instantiating the directory itself, which causes a NULL pointer access in inotify_d_instantiate since c32ccd87bfd1414b0aabfcd8dbc7539ad23bcbaa. I'd like to keep this behavior since it means that the user will not have access to files in the directory before I know that I succeed in creating everything in it. This patch adds a simple check for the inode to keep that working. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28[PATCH] Make most file operations structs in fs/ constArjan van de Ven
This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] Use __read_mostly on some hot fs variablesEric Dumazet
I discovered on oprofile hunting on a SMP platform that dentry lookups were slowed down because d_hash_mask, d_hash_shift and dentry_hashtable were in a cache line that contained inodes_stat. So each time inodes_stats is changed by a cpu, other cpus have to refill their cache line. This patch moves some variables to the __read_mostly section, in order to avoid false sharing. RCU dentry lookups can go full speed. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] inotify: lock avoidance with parent watch status in dentryNick Piggin
Previous inotify work avoidance is good when inotify is completely unused, but it breaks down if even a single watch is in place anywhere in the system. Robin Holt notices that udev is one such culprit - it slows down a 512-thread application on a 512 CPU system from 6 seconds to 22 minutes. Solve this by adding a flag in the dentry that tells inotify whether or not its parent inode has a watch on it. Event queueing to parent will skip taking locks if this flag is cleared. Setting and clearing of this flag on all child dentries versus event delivery: this is no in terms of race cases, and that was shown to be equivalent to always performing the check. The essential behaviour is that activity occuring _after_ a watch has been added and _before_ it has been removed, will generate events. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23[PATCH] sem2mutex: ipruneIngo Molnar
Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23[PATCH] sem2mutex: inotifyIngo Molnar
Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Acked-by: Robert Love <rml@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-07[PATCH] inotify: fix one-shot supportRobert Love
Fix one-shot support in inotify. We currently drop the IN_ONESHOT flag during watch addition. Fix is to not do that. Signed-off-by: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18[PATCH] add missing syscall declarationsArnd Bergmann
All standard system calls should be declared in include/linux/syscalls.h. Add some of the new additions that were previously missed. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] inotify: add two inotify_add_watch flagsJohn McCutchan
The below patch lets userspace have more control over the inodes that inotify will watch. It introduces two new flags. IN_ONLYDIR -- only watch the inode if it is a directory. This is needed to avoid the race that can occur when we want to be sure that we are watching a directory. IN_DONT_FOLLOW -- don't follow a symlink. In combination with IN_ONLYDIR we can make sure that we don't watch the target of symlinks. The issues the flags fix came up when writing the gnome-vfs inotify backend. Default behaviour is unchanged. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Acked-by: Robert Love <rml@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09[PATCH] add a vfs_permission helperChristoph Hellwig
Most permission() calls have a struct nameidata * available. This helper takes that as an argument and thus makes sure we pass it down for lookup intents and prepares for per-mount read-only support where we need a struct vfsmount for checking whether a file is writeable. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-23[PATCH] inotify/idr leak fixAndrew Morton
Fix a bug which was reported and diagnosed by Stefan Jones <stefan.jones@churchillrandoms.co.uk> IDR trees include a cache of idr_layer objects. There's no way to destroy this cache, so when we discard an overall idr tree we end up leaking some memory. Add and use idr_destroy() for this. v9fs and infiniband also need to use idr_destroy() to avoid leaks. Or, we make the cache global, like radix_tree_preload(). Which is probably better. Later. Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07[PATCH] inotify: fix event loss on hardlinked filesJohn McCutchan
People have run into a problem when they do this: watch (file1, all_events); watch (file2, some_events); if file2 is a hard link to file1, some events will be missed because by default we replace the mask. The patch below adds a flag IN_MASK_ADD which will cause inotify to add to the existing mask if present. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07[PATCH] inotify speedupJohn McCutchan
Bypass an inotify-related fastpath spinlock and several function calls on systems which have no inotify watches registered. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-26[PATCH] Document idr_get_new_above() semantics, update inotifyJohn McCutchan
There is an off by one problem with idr_get_new_above. The comment and function name suggest that it will return an id > starting_id, but it actually returned an id >= starting_id, and kernel callers other than inotify treated it as such. The patch below fixes the comment, and fixes inotifys usage. The function name still doesn't match the behaviour, but it never did. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-15[PATCH] inotify: fix idr_get_new_above usageRobert Love
We are saving the wrong thing in ->last_wd. We want the wd, not the return value. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-01[PATCH] inotify: fix race between the kernel and user spaceJohn McCutchan
When you rm a watch, an IN_IGNORED event is sent down the event queue with the watch descriptor that you just rm'd. If you then add a watch you could get the ignored watch's wd and if you haven't read the entire event queue, user space will think that it's newly created watch was just ignored. To avoid this problem we just use idr_get_new_above instead of idr_get_new. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26[PATCH] inotify: fix oops fixAndrew Morton
Cc: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26[PATCH] inotify: check retval in initRobert Love
Check for (unlikely) errors in the filesystem initialization stuff in our module_init() function. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26[PATCH] inotify: change default limitsRobert Love
Change default inotify limits: Maximum instances per user to 128 and maximum events per queue to 16k. The max instances used to be 128; the change to 8 was a mistake. Memory consumption is fine. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26[PATCH] inotify: exit path cleanupsRobert Love
Handle error out paths better. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26[PATCH] inotify: oops fixRobert Love
Bug fix: Ensure that the fd passed to inotify_add_watch() and inotify_rm_watch() belongs to inotify. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26[PATCH] inotify: use fget_lightRobert Love
As an optimization, use fget_light() and fput_light() where possible. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26[PATCH] inotify: misc. cleanupRobert Love
Miscellaneous invariant clean up, comment fixes, and so on. Trivial stuff. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13[PATCH] inotify: misc cleanupRobert Love
Really simple, basic cleanup. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13[PATCH] inotify: move sysctlRobert Love
This moves the inotify sysctl knobs to "/proc/sys/fs/inotify" from "/proc/sys/fs". Also some related cleanup. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-12[PATCH] inotifyRobert Love
inotify is intended to correct the deficiencies of dnotify, particularly its inability to scale and its terrible user interface: * dnotify requires the opening of one fd per each directory that you intend to watch. This quickly results in too many open files and pins removable media, preventing unmount. * dnotify is directory-based. You only learn about changes to directories. Sure, a change to a file in a directory affects the directory, but you are then forced to keep a cache of stat structures. * dnotify's interface to user-space is awful. Signals? inotify provides a more usable, simple, powerful solution to file change notification: * inotify's interface is a system call that returns a fd, not SIGIO. You get a single fd, which is select()-able. * inotify has an event that says "the filesystem that the item you were watching is on was unmounted." * inotify can watch directories or files. Inotify is currently used by Beagle (a desktop search infrastructure), Gamin (a FAM replacement), and other projects. See Documentation/filesystems/inotify.txt. Signed-off-by: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>