summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2010-10-28fanotify: ignore fanotify ignore marks if open writersEric Paris
fanotify will clear ignore marks if a task changes the contents of an inode. The problem is with the races around when userspace finishes checking a file and when that result is actually attached to the inode. This race was described as such: Consider the following scenario with hostile processes A and B, and victim process C: 1. Process A opens new file for writing. File check request is generated. 2. File check is performed in userspace. Check result is "file has no malware". 3. The "permit" response is delivered to kernel space. 4. File ignored mark set. 5. Process A writes dummy bytes to the file. File ignored flags are cleared. 6. Process B opens the same file for reading. File check request is generated. 7. File check is performed in userspace. Check result is "file has no malware". 8. Process A writes malware bytes to the file. There is no cached response yet. 9. The "permit" response is delivered to kernel space and is cached in fanotify. 10. File ignored mark set. 11. Now any process C will be permitted to open the malware file. There is a race between steps 8 and 10 While fanotify makes no strong guarantees about systems with hostile processes there is no reason we cannot harden against this race. We do that by simply ignoring any ignore marks if the inode has open writers (aka i_writecount > 0). (We actually do not ignore ignore marks if the FAN_MARK_SURV_MODIFY flag is set) Reported-by: Vasily Novikov <vasily.novikov@kaspersky.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fsnotify: call fsnotify_parent in perm eventsEric Paris
fsnotify perm events do not call fsnotify parent. That means you cannot register a perm event on a directory and enforce permissions on all inodes in that directory. This patch fixes that situation. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fsnotify: correctly handle return codes from listenersEric Paris
When fsnotify groups return errors they are ignored. For permissions events these should be passed back up the stack, but for most events these should continue to be ignored. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fanotify: implement fanotify listener orderingEric Paris
The fanotify listeners needs to be able to specify what types of operations they are going to perform so they can be ordered appropriately between other listeners doing other types of operations. They need this to be able to make sure that things like hierarchichal storage managers will get access to inodes before processes which need the data. This patch defines 3 possible uses which groups must indicate in the fanotify_init() flags. FAN_CLASS_PRE_CONTENT FAN_CLASS_CONTENT FAN_CLASS_NOTIF Groups will receive notification in that order. The order between 2 groups in the same class is undeterministic. FAN_CLASS_PRE_CONTENT is intended to be used by listeners which need access to the inode before they are certain that the inode contains it's final data. A hierarchical storage manager should choose to use this class. FAN_CLASS_CONTENT is intended to be used by listeners which need access to the inode after it contains its intended contents. This would be the appropriate level for an AV solution or document control system. FAN_CLASS_NOTIF is intended for normal async notification about access, much the same as inotify and dnotify. Syncronous permissions events are not permitted at this class. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fsnotify: implement ordering between notifiersEric Paris
fanotify needs to be able to specify that some groups get events before others. They use this idea to make sure that a hierarchical storage manager gets access to files before programs which actually use them. This is purely infrastructure. Everything will have a priority of 0, but the infrastructure will exist for it to be non-zero. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fanotify: allow fanotify to be builtEric Paris
We disabled the ability to build fanotify in commit 7c5347733dcc4ba0ba. This reverts that commit and allows people to build fanotify. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (841 commits) Staging: brcm80211: fix usage of roundup in structures Staging: bcm: fix up network device reference counting Staging: keucr: fix up US_ macro change staging: brcm80211: brcmfmac: Removed codeversion from firmware filenames. staging: brcm80211: Remove unnecessary header files. staging: brcm80211: Remove unnecessary includes from bcmutils.c staging: brcm80211: Removed unnecessary pktsetprio() function. Staging: brcm80211: remove typedefs.h Staging: brcm80211: remove uintptr typedef usage Staging: hv: remove struct vmbus_channel_interface Staging: hv: remove Open from struct vmbus_channel_interface Staging: hv: storvsc: call vmbus_open directly Staging: hv: netvsc: call vmbus_open directly Staging: hv: channel: export vmbus_open to modules Staging: hv: remove Close from struct vmbus_channel_interface Staging: hv: netvsc: call vmbus_close directly Staging: hv: storvsc: call vmbus_close directly Staging: hv: channel: export vmbus_close to modules Staging: hv: remove SendPacket from struct vmbus_channel_interface Staging: hv: storvsc: call vmbus_sendpacket directly ... Fix up conflicts in drivers/staging/cx25821/cx25821-audio-upstream.c drivers/staging/cx25821/cx25821-audio.h due to warring whitespace cleanups (neither of which were all that great)
2010-10-28Merge 'staging-next' to Linus's treeGreg Kroah-Hartman
This merges the staging-next tree to Linus's tree and resolves some conflicts that were present due to changes in other trees that were affected by files here. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus: hfsplus: free space correcly for files unlinked while open hfsplus: fix double lock typo in ioctl
2010-10-28ext4: fix compile with CONFIG_EXT4_FS_XATTR disabledIngo Molnar
Commit 5dabfc78dced ("ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()") causes fs/ext4/super.c:4776: error: implicit declaration of function ‘ext4_init_xattr’ when CONFIG_EXT4_FS_XATTR is disabled. It renamed init_ext4_xattr to ext4_init_xattr but forgot to update the dummy definition in fs/ext4/xattr.h. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-289p: Add datasync to client side TFSYNC/RFSYNC for dotlVenkateswararao Jujjuri (JV)
SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: Use generic_file_open with lookup_instantiate_filpAneesh Kumar K.V
We need to do O_LARGEFILE check even in case of 9p. Use the generic_file_open helper Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: Add missing iput in v9fs_vfs_lookupAneesh Kumar K.V
Make sure we drop inode reference in the error path Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: Use mknod 9p operation on create without open requestAneesh Kumar K.V
A create without LOOKUP_OPEN flag set is due to mknod of regular files. Use mknod 9P operation for the same Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-289p: Implement TREADLINK operation for 9p2000.LM. Mohan Kumar
Synopsis size[4] TReadlink tag[2] fid[4] size[4] RReadlink tag[2] target[s] Description Readlink is used to return the contents of the symoblic link referred by fid. Contents of symboic link is returned as a response. target[s] - Contents of the symbolic link referred by fid. Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-289p: Use V9FS_MAGIC in statfsM. Mohan Kumar
Use V9FS_MAGIC as the file system type while filling kernel statfs strucutre instead of using host file system magic number. Also move the definition of V9FS_MAGIC from v9fs.h to standard magic.h file. Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-289p: Implement TGETLOCKM. Mohan Kumar
Synopsis size[4] TGetlock tag[2] fid[4] getlock[n] size[4] RGetlock tag[2] getlock[n] Description TGetlock is used to test for the existence of byte range posix locks on a file identified by given fid. The reply contains getlock structure. If the lock could be placed it returns F_UNLCK in type field of getlock structure. Otherwise it returns the details of the conflicting locks in the getlock structure getlock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK start[8] - Starting offset for lock length[8] - Number of bytes to check for the lock If length is 0, check for lock in all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock/owns the task in case of reply client[4] - Client id of the system that owns the process which has the conflicting lock Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-289p: Implement TLOCKM. Mohan Kumar
Synopsis size[4] TLock tag[2] fid[4] flock[n] size[4] RLock tag[2] status[1] Description Tlock is used to acquire/release byte range posix locks on a file identified by given fid. The reply contains status of the lock request flock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK flags[4] - Flags could be either of P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a conflicting lock exists, wait for that lock to be released. P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is trying to reclaim a lock after a server restrart (due to crash) start[8] - Starting offset for lock length[8] - Number of bytes to lock If length is 0, lock all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock client_id[4] - Unique client id status[1] - Status of the lock request, can be P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or P9_LOCK_GRACE(3) P9_LOCK_SUCCESS - Request was successful P9_LOCK_BLOCKED - A conflicting lock is held by another process P9_LOCK_ERROR - Error while processing the lock request P9_LOCK_GRACE - Server is in grace period, it can't accept new lock requests in this period (except locks with P9_LOCK_FLAGS_RECLAIM flag set) Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28[9p] Introduce client side TFSYNC/RFSYNC for dotl.Venkateswararao Jujjuri (JV)
SYNOPSIS size[4] Tfsync tag[2] fid[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28[fs/9p] Add file_operations for cached mode in dotl protocol.Venkateswararao Jujjuri (JV)
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: Add access = client option to opt in acl evaluation.Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: Implement create time inheritanceAneesh Kumar K.V
Inherit default ACL on create Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: Update ACL on chmodAneesh Kumar K.V
We need update the acl value on chmod Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: Implement setting posix aclAneesh Kumar K.V
This patch also update mode bits, as a normal file system. I am not sure wether we should do that, considering that a setxattr on the server will again update the ACL/mode value Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: Add xattr callbacks for POSIX ACLAneesh Kumar K.V
This patch implement fetching POSIX ACL from the server Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: Implement POSIX ACL permission checking functionAneesh Kumar K.V
The ACL value is fetched as a part of inode initialization from the server and the permission checking function use the cached value of the ACL Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: Remove the redundant rsize calculation in v9fs_file_write()jvrao
the same calculation is done in p9_client_write Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-289p: Add a Direct IO support for non-cached operations.jvrao
The presence of v9fs_direct_IO() in the address space ops vector allowes open() O_DIRECT flags which would have failed otherwise. In the non-cached mode, we shunt off direct read and write requests before the VFS gets them, so this method should never be called. Direct IO is not 'yet' supported in the cached mode. Hence when this routine is called through generic_file_aio_read(), the read/write fails with an error. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: mkdir fix for setting S_ISGID bit as per parent directoryHarsh Prateek Bora
The current implementation of 9p client mkdir function does not set the S_ISGID mode bit for the directory being created if the parent directory has this bit set. This patch fixes this problem so that the newly created directory inherits the gid from parent directory and not from the process creating this directory, when the S_ISGID bit is set in parent directory. Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-289p: Pass the correct end of buffer to p9dirent_readSripathi Kodi
A patch was accepted recently for sending correct buffer size to p9stat_read. We need a similar patch in v9fs_dir_readdir_dotl to send correct end of buffer to p9dirent_read. Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28fs/9p: setrlimit fix for 9p writeHarsh Prateek Bora
Current 9p client file write code does not check for RLIMIT_FSIZE resource. This bug was found by running LTP test case for setrlimit. This bug is fixed by calling generic_write_checks before sending the write request to the server. Without this patch: the write function is allowed to write above the RLIMIT_FSIZE set by user. With this patch: the write function checks for RLIMIT_SIZE and writes upto the size limit. Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-289p: remove unneeded checksDan Carpenter
git_t is unsigned an can never be less than zero. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-27Merge branch 'upstream-merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'upstream-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits) ext4,jbd2: convert tracepoints to use major/minor numbers ext4: optimize orphan_list handling for ext4_setattr ext4: fix unbalanced mutex unlock in error path of ext4_li_request_new ext4: fix compile error in ext4_fallocate() ext4: move ext4_mb_{get,put}_buddy_cache_lock and make them static ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end() ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static ext4: rename {ext,idx}_pblock and inline small extent functions ext4: make various ext4 functions be static ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*() ext4: fix kernel oops if the journal superblock has a non-zero j_errno ext4: update writeback_index based on last page scanned ext4: implement writeback livelock avoidance using page tagging ext4: tidy up a void argument in inode.c ext4: add batched_discard into ext4 feature list ext4: Add batched discard support for ext4 fs: Add FITRIM ioctl ext4: Use return value from sb_issue_discard() ext4: Check return value of sb_getblk() and friends ext4: use bio layer instead of buffer layer in mpage_da_submit_io ...
2010-10-27Merge branch 'next' into upstream-mergeTheodore Ts'o
Conflicts: fs/ext4/inode.c fs/ext4/mballoc.c include/trace/events/ext4.h
2010-10-27Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (24 commits) quota: Fix possible oops in __dquot_initialize() ext3: Update kernel-doc comments jbd/2: fixed typos ext2: fixed typo. ext3: Fix debug messages in ext3_group_extend() jbd: Convert atomic_inc() to get_bh() ext3: Remove misplaced BUFFER_TRACE() in ext3_truncate() jbd: Fix debug message in do_get_write_access() jbd: Check return value of __getblk() ext3: Use DIV_ROUND_UP() on group desc block counting ext3: Return proper error code on ext3_fill_super() ext3: Remove unnecessary casts on bh->b_data ext3: Cleanup ext3_setup_super() quota: Fix issuing of warnings from dquot_transfer quota: fix dquot_disable vs dquot_transfer race v2 jbd: Convert bitops to buffer fns ext3/jbd: Avoid WARN() messages when failing to write the superblock jbd: Use offset_in_page() instead of manual calculation jbd: Remove unnecessary goto statement jbd: Use printk_ratelimited() in journal_alloc_journal_head() ...
2010-10-27ext4: optimize orphan_list handling for ext4_setattrDmitry Monakhov
Surprisingly chown() on ext4 is not SMP scalable operation. Due to unconditional orphan_del(NULL, inode) in ext4_setattr() result in significant performance overhead because of global orphan mutex, especially in no-journal mode (where orphan_add() is noop). It is possible to skip explicit orphan_del if possible. Results of fchown() micro-benchmark in no-journal mode while (1) { iteration++; fchown(fd, uid, gid); fchown(fd, uid + 1, gid + 1) } measured: iterations per millisecond | nr_tasks | w/o patch | with patch | | 1 | 142 | 185 | | 4 | 109 | 642 | Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: fix unbalanced mutex unlock in error path of ext4_li_request_newNicolas Kaiser
Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27Merge branch 'akpm-incoming-2'Linus Torvalds
* akpm-incoming-2: (139 commits) epoll: make epoll_wait() use the hrtimer range feature select: rename estimate_accuracy() to select_estimate_accuracy() Remove duplicate includes from many files ramoops: use the platform data structure instead of module params kernel/resource.c: handle reinsertion of an already-inserted resource kfifo: fix kfifo_alloc() to return a signed int value w1: don't allow arbitrary users to remove w1 devices alpha: remove dma64_addr_t usage mips: remove dma64_addr_t usage sparc: remove dma64_addr_t usage fuse: use release_pages() taskstats: use real microsecond granularity for CPU times taskstats: split fill_pid function taskstats: separate taskstats commands delayacct: align to 8 byte boundary on 64-bit systems delay-accounting: reimplement -c for getdelays.c to report information on a target command namespaces Kconfig: move namespace menu location after the cgroup namespaces Kconfig: remove the cgroup device whitelist experimental tag namespaces Kconfig: remove pointless cgroup dependency namespaces Kconfig: make namespace a submenu ...
2010-10-27ext4: fix compile error in ext4_fallocate()Kazuya Mio
When I compiled 2.6.36-rc3 kernel with EXT4FS_DEBUG definition, I got the following compile error. CC [M] fs/ext4/extents.o fs/ext4/extents.c: In function 'ext4_fallocate': fs/ext4/extents.c:3772: error: 'block' undeclared (first use in this function) fs/ext4/extents.c:3772: error: (Each undeclared identifier is reported only once fs/ext4/extents.c:3772: error: for each function it appears in.) make[2]: *** [fs/ext4/extents.o] Error 1 The patch fixes this problem. Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: move ext4_mb_{get,put}_buddy_cache_lock and make them staticEric Sandeen
These functions are only used within fs/ext4/mballoc.c, so move them so they are used after they are defined, and then make them be static. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end()Theodore Ts'o
Fix a namespace leak from fs/ext4 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: move flush_completed_IO to fs/ext4/fsync.c and make it staticTheodore Ts'o
Fix a namespace leak by moving the function to the file where it is used and making it static. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: rename {ext,idx}_pblock and inline small extent functionsTheodore Ts'o
Cleanup namespace leaks from fs/ext4 and the inline trivial functions ext4_{ext,idx}_pblock() and ext4_{ext,idx}_store_pblock() since the code size actually shrinks when we make these functions inline, they're so trivial. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: make various ext4 functions be staticTheodore Ts'o
These functions have no need to be exported beyond file context. No functions needed to be moved for this commit; just some function declarations changed to be static and removed from header files. (A similar patch was submitted by Eric Sandeen, but I wanted to handle code movement in separate patches to make sure code changes didn't accidentally get dropped.) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()Theodore Ts'o
This is a cleanup to avoid namespace leaks out of fs/ext4 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: fix kernel oops if the journal superblock has a non-zero j_errnoTheodore Ts'o
Commit 84061e0 fixed an accounting bug only to introduce the possibility of a kernel OOPS if the journal has a non-zero j_errno field indicating that the file system had detected a fs inconsistency. After the journal replay, if the journal superblock indicates that the file system has an error, this indication is transfered to the file system and then ext4_commit_super() is called to write this to the disk. But since the percpu counters are now initialized after the journal replay, the call to ext4_commit_super() will cause a kernel oops since it needs to use the percpu counters the ext4 superblock structure. The fix is to skip setting the ext4 free block and free inode fields if the percpu counter has not been set. Thanks to Ken Sumrall for reporting and analyzing the root causes of this bug. Addresses-Google-Bug: #3054080 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: update writeback_index based on last page scannedEric Sandeen
As pointed out in a prior patch, updating the mapping's writeback_index based on pages written isn't quite right; what the writeback index is really supposed to reflect is the next page which should be scanned for writeback during periodic flush. As in write_cache_pages(), write_cache_pages_da() does this scanning for us as we assemble the mpd for later writeout. If we keep track of the next page after the current scan, we can easily update writeback_index without worrying about pages written vs. pages skipped, etc. Without this, an fsync will reset writeback_index to 0 (its starting index) + however many pages it wrote, which can mess up the progress of periodic flush. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: implement writeback livelock avoidance using page taggingEric Sandeen
This is analogous to Jan Kara's commit, f446daaea9d4a420d16c606f755f3689dcb2d0ce mm: implement writeback livelock avoidance using page tagging but since we forked write_cache_pages, we need to reimplement it there (and in ext4_da_writepages, since range_cyclic handling was moved to there) If you start a large buffered IO to a file, and then set fsync after it, you'll find that fsync does not complete until the other IO stops. If you continue re-dirtying the file (say, putting dd with conv=notrunc in a loop), when fsync finally completes (after all IO is done), it reports via tracing that it has written many more pages than the file contains; in other words it has synced and re-synced pages in the file multiple times. This then leads to problems with our writeback_index update, since it advances it by pages written, and essentially sets writeback_index off the end of the file... With the following patch, we only sync as much as was dirty at the time of the sync. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: tidy up a void argument in inode.cEric Sandeen
This doesn't fix anything at all, it just removes a vestige of prior use from __mpage_da_writepage() __mpage_da_writepage() had a *void argument leftover from its previous life as a callback; make it reflect the actual type. Fixing this up makes it slightly more obvious to read, and enables proper typechecking. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27ext4: add batched_discard into ext4 feature listLukas Czerner
Should be applied on the top of "lazy inode table initialization" and "batched discard support" patch-sets. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>