summaryrefslogtreecommitdiffstats
path: root/fs/ocfs2/super.c
AgeCommit message (Collapse)Author
2010-08-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits) no need for list_for_each_entry_safe()/resetting with superblock list Fix sget() race with failing mount vfs: don't hold s_umount over close_bdev_exclusive() call sysv: do not mark superblock dirty on remount sysv: do not mark superblock dirty on mount btrfs: remove junk sb_dirt change BFS: clean up the superblock usage AFFS: wait for sb synchronization when needed AFFS: clean up dirty flag usage cifs: truncate fallout mbcache: fix shrinker function return value mbcache: Remove unused features add f_flags to struct statfs(64) pass a struct path to vfs_statfs update VFS documentation for method changes. All filesystems that need invalidate_inode_buffers() are doing that explicitly convert remaining ->clear_inode() to ->evict_inode() Make ->drop_inode() just return whether inode needs to be dropped fs/inode.c:clear_inode() is gone fs/inode.c:evict() doesn't care about delete vs. non-delete paths now ... Fix up trivial conflicts in fs/nilfs2/super.c
2010-08-09convert ocfs2 to ->evict_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-06-16Merge branch 'master' into for-nextJiri Kosina
2010-06-16fix typos concerning "initiali[zs]e"Uwe Kleine-König
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-05-24quota: rename default quotactl methods to dquot_Christoph Hellwig
Follow the dquot_* style used elsewhere in dquot.c. [Jan Kara: Fixed up missing conversion of ext2] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-24quota: drop remount argument to ->quota_on and ->quota_offChristoph Hellwig
Remount handling has fully moved into the filesystem, so all this is superflous now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-24quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappersChristoph Hellwig
Instead of having wrappers in the VFS namespace export the dquot_suspend and dquot_resume helpers directly. Also rename vfs_quota_disable to dquot_disable while we're at it. [Jan Kara: Moved dquot_suspend to quotaops.h and made it inline] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-24ocfs2: Fix use after free on remount read-onlyJan Kara
We also have to cancel quota syncing thread on remount read only because at that moment quota is being turned off. Otherwise quota syncing thread will try to access already freed quota structures. Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21ocfs2: Fix lock inversion in quotas during umountJan Kara
We cannot cancel delayed work from ocfs2_local_free_info because that is called with dqonoff_mutex held and the work it cancels requires dqonoff_mutex to finish. Cancel the work before acquiring dqonoff_mutex. Acked-by: Joel Becker <Joel.Becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-18Merge branch 'discontig-bg' of git://oss.oracle.com/git/tma/linux-2.6 into ↵Joel Becker
ocfs2-merge-window
2010-05-10ocfs2: Wrap signal blocking in void functions.Joel Becker
ocfs2 sometimes needs to block signals around dlm operations, but it currently does it with sigprocmask(). Even worse, it's checking the error code of sigprocmask(). The in-kernel sigprocmask() can only error if you get the SIG_* argument wrong. We don't. Wrap the sigprocmask() calls with ocfs2_[un]block_signals(). These functions are void, but they will BUG() if somehow sigprocmask() returns an error. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05ocfs2: Make nointr a default mount optionSunil Mushran
OCFS2 has never really supported intr. This patch acknowledges this reality and makes nointr the default mount option. In a later patch, we intend to support intr. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05ocfs2: Add dir_resv_level mount optionMark Fasheh
The default behavior for directory reservations stays the same, but we add a mount option so people can tweak the size of directory reservations according to their workloads. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05ocfs2: increase the default size of local alloc windowsMark Fasheh
I have observed that the current size of 8M gives us pretty poor fragmentation on multi-threaded workloads which do lots of writes. Generally, I can increase the size of local alloc windows and observe a marked decrease in fragmentation, even up and beyond window sizes of 512 megabytes. This makes sense for a couple reasons - larger local alloc means more room for reservation windows. On multi-node workloads the larger local alloc helps as well because we don't have to do window slides as often. Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no longer used and the comment above it was out of date. To test fragmentation, I used a workload which launched 4 threads that did 4k writes into a series of about 140 alternating files. With resv_level=2, and a 4k/4k file system I observed the following average fragmentation for various localalloc= parameters: localalloc= avg. fragmentation 8 48 32 16 64 10 120 7 On larger cluster sizes, the difference is more dramatic. The new default size top out at 256M, which we'll only get for cluster sizes of 32K and above. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05ocfs2: clean up localalloc mount option size parsingMark Fasheh
This patch pulls the local alloc sizing code into localalloc.c and provides a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged except that I correctly calculate the maximum local alloc size. The old code in ocfs2_parse_options() calculated the max size as: ocfs2_local_alloc_size(sb) * 8 which is correct, in bits. Unfortunately though the option passed in is in megabytes. Ultimately, this bug made no real difference - the shrink code would catch a too-large size and bring it down to something reasonable. Still, it's less than efficient as-is. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05ocfs2: use allocation reservations during file writeMark Fasheh
Add a per-inode reservations structure and pass it through to the reservations code. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-05ocfs2: allocation reservationsMark Fasheh
This patch improves Ocfs2 allocation policy by allowing an inode to reserve a portion of the local alloc bitmap for itself. The reserved portion (allocation window) is advisory in that other allocation windows might steal it if the local alloc bitmap becomes full. Otherwise, the reservations are honored and guaranteed to be free. When the local alloc window is moved to a different portion of the bitmap, existing reservations are discarded. Reservation windows are represented internally by a red-black tree. Within that tree, each node represents the reservation window of one inode. An LRU of active reservations is also maintained. When new data is written, we allocate it from the inodes window. When all bits in a window are exhausted, we allocate a new one as close to the previous one as possible. Should we not find free space, an existing reservation is pulled off the LRU and cannibalized. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-04-13ocfs2: ocfs2_group_bitmap_size has to handle old volume.Tao Ma
ocfs2_group_bitmap_size has to handle the case when the volume don't have discontiguous block group support. So pass the feature_incompat in and check it. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-04-13ocfs2: Define data structures for discontiguous block groups.Joel Becker
Defines the OCFS2_FEATURE_INCOMPAT_DISCONTIG_BG feature bit and modifies struct ocfs2_group_desc for the feature. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-02-26ocfs2: add extent block stealing for ocfs2 v5Tiger Yang
This patch add extent block (metadata) stealing mechanism for extent allocation. This mechanism is same as the inode stealing. if no room in slot specific extent_alloc, we will try to allocate extent block from the next slot. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Acked-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25ocfs2/trivial: Remove trailing whitespacesSunil Mushran
Patch removes trailing whitespaces. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-24Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c ocfs2/trivial: Use proper mask for 2 places in hearbeat.c Ocfs2: Let ocfs2 support fiemap for symlink and fast symlink. Ocfs2: Should ocfs2 support fiemap for S_IFDIR inode? ocfs2: Use FIEMAP_EXTENT_SHARED fiemap: Add new extent flag FIEMAP_EXTENT_SHARED ocfs2: replace u8 by __u8 in ocfs2_fs.h ocfs2: explicit declare uninitialized var in user_cluster_connect() ocfs2-devel: remove redundant OCFS2_MOUNT_POSIX_ACL check in ocfs2_get_acl_nolock() ocfs2: return -EAGAIN instead of EAGAIN in dlm ocfs2/cluster: Make fence method configurable - v2 ocfs2: Set MS_POSIXACL on remount ocfs2: Make acl use the default ocfs2: Always include ACL support
2009-10-29ocfs2: return f_fsid info in ocfs2_statfs()Coly Li
Currently the f_fsid of struct kstatfs returned from ocfs2_statfs() is undefined (vfs layer fills in 0 as default). Since in some conditions, f_fsid value might be used in a (f_fsid, ino) pair to uniquely identify a file, ocfs2 should return a unique defined f_fsid value from ocfs2_statfs(). Because uuid_str is the same on big or litlle endian machine, it's endian consistent to use osb->uuid_str to generate f_fsid value. Signed-off-by: Coly Li <coly.li@suse.de> Cc: Sunil Mushran <sunil.mushran@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-10-28ocfs2: Set MS_POSIXACL on remountJan Kara
We have to set MS_POSIXACL on remount as well. Otherwise VFS would not know we started supporting ACLs after remount and thus ACLs would not work. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-10-28ocfs2: Make acl use the defaultJan Kara
Change acl mount options handling to match the one of XFS and BTRFS and hopefully it is also easier to use now. When admin does not specify any acl mount option, acls are enabled if and only if the filesystem has xattr feature enabled. If admin specifies 'acl' mount option, we fail the mount if the filesystem does not have xattr feature and thus acls cannot be enabled. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-10-28ocfs2: Always include ACL supportJan Kara
To become consistent with filesystems such as XFS or BTRFS, make posix ACLs always available. This also reduces possibility of misconfiguration on admin's side. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-10-28ocfs2: Return -EINVAL when a device is not ocfs2.Joel Becker
In case of non-modular kernels the root filesystem is mounted by trying several filesystems. If ocfs2 was tried before the actual filesystem type, the mount would fail because ocfs2_sb_probe() returns -EAGAIN instead of -EINVAL. ocfs2 will now return -EINVAL properly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Reported-by: Laszlo Attila Toth <panther@balabit.hu>
2009-10-01const: constify remaining file_operationsAlexey Dobriyan
[akpm@linux-foundation.org: fix KVM] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23headers: utsname.h reduxAlexey Dobriyan
* remove asm/atomic.h inclusion from linux/utsname.h -- not needed after kref conversion * remove linux/utsname.h inclusion from files which do not need it NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however due to some personality stuff it _is_ needed -- cowardly leave ELF-related headers and files alone. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (85 commits) ocfs2: Use buffer IO if we are appending a file. ocfs2: add spinlock protection when dealing with lockres->purge. dlmglue.c: add missed mlog lines ocfs2: __ocfs2_abort() should not enable panic for local mounts ocfs2: Add ioctl for reflink. ocfs2: Enable refcount tree support. ocfs2: Implement ocfs2_reflink. ocfs2: Add preserve to reflink. ocfs2: Create reflinked file in orphan dir. ocfs2: Use proper parameter for some inode operation. ocfs2: Make transaction extend more efficient. ocfs2: Don't merge in 1st refcount ops of reflink. ocfs2: Modify removing xattr process for refcount. ocfs2: Add reflink support for xattr. ocfs2: Create an xattr indexed block if needed. ocfs2: Call refcount tree remove process properly. ocfs2: Attach xattr clusters to refcount tree. ocfs2: Abstract ocfs2 xattr tree extend rec iteration process. ocfs2: Abstract the creation of xattr block. ocfs2: Remove inode from ocfs2_xattr_bucket_get_name_value. ...
2009-09-23ocfs2: __ocfs2_abort() should not enable panic for local mountsSunil Mushran
In a clustered setup, we have to panic the box on journal abort. This is because we don't have the facility to go hard readonly. With hard ro, another node would detect node failure and initiate recovery. Having said that, we shouldn't force panic if the volume is mounted locally. This patch defers the handling to the mount option, errors. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-22ocfs2: Add refcount tree lock mechanism.Tao Ma
Implement locking around struct ocfs2_refcount_tree. This protects all read/write operations on refcount trees. ocfs2_refcount_tree has its own lock and its own caching_info, protecting buffers among multiple nodes. User must call ocfs2_lock_refcount_tree before his operation on the tree and unlock it after that. ocfs2_refcount_trees are referenced by the block number of the refcount tree root block, So we create an rb-tree on the ocfs2_super to look them up. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22const: make struct super_block::s_qcop constAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-04ocfs2: move ip_created_trans to struct ocfs2_caching_infoJoel Becker
Similar ip_last_trans, ip_created_trans tracks the creation of a journal managed inode. This specifically tracks what transaction created the inode. This is so the code can know if the inode has ever been written to disk. This behavior is desirable for any journal managed object. We move it to struct ocfs2_caching_info as ci_created_trans so that any object using ocfs2_caching_info can rely on this behavior. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04ocfs2: move ip_last_trans to struct ocfs2_caching_infoJoel Becker
We have the read side of metadata caching isolated to struct ocfs2_caching_info, now we need the write side. This means the journal functions. The journal only does a couple of things with struct inode. This change moves the ip_last_trans field onto struct ocfs2_caching_info as ci_last_trans. This field tells the journal whether a pending journal flush is required. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04ocfs2: Take the inode out of the metadata read/write paths.Joel Becker
We are really passing the inode into the ocfs2_read/write_blocks() functions to get at the metadata cache. This commit passes the cache directly into the metadata block functions, divorcing them from the inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04ocfs2: Change metadata caching locks to an operations structure.Joel Becker
We don't really want to cart around too many new fields on the ocfs2_caching_info structure. So let's wrap all our access of the parent object in a set of operations. One pointer on caching_info, and more flexibility to boot. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04ocfs2: Make the ocfs2_caching_info structure self-contained.Joel Becker
We want to use the ocfs2_caching_info structure in places that are not inodes. To do that, it can no longer rely on referencing the inode directly. This patch moves the flags to ocfs2_caching_info->ci_flags, stores pointers to the parent's locks on the ocfs2_caching_info, and renames the constants and flags to reflect its independant state. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-17ocfs2: Don't oops in ocfs2_kill_sb on a failed mountJan Kara
If we fail to mount the filesystem, we have to be careful not to dereference uninitialized structures in ocfs2_kill_sb. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23ocfs2: Fix initialization of blockcheck statsJan Kara
We just set blockcheck stats to zeros but we should also properly initialize the spinlock there. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-21ocfs2: Fix deadlock on umountJan Kara
In commit ea455f8ab68338ba69f5d3362b342c115bea8e13, we moved the dentry lock put process into ocfs2_wq. This causes problems during umount because ocfs2_wq can drop references to inodes while they are being invalidated by invalidate_inodes() causing all sorts of nasty things (invalidate_inodes() ending in an infinite loop, "Busy inodes after umount" messages etc.). We fix the problem by stopping ocfs2_wq from doing any further releasing of inode references on the superblock being unmounted, wait until it finishes the current round of releasing and finally cleaning up all the references in dentry_lock_list from ocfs2_put_super(). The issue was tracked down by Tao Ma <tao.ma@oracle.com>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-08ocfs2: Fixup orphan scan cleanup after failed mountJeff Mahoney
If the mount fails for any reason, ocfs2_dismount_volume calls ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init be called to setup the mutex and work queues, but that doesn't happen if the mount has failed and we oops accessing an uninitialized work queue. This patch splits the init and startup of the orphan scan, eliminating the oops. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-23Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define. ocfs2: Add lockdep annotations vfs: Set special lockdep map for dirs only if not set by fs ocfs2: Disable orphan scanning for local and hard-ro mounts ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init() ocfs2: Stop orphan scan as early as possible during umount ocfs2: Fix ocfs2_osb_dump() ocfs2: Pin journal head before accessing jh->b_committed_data ocfs2: Update atime in splice read if necessary. ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.
2009-06-22ocfs2: Disable orphan scanning for local and hard-ro mountsSunil Mushran
Local and Hard-RO mounts do not need orphan scanning. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22ocfs2: Stop orphan scan as early as possible during umountSunil Mushran
Currently if the orphan scan fires a tick before the user issues the umount, the umount will wait for the queued orphan scan tasks to complete. This patch makes the umount stop the orphan scan as early as possible so as to reduce the probability of the queued tasks slowing down the umount. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22ocfs2: Fix ocfs2_osb_dump()Sunil Mushran
Skip printing information that is not valid for local mounts. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-19block: rename CONFIG_LBD to CONFIG_LBDAFBartlomiej Zolnierkiewicz
Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-16Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/net: Use wait_event() in o2net_send_message_vec() ocfs2: Adjust rightmost path in ocfs2_add_branch. ocfs2: fdatasync should skip unimportant metadata writeout ocfs2: Remove redundant gotos in ocfs2_mount_volume() ocfs2: Add statistics for the checksum and ecc operations. ocfs2 patch to track delayed orphan scan timer statistics ocfs2: timer to queue scan of all orphan slots ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories ocfs2: Fix possible deadlock in quota recovery ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() ocfs2: Fix lock inversion in ocfs2_local_read_info() ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() ocfs2: update comments in masklog.h ocfs2: Don't printk the error when listing too many xattrs.
2009-06-11Push BKL down into ->remount_fs()Alessio Igor Bogani
[xfs, btrfs, capifs, shmem don't need BKL, exempt] Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11push BKL down into ->put_superChristoph Hellwig
Move BKL into ->put_super from the only caller. A couple of filesystems had trivial enough ->put_super (only kfree and NULLing of s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs, hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most of them probably don't need it, but I'd rather sort that out individually. Preferably after all the other BKL pushdowns in that area. [AV: original used to move lock_super() down as well; these changes are removed since we don't do lock_super() at all in generic_shutdown_super() now] [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>