summaryrefslogtreecommitdiffstats
path: root/include/linux/ext3_fs.h
AgeCommit message (Collapse)Author
2011-01-10ext3: speed up file creates by optimizing rec_len functionsEric Sandeen
The addition of 64k block capability in the rec_len_from_disk and rec_len_to_disk functions added a bit of math overhead which slows down file create workloads needlessly when the architecture cannot even support 64k blocks, thanks to page size limits. Similar changes already exist in the ext4 codebase. The directory entry checking can also be optimized a bit by sprinkling in some unlikely() conditions to move the error handling out of line. bonnie++ sequential file creates on a 512MB ramdisk speeds up from about 77,000/s to about 82,000/s, about a 6% improvement. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2011-01-10ext3: Add batched discard support for ext3Lukas Czerner
Walk through allocation groups and trim all free extents. It can be invoked through FITRIM ioctl on the file system. The main idea is to provide a way to trim the whole file system if needed, since some SSD's may suffer from performance loss after the whole device was filled (it does not mean that fs is full!). It search for free extents in allocation groups specified by Byte range start -> start+len. When the free extent is within this range, blocks are marked as used and then trimmed. Afterwards these blocks are marked as free in per-group bitmap. [JK: Fixed up error handling and trimming of a single group] Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
2010-08-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits) no need for list_for_each_entry_safe()/resetting with superblock list Fix sget() race with failing mount vfs: don't hold s_umount over close_bdev_exclusive() call sysv: do not mark superblock dirty on remount sysv: do not mark superblock dirty on mount btrfs: remove junk sb_dirt change BFS: clean up the superblock usage AFFS: wait for sb synchronization when needed AFFS: clean up dirty flag usage cifs: truncate fallout mbcache: fix shrinker function return value mbcache: Remove unused features add f_flags to struct statfs(64) pass a struct path to vfs_statfs update VFS documentation for method changes. All filesystems that need invalidate_inode_buffers() are doing that explicitly convert remaining ->clear_inode() to ->evict_inode() Make ->drop_inode() just return whether inode needs to be dropped fs/inode.c:clear_inode() is gone fs/inode.c:evict() doesn't care about delete vs. non-delete paths now ... Fix up trivial conflicts in fs/nilfs2/super.c
2010-08-09convert ext3 to ->evict_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-07-21ext3: remove vestiges of nobh supportChristoph Hellwig
The nobh option was only supported for writeback mode, but given that all write paths (except mmapped writed) actually create buffer heads, it effectively was a no-op already. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-27drop unused dentry argument to ->fsyncChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-29ext3: fix broken handling of EXT3_STATE_NEWLinus Torvalds
In commit 9df93939b735 ("ext3: Use bitops to read/modify EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to use bitops for its state handling. However, unline the same ext4 change, it didn't actually change the name of the field when it changed the semantics of it. As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that initialized the field to EXT3_STATE_NEW. And that does not work _at_all_ when we're now working with individually named bits rather than values that get masked. So the code tried to mark the state to be new, but in actual fact set the field to EXT3_STATE_JDATA. Which makes no sense at all, and screws up all the code that checks whether the inode was newly allocated. In particular, it made the xattr code unhappy, and caused various random behavior, like apparently https://bugzilla.redhat.com/show_bug.cgi?id=577911 So fix the initialization, and rename the field to match ext4 so that we don't have this happen again. Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Daniel J Walsh <dwalsh@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-05Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits) quota: stop using QUOTA_OK / NO_QUOTA dquot: cleanup dquot initialize routine dquot: move dquot initialization responsibility into the filesystem dquot: cleanup dquot drop routine dquot: move dquot drop responsibility into the filesystem dquot: cleanup dquot transfer routine dquot: move dquot transfer responsibility into the filesystem dquot: cleanup inode allocation / freeing routines dquot: cleanup space allocation / freeing routines ext3: add writepage sanity checks ext3: Truncate allocated blocks if direct IO write fails to update i_size quota: Properly invalidate caches even for filesystems with blocksize < pagesize quota: generalize quota transfer interface quota: sb_quota state flags cleanup jbd: Delay discarding buffers in journal_unmap_buffer ext3: quota_write cross block boundary behaviour quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota quota: split out compat_sys_quotactl support from quota.c quota: split out netlink notification support from quota.c quota: remove invalid optimization from quota_sync_all ... Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
2010-03-05pass writeback_control to ->write_inodeChristoph Hellwig
This gives the filesystem more information about the writeback that is happening. Trond requested this for the NFS unstable write handling, and other filesystems might benefit from this too by beeing able to distinguish between the different callers in more detail. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05ext3: Use bitops to read/modify EXT3_I(inode)->i_stateJan Kara
At several places we modify EXT3_I(inode)->i_state without holding i_mutex (ext3_release_file, ext3_bmap, ext3_journalled_writepage, ext3_do_update_inode, ...). These modifications are racy and we can lose updates to i_state. So convert handling of i_state to use bitops which are atomic. Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext3: Unify log messages in ext3Alexey Fisher
Make messages produced by ext3 more unified. It should be easy to parse. dmesg before patch: [ 4893.684892] reservations ON [ 4893.684896] xip option not supported [ 4893.684964] EXT3-fs warning: maximal mount count reached, running e2fsck is recommended dmesg after patch: [ 873.300792] EXT3-fs (loop0): using internal journaln [ 873.300796] EXT3-fs (loop0): mounted filesystem with writeback data mode [ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0. [ 723.755642] EXT3-fs (loop0): error: bad blocksize 8192 [ 357.874687] EXT3-fs (loop0): error: no journal found. mounting ext3 over ext2? [ 873.300764] EXT3-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended [ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0. Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net> Signed-off-by: Jan Kara <jack@suse.cz>
2009-07-15ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle()Jan Kara
Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to be a relict from some old days and setting disksize in this function does not make much sence. Currently it was set only by ext3_getblk(). Since the parameter has some effect only if create == 1, it is easy to check that the three callers which end up calling ext3_getblk() with create == 1 (ext3_append, ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves. Signed-off-by: Jan Kara <jack@suse.cz>
2009-04-03Merge branch 'ext3-latency-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext3: Add replace-on-rename hueristics for data=writeback mode ext3: Add replace-on-truncate hueristics for data=writeback mode ext3: Use WRITE_SYNC for commits which are caused by fsync() block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks
2009-04-03ext3: Add replace-on-truncate hueristics for data=writeback modeTheodore Ts'o
In data=writeback mode, start an asynchronous flush when closing a file which had been previously truncated down to zero. This lowers the probability of data loss in the case of applications that attempt to replace a file using truncate. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-04-02ext3: remove the BKL in ext3/ioctl.cCyrus Massoumi
Reformat ext3/ioctl.c to make it look more like ext4/ioctl.c and remove the BKL around ext3_ioctl(). Signed-off-by: Cyrus Massoumi <cyrusm@gmx.net> Cc: <linux-ext4@vger.kernel.org> Acked-by: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits) jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs ext4: Remove "extents" mount option block: Add Kconfig help which notes that ext4 needs CONFIG_LBD ext4: Make printk's consistently prefixed with "EXT4-fs: " ext4: Add sanity checks for the superblock before mounting the filesystem ext4: Add mount option to set kjournald's I/O priority jbd2: Submit writes to the journal using WRITE_SYNC jbd2: Add pid and journal device name to the "kjournald2 starting" message ext4: Add markers for better debuggability ext4: Remove code to create the journal inode ext4: provide function to release metadata pages under memory pressure ext3: provide function to release metadata pages under memory pressure add releasepage hooks to block devices which can be used by file systems ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc ext4: Init the complete page while building buddy cache ext4: Don't allow new groups to be added during block allocation ext4: mark the blocks/inode bitmap beyond end of group as used ext4: Use new buffer_head flag to check uninit group bitmaps initialization ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() ext4: code cleanup ...
2009-01-08ext3: tighten restrictions on inode flagsDuane Griffin
At the moment there are few restrictions on which flags may be set on which inodes. Specifically DIRSYNC may only be set on directories and IMMUTABLE and APPEND may not be set on links. Tighten that to disallow TOPDIR being set on non-directories and only NODUMP and NOATIME to be set on non-regular file, non-directories. Introduces a flags masking function which masks flags based on mode and use it during inode creation and when flags are set via the ioctl to facilitate future consistency. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08ext3: don't inherit inappropriate inode flags from parentDuane Griffin
At present INDEX is the only flag that new ext3 inodes do NOT inherit from their parent. In addition prevent the flags DIRTY, ECOMPR, IMAGIC and TOPDIR from being inherited. List inheritable flags explicitly to prevent future flags from accidentally being inherited. This fixes the TOPDIR flag inheritance bug reported at http://bugzilla.kernel.org/show_bug.cgi?id=9866. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-28ext3: Add support for non-native signed/unsigned htree hash algorithmsTheodore Ts'o
The original ext3 hash algorithms assumed that variables of type char were signed, as God and K&R intended. Unfortunately, this assumption is not true on some architectures. Userspace support for marking filesystems with non-native signed/unsigned chars was added two years ago, but the kernel-side support was never added (until now). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: akpm@linux-foundation.org Cc: linux-kernel@vger.kernel.org
2008-10-20ext3: add an option to control error handling on file dataHidehiro Kawai
If the journal doesn't abort when it gets an IO error in file data blocks, the file data corruption will spread silently. Because most of applications and commands do buffered writes without fsync(), they don't notice the IO error. It's scary for mission critical systems. On the other hand, if the journal aborts whenever it gets an IO error in file data blocks, the system will easily become inoperable. So this patch introduces a filesystem option to determine whether it aborts the journal or just call printk() when it gets an IO error in file data. If you mount a ext3 fs with data_err=abort option, it aborts on file data write error. If you mount it with data_err=ignore, it doesn't abort, just call printk(). data_err=ignore is the default. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16include: replace __FUNCTION__ with __func__Harvey Harrison
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-03generic block based fiemap implementationJosef Bacik
Any block based fs (this patch includes ext3) just has to declare its own fiemap() function and then call this generic function with its own get_block_t. This works well for block based filesystems that will map multiple contiguous blocks at one time, but will work for filesystems that only map one block at a time, you will just end up with an "extent" for each block. One gotcha is this will not play nicely where there is hole+data after the EOF. This function will assume its hit the end of the data as soon as it hits a hole after the EOF, so if there is any data past that it will not pick that up. AFAIK no block based fs does this anyway, but its in the comments of the function anyway just in case. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org
2008-07-25ext3: handle corrupted orphan list at mountDuane Griffin
If the orphan node list includes valid, untruncatable nodes with nlink > 0 the ext3_orphan_cleanup loop which attempts to delete them will not do so, causing it to loop forever. Fix by checking for such nodes in the ext3_orphan_get function. This patch fixes the second case (image hdb.20000009.softlockup.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: printk warning fix] Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07iget: stop EXT3 from using iget() and read_inode()David Howells
Stop the EXT3 filesystem from using iget() and read_inode(). Replace ext3_read_inode() with ext3_iget(), and call that instead of iget(). ext3_iget() then uses iget_locked() directly and returns a proper error code instead of an inode in the event of an error. ext3_fill_super() returns any error incurred when getting the root inode instead of EINVAL. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14Fix 64KB blocksize in ext3 directoriesJan Kara
With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry lenght. So we store 0xffff instead and convert value when read from / written to disk. The patch also converts some places to use ext3_next_entry() when we are changing them anyway. [akpm@linux-foundation.org: coding-style cleanups] Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18ext3: support large blocksize up to PAGESIZETakashi Sato
This patch set supports large block size(>4k, <=64k) in ext3 just enlarging the block size limit. But it is NOT possible to have 64kB blocksize on ext3 without some changes to the directory handling code. The reason is that an empty 64kB directory block would have a rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in the filesystem. The proposed solution is treat 64k rec_len with a an impossible value like rec_len = 0xffff to handle this. The Patch-set consists of the following 2 patches. [1/2] ext3: enlarge blocksize - Allow blocksize up to pagesize [2/2] ext3: fix rec_len overflow - prevent rec_len from overflow with 64KB blocksize Now on 64k page ppc64 box runs with this patch set we could create a 64k block size ext3, and able to handle empty directory block. Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17ext3: remove #ifdef CONFIG_EXT3_INDEXEric Sandeen
CONFIG_EXT3_INDEX is not an exposed config option in the kernel, and it is unconditionally defined in ext3_fs.h. tune2fs is already able to turn off dir indexing, so at this point it's just cluttering up the code. Remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08ext3: copy i_flags to inode flags on writeJan Kara
A patch that stores inode flags such as S_IMMUTABLE, S_APPEND, etc. from i_flags to EXT3_I(inode)->i_flags when inode is written to disk. The same thing is done on GETFLAGS ioctl. Quota code changes these flags on quota files (to make it harder for sysadmin to screw himself) and these changes were not correctly propagated into the filesystem (especially, lsattr did not show them and users were wondering...). Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into ext3-specific i_flags. Hence, when someone sets these flags via a different interface than ioctl, they are stored correctly. Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12[PATCH] mark struct inode_operations const 3Arjan van de Ven
Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-09-30[PATCH] BLOCK: Move the Ext3 device ioctl compat stuff to the Ext3 driver ↵David Howells
[try #6] Move the Ext3 device ioctl compat stuff from fs/compat_ioctl.c to the Ext3 driver so that the Ext3 header file doesn't need to be included. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30[PATCH] BLOCK: Move common FS-specific ioctls to linux/fs.h [try #6]David Howells
Move common FS-specific ioctls from linux/ext2_fs.h to linux/fs.h as FS_IOC_* and FS_IOC32_* and have the users of them use those as a base. Also move the GETFLAGS/SETFLAGS flags to linux/fs.h as FS_*_FL macros, and then have the other users use them as a base. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-27[PATCH] ext3: Fix sparse warningsDave Kleikamp
Fixing up some endian-ness warnings in preparation to clone ext4 from ext3. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27[PATCH] ext3: More whitespace cleanupsDave Kleikamp
More white space cleanups in preparation of cloning ext4 from ext3. Removing spaces that precede a tab. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-24Move several *_SUPER_MAGIC symbols to include/linux/magic.h.Jeff Garzik
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-07-31[PATCH] ext3: avoid triggering ext3_error on bad NFS file handleNeil Brown
The inode number out of an NFS file handle gets passed eventually to ext3_get_inode_block() without any checking. If ext3_get_inode_block() allows it to trigger an error, then bad filehandles can have unpleasant effect - ext3_error() will usually cause a forced read-only remount, or a panic if `errors=panic' was used. So remove the call to ext3_error there and put a matching check in ext3/namei.c where inode numbers are read off storage. [akpm@osdl.org: fix off-by-one error] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: <stable@kernel.org> Cc: "Stephen C. Tweedie" <sct@redhat.com> Cc: Eric Sandeen <esandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ext3_fsblk_t: the rest of in-kernel filesystem blocks conversionMingming Cao
Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t. Convert the rest of all unsigned long type in-kernel filesystem blocks to ext3_fsblk_t, and replace the printk format string respondingly. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ext3_fsblk_t: filesystem, group blocks and bug fixesMingming Cao
Some of the in-kernel ext3 block variable type are treated as signed 4 bytes int type, thus limited ext3 filesystem to 8TB (4kblock size based). While trying to fix them, it seems quite confusing in the ext3 code where some blocks are filesystem-wide blocks, some are group relative offsets that need to be signed value (as -1 has special meaning). So it seem saner to define two types of physical blocks: one is filesystem wide blocks, another is group-relative blocks. The following patches clarify these two types of blocks in the ext3 code, and fix the type bugs which limit current 32 bit ext3 filesystem limit to 8TB. With this series of patches and the percpu counter data type changes in the mm tree, we are able to extend exts filesystem limit to 16TB. This work is also a pre-request for the recent >32 bit ext3 work, and makes the kernel to able to address 48 bit ext3 block a lot easier: Simply redefine ext3_fsblk_t from unsigned long to sector_t and redefine the format string for ext3 filesystem block corresponding. Two RFC with a series patches have been posted to ext2-devel list and have been reviewed and discussed: http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2 http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2 Patches are tested on both 32 bit machine and 64 bit machine, <8TB ext3 and >8TB ext3 filesystem(with the latest to be released e2fsprogs-1.39). Tests includes overnight fsx, tiobench, dbench and fsstress. This patch: Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for filesystem wide blocks. This patch classifies all block group relative blocks, and ext3_fsblk_t blocks occurs in the same function where used to be confusing before. Also include kernel bug fixes for filesystem wide in-kernel block variables. There are some fileystem wide blocks are treated as int/unsigned int type in the kernel currently, especially in ext3 block allocation and reservation code. This patch fixed those bugs by converting those variables to ext3_fsblk_t(unsigned long) type. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-04Remove private struct dx_hash_info from public view in <linux/ext3_fs.h>David Woodhouse
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-25Don't include private headers from user-visible part of linux/ext3_fs.hDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-03-28[PATCH] Make most file operations structs in fs/ constArjan van de Ven
This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] ext3_get_blocks: support multiple blocks allocation in ext3_new_block()Mingming Cao
Change ext3_try_to_allocate() (called via ext3_new_blocks()) to try to allocate the requested number of blocks on a best effort basis: After allocated the first block, it will always attempt to allocate the next few(up to the requested size and not beyond the reservation window) adjacent blocks at the same time. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] ext3_get_blocks: Mapping multiple blocks at a onceMingming Cao
Currently ext3_get_block() only maps or allocates one block at a time. This is quite inefficient for sequential IO workload. I have posted a early implements a simply multiple block map and allocation with current ext3. The basic idea is allocating the 1st block in the existing way, and attempting to allocate the next adjacent blocks on a best effort basis. More description about the implementation could be found here: http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2 The following the latest version of the patch: break the original patch into 5 patches, re-worked some logicals, and fixed some bugs. The break ups are: [patch 1] Adding map multiple blocks at a time in ext3_get_blocks() [patch 2] Extend ext3_get_blocks() to support multiple block allocation [patch 3] Implement multiple block allocation in ext3-try-to-allocate (called via ext3_new_block()). [patch 4] Proper accounting updates in ext3_new_blocks() [patch 5] Adjust reservation window size properly (by the given number of blocks to allocate) before block allocation to increase the possibility of allocating multiple blocks in a single call. Tests done so far includes fsx,tiobench and dbench. The following numbers collected from Direct IO tests (1G file creation/read) shows the system time have been greatly reduced (more than 50% on my 8 cpu system) with the patches. 1G file DIO write: 2.6.15 2.6.15+patches real 0m31.275s 0m31.161s user 0m0.000s 0m0.000s sys 0m3.384s 0m0.564s 1G file DIO read: 2.6.15 2.6.15+patches real 0m30.733s 0m30.624s user 0m0.000s 0m0.004s sys 0m0.748s 0m0.380s Some previous test we did on buffered IO with using multiple blocks allocation and delayed allocation shows noticeable improvement on throughput and system time. This patch: Add support of mapping multiple blocks in one call. This is useful for DIO reads and re-writes (where blocks are already allocated), also is in line with Christoph's proposal of using getblocks() in mpage_readpage() or mpage_readpages(). Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23[PATCH] ext3_readdir: use generic readaheadAndrew Morton
Linus points out that ext3_readdir's readahead only cuts in when ext3_readdir() is operating at the very start of the directory. So for large directories we end up performing no readahead at all and we suck. So take it all out and use the core VM's page_cache_readahead(). This means that ext3 directory reads will use all of readahead's dynamic sizing goop. Note that we're using the directory's filp->f_ra to hold the readahead state, but readahead is actually being performed against the underlying blockdev's address_space. Fortunately the readahead code is all set up to handle this. Tested with printk. It works. I was struggling to find a real workload which actually cared. (The patch also exports page_cache_readahead() to GPL modules) Cc: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07[PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mountsMark Bellon
If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a "good thing" in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. The attached patchs modify the EXT[2|3] and JFS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). Jan Kara also noted the difficulty in moving these changes above the FS codes responding similarly to myself to Andrew's comment about possible VFS migration. Issue summary: - FS codes have to process the entire string of options anyway. - Only FS codes that use quotas must have a show_options function (for quotas to work properly) however quotas are only used in a small number of FS. - Since most of the quota using FS support other options these FS codes should have the a show_options function to show those options - and the quota echoing becomes virtually negligible. Based on feedback I have modified my patches from the original: JFS a missing patch has been restored to the posting EXT[2|3] and JFS always use the show_options function - Each FS has at least one FS specific option displayed - QUOTA output is under a CONFIG_QUOTA ifdef - a follow-on patch will add a multitude of options for each FS EXT[2|3] and JFS "quota" is treated as "usrquota" EXT3 journalled data check for journalled quota removed EXT[2|3] mount when quota specified but not compiled in - no changes from my original patch. I tested the patch and the codes warn but - still mount. With all due respection I believe the comments otherwise were a - misread of the patch. Please reread/test and comment. XFS patch removed - the XFS team already made the necessary changes EXT3 mixing old and new quotas are handled differently (not purely exclusive) - if old and new quotas for the same type are used together the old type is silently depricated for compatability (e.g. usrquota and usrjquota) - mixing of old and new quotas is an error (e.g. usrjquota and grpquota) Signed-off-by: Mark Bellon <mbellon@mvista.com> Acked-by: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-12[PATCH] ext3: fix options parsingJan Kara
Fix a problem with ext3 mount option parsing. When remount of a filesystem fails, old options are now restored. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-24[PATCH] quota: ext3: Improve quota credit estimatesJan Kara
Use improved credits estimates for quota operations. Also reserve a space for a quota operation in a transaction only if filesystem was mounted with some quota options. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!