summaryrefslogtreecommitdiffstats
path: root/fs/xfs/xfs_iomap.c
AgeCommit message (Collapse)Author
2012-03-05xfs: log file size updates as part of unwritten extent conversionChristoph Hellwig
If we convert and unwritten extent past the current i_size log the size update as part of the extent manipulation transactions instead of doing an unlogged metadata update later. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17xfs: remove the i_size field in struct xfs_inodeChristoph Hellwig
There is no fundamental need to keep an in-memory inode size copy in the XFS inode. We already have the on-disk value in the dinode, and the separate in-memory copy that we need for regular files only in the XFS inode. Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the VFS inode i_size field for regular files. Switch code that was directly accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where we are limited to regular files direct access of the VFS inode i_size field. This also allows dropping some fairly complicated code in the write path which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size that is getting updated inside ->write_end. Note that we do not bother resetting the VFS i_size when truncating a file that gets freed to zero as there is no point in doing so because the VFS inode is no longer in use at this point. Just relax the assert in xfs_ifree to only check the on-disk size instead. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-13xfs: cleanup xfs_iomap_eof_align_last_fsbChristoph Hellwig
Replace the nasty if, else if, elseif condition with more natural C flow that expressed the logic we want here better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2011-10-11xfs: simplify xfs_trans_ijoin* againChristoph Hellwig
There is no reason to keep a reference to the inode even if we unlock it during transaction commit because we never drop a reference between the ijoin and commit. Also use this fact to merge xfs_trans_ijoin_ref back into xfs_trans_ijoin - the third argument decides if an unlock is needed now. I'm actually starting to wonder if allowing inodes to be unlocked at transaction commit really is worth the effort. The only real benefit is that they can be unlocked earlier when commiting a synchronous transactions, but that could be solved by doing the log force manually after the unlock, too. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11xfs: rename xfs_bmapi to xfs_bmapi_writeDave Chinner
Now that all the read-only users of xfs_bmapi have been converted to use xfs_bmapi_read(), we can remove all the read-only handling cases from xfs_bmapi(). Once this is done, rename xfs_bmapi to xfs_bmapi_write to reflect the fact it is for allocation only. This enables us to kill the XFS_BMAPI_WRITE flag as well. Also clean up xfs_bmapi_write to the style used in the newly added xfs_bmapi_read/delay functions. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11xfs: introduce xfs_bmapi_delay()Christoph Hellwig
Delalloc reservations are much simpler than allocations, so give them a separate bmapi-level interface. Using the previously added xfs_bmapi_reserve_delalloc we get a function that is only minimally more complicated than xfs_bmapi_read, which is far from the complexity in xfs_bmapi. Also remove the XFS_BMAPI_DELAY code after switching over the only user to xfs_bmapi_delay. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11xfs: introduce xfs_bmapi_read()Dave Chinner
xfs_bmapi() currently handles both extent map reading and allocation. As a result, the code is littered with "if (wr)" branches to conditionally do allocation operations if required. This makes the code much harder to follow and causes significant indent issues with the code. Given that read mapping is much simpler than allocation, we can split out read mapping from xfs_bmapi() and reuse the logic that we have already factored out do do all the hard work of handling the extent map manipulations. The results in a much simpler function for the common extent read operations, and will allow the allocation code to be simplified in another commit. Once xfs_bmapi_read() is implemented, convert all the callers of xfs_bmapi() that are only reading extents to use the new function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-03-07xfs: rename xfs_cmn_err_fsblock_zero()Dave Chinner
The "cmn_err" part of the function name is no longer relevant. Rename the function to xfs_alert_fsblock_zero() to match the new logging API. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07xfs: convert xfs_cmn_err to xfs_alert_tagDave Chinner
Continue the conversion of the old cmn_err interface be converting all the conditional panic tag errors to xfs_alert_tag() and then removing xfs_cmn_err(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-01-28xfs: speculative delayed allocation uses rounddown_power_of_2 badlyDave Chinner
rounddown_power_of_2() returns an undefined result when passed a value of zero. The specualtive delayed allocation code is doing this when the inode is zero length. Hence occasionally the preallocation is much, much larger than is necessary (e.g. 8GB for a 270 _byte_ file). Ensure we don't even pass a zero value to this function so the result of preallocation is always the desired size. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-01-04xfs: dynamic speculative EOF preallocationDave Chinner
Currently the size of the speculative preallocation during delayed allocation is fixed by either the allocsize mount option of a default size. We are seeing a lot of cases where we need to recommend using the allocsize mount option to prevent fragmentation when buffered writes land in the same AG. Rather than using a fixed preallocation size by default (up to 64k), make it dynamic by basing it on the current inode size. That way the EOF preallocation will increase as the file size increases. Hence for streaming writes we are much more likely to get large preallocations exactly when we need it to reduce fragementation. For default settings, the size of the initial extents is determined by the number of parallel writers and the amount of memory in the machine. For 4GB RAM and 4 concurrent 32GB file writes: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..1048575]: 1048672..2097247 0 (1048672..2097247) 1048576 1: [1048576..2097151]: 5242976..6291551 0 (5242976..6291551) 1048576 2: [2097152..4194303]: 12583008..14680159 0 (12583008..14680159) 2097152 3: [4194304..8388607]: 25165920..29360223 0 (25165920..29360223) 4194304 4: [8388608..16777215]: 58720352..67108959 0 (58720352..67108959) 8388608 5: [16777216..33554423]: 117440584..134217791 0 (117440584..134217791) 16777208 6: [33554424..50331511]: 184549056..201326143 0 (184549056..201326143) 16777088 7: [50331512..67108599]: 251657408..268434495 0 (251657408..268434495) 16777088 and for 16 concurrent 16GB file writes: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..262143]: 2490472..2752615 0 (2490472..2752615) 262144 1: [262144..524287]: 6291560..6553703 0 (6291560..6553703) 262144 2: [524288..1048575]: 13631592..14155879 0 (13631592..14155879) 524288 3: [1048576..2097151]: 30408808..31457383 0 (30408808..31457383) 1048576 4: [2097152..4194303]: 52428904..54526055 0 (52428904..54526055) 2097152 5: [4194304..8388607]: 104857704..109052007 0 (104857704..109052007) 4194304 6: [8388608..16777215]: 209715304..218103911 0 (209715304..218103911) 8388608 7: [16777216..33554423]: 452984848..469762055 0 (452984848..469762055) 16777208 Because it is hard to take back specualtive preallocation, cases where there are large slow growing log files on a nearly full filesystem may cause premature ENOSPC. Hence as the filesystem nears full, the maximum dynamic prealloc size іs reduced according to this table (based on 4k block size): freespace max prealloc size >5% full extent (8GB) 4-5% 2GB (8GB >> 2) 3-4% 1GB (8GB >> 3) 2-3% 512MB (8GB >> 4) 1-2% 256MB (8GB >> 5) <1% 128MB (8GB >> 6) This should reduce the amount of space held in speculative preallocation for such cases. The allocsize mount option turns off the dynamic behaviour and fixes the prealloc size to whatever the mount option specifies. i.e. the behaviour is unchanged. Signed-off-by: Dave Chinner <dchinner@redhat.com>
2010-12-16xfs: kill xfs_iomapChristoph Hellwig
Opencode the xfs_iomap code in it's two callers. The overlap of passed flags already was minimal and will be further reduced in the next patch. As a side effect the BMAPI_* flags for xfs_bmapi and the IO_* flags for I/O end processing are merged into a single set of flags, which should be a bit more descriptive of the operation we perform. Also improve the tracing by giving each caller it's own type set of tracepoints. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-12-16xfs: cleanup the xfs_iomap_write_* helpersChristoph Hellwig
Remove passing the BMAPI_* flags to these helpers, in xfs_iomap_write_direct the check BMAPI_DIRECT was always true, and in the xfs_iomap_write_delay path is was never checked at all. Remove the nmap return value as we never make use of it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26xfs: small cleanups for xfs_iomap / __xfs_get_blocksChristoph Hellwig
Remove the flags argument to __xfs_get_blocks as we can easily derive it from the direct argument, and remove the unused BMAPI_MMAP flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: reduce stack usage in xfs_iomapChristoph Hellwig
xfs_iomap passes a xfs_bmbt_irec pointer to xfs_iomap_write_direct and xfs_iomap_write_allocate to give them the results of our read-only xfs_bmapi query. Instead of allocating a new xfs_bmbt_irec on stack for the next call to xfs_bmapi re use the one we got passed as it's not used after this point. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: remove unused delta tracking code in xfs_bmapiChristoph Hellwig
This code was introduced four years ago in commit 3e57ecf640428c01ba1ed8c8fc538447ada1715b without any review and has been unused since. Remove it just as the rest of the code introduced in that commit to reduce that stack usage and complexity in this central piece of code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: simplify inode to transaction joiningChristoph Hellwig
Currently we need to either call IHOLD or xfs_trans_ihold on an inode when joining it to a transaction via xfs_trans_ijoin. This patches instead makes xfs_trans_ijoin usable on it's own by doing an implicity xfs_trans_ihold, which also allows us to drop the third argument. For the case where we want to hold a reference on the inode a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks the inode for needing an xfs_iput. In addition to the cleaner interface to the caller this also simplifies the implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: remove unneeded #include statementsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: drop dmapi hooksChristoph Hellwig
Dmapi support was never merged upstream, but we still have a lot of hooks bloating XFS for it, all over the fast pathes of the filesystem. This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM support in mainline at least the namespace events can be done much saner in the VFS instead of the individual filesystem, so it's not like this is much help for future work. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-05-19xfs: mark xfs_iomap_write_ helpers staticChristoph Hellwig
And also drop a useless argument to xfs_iomap_write_direct. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-19xfs: kill struct xfs_iomapChristoph Hellwig
Now that struct xfs_iomap contains exactly the same units as struct xfs_bmbt_irec we can just use the latter directly in the aops code. Replace the missing IOMAP_NEW flag with a new boolean output parameter to xfs_iomap. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-19xfs: report iomap_bn in block baseChristoph Hellwig
Report the iomap_bn field of struct xfs_iomap in terms of filesystem blocks instead of in terms of bytes. Shift the byte conversions into the caller, and replace the IOMAP_DELAY and IOMAP_HOLE flag checks with checks for HOLESTARTBLOCK and DELAYSTARTBLOCK. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-19xfs: report iomap_offset and iomap_bsize in block baseChristoph Hellwig
Report the iomap_offset and iomap_bsize fields of struct xfs_iomap in terms of fsblocks instead of in terms of disk blocks. Shift the byte conversions into the callers temporarily, but they will disappear or get cleaned up later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-19xfs: remove iomap_deltaChristoph Hellwig
The iomap_delta field in struct xfs_iomap just contains the difference between the offset passed to xfs_iomap and the iomap_offset. Just calculate it in the only caller that cares. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-19xfs: remove iomap_targetChristoph Hellwig
Instead of using the iomap_target field in struct xfs_iomap and the IOMAP_REALTIME flag just use the already existing xfs_find_bdev_for_inode helper. There's some fallout as we need to pass the inode in a few more places, which we also use to sanitize some calling conventions. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-19xfs: limit xfs_imap_to_bmap to a single mappingChristoph Hellwig
We only call xfs_iomap for single mappings anyway, so remove all code dealing with multiple mappings from xfs_imap_to_bmap and add asserts that we never get results that we do not expect. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-14xfs: event tracing supportChristoph Hellwig
Convert the old xfs tracing support that could only be used with the out of tree kdb and xfsidbg patches to use the generic event tracer. To use it make sure CONFIG_EVENT_TRACING is enabled and then enable all xfs trace channels by: echo 1 > /sys/kernel/debug/tracing/events/xfs/enable or alternatively enable single events by just doing the same in one event subdirectory, e.g. echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable or set more complex filters, etc. In Documentation/trace/events.txt all this is desctribed in more detail. To reads the events do a cat /sys/kernel/debug/tracing/trace Compared to the last posting this patch converts the tracing mostly to the one tracepoint per callsite model that other users of the new tracing facility also employ. This allows a very fine-grained control of the tracing, a cleaner output of the traces and also enables the perf tool to use each tracepoint as a virtual performance counter, allowing us to e.g. count how often certain workloads git various spots in XFS. Take a look at http://lwn.net/Articles/346470/ for some examples. Also the btree tracing isn't included at all yet, as it will require additional core tracing features not in mainline yet, I plan to deliver it later. And the really nice thing about this patch is that it actually removes many lines of code while adding this nice functionality: fs/xfs/Makefile | 8 fs/xfs/linux-2.6/xfs_acl.c | 1 fs/xfs/linux-2.6/xfs_aops.c | 52 - fs/xfs/linux-2.6/xfs_aops.h | 2 fs/xfs/linux-2.6/xfs_buf.c | 117 +-- fs/xfs/linux-2.6/xfs_buf.h | 33 fs/xfs/linux-2.6/xfs_fs_subr.c | 3 fs/xfs/linux-2.6/xfs_ioctl.c | 1 fs/xfs/linux-2.6/xfs_ioctl32.c | 1 fs/xfs/linux-2.6/xfs_iops.c | 1 fs/xfs/linux-2.6/xfs_linux.h | 1 fs/xfs/linux-2.6/xfs_lrw.c | 87 -- fs/xfs/linux-2.6/xfs_lrw.h | 45 - fs/xfs/linux-2.6/xfs_super.c | 104 --- fs/xfs/linux-2.6/xfs_super.h | 7 fs/xfs/linux-2.6/xfs_sync.c | 1 fs/xfs/linux-2.6/xfs_trace.c | 75 ++ fs/xfs/linux-2.6/xfs_trace.h | 1369 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/linux-2.6/xfs_vnode.h | 4 fs/xfs/quota/xfs_dquot.c | 110 --- fs/xfs/quota/xfs_dquot.h | 21 fs/xfs/quota/xfs_qm.c | 40 - fs/xfs/quota/xfs_qm_syscalls.c | 4 fs/xfs/support/ktrace.c | 323 --------- fs/xfs/support/ktrace.h | 85 -- fs/xfs/xfs.h | 16 fs/xfs/xfs_ag.h | 14 fs/xfs/xfs_alloc.c | 230 +----- fs/xfs/xfs_alloc.h | 27 fs/xfs/xfs_alloc_btree.c | 1 fs/xfs/xfs_attr.c | 107 --- fs/xfs/xfs_attr.h | 10 fs/xfs/xfs_attr_leaf.c | 14 fs/xfs/xfs_attr_sf.h | 40 - fs/xfs/xfs_bmap.c | 507 +++------------ fs/xfs/xfs_bmap.h | 49 - fs/xfs/xfs_bmap_btree.c | 6 fs/xfs/xfs_btree.c | 5 fs/xfs/xfs_btree_trace.h | 17 fs/xfs/xfs_buf_item.c | 87 -- fs/xfs/xfs_buf_item.h | 20 fs/xfs/xfs_da_btree.c | 3 fs/xfs/xfs_da_btree.h | 7 fs/xfs/xfs_dfrag.c | 2 fs/xfs/xfs_dir2.c | 8 fs/xfs/xfs_dir2_block.c | 20 fs/xfs/xfs_dir2_leaf.c | 21 fs/xfs/xfs_dir2_node.c | 27 fs/xfs/xfs_dir2_sf.c | 26 fs/xfs/xfs_dir2_trace.c | 216 ------ fs/xfs/xfs_dir2_trace.h | 72 -- fs/xfs/xfs_filestream.c | 8 fs/xfs/xfs_fsops.c | 2 fs/xfs/xfs_iget.c | 111 --- fs/xfs/xfs_inode.c | 67 -- fs/xfs/xfs_inode.h | 76 -- fs/xfs/xfs_inode_item.c | 5 fs/xfs/xfs_iomap.c | 85 -- fs/xfs/xfs_iomap.h | 8 fs/xfs/xfs_log.c | 181 +---- fs/xfs/xfs_log_priv.h | 20 fs/xfs/xfs_log_recover.c | 1 fs/xfs/xfs_mount.c | 2 fs/xfs/xfs_quota.h | 8 fs/xfs/xfs_rename.c | 1 fs/xfs/xfs_rtalloc.c | 1 fs/xfs/xfs_rw.c | 3 fs/xfs/xfs_trans.h | 47 + fs/xfs/xfs_trans_buf.c | 62 - fs/xfs/xfs_vnodeops.c | 8 70 files changed, 2151 insertions(+), 2592 deletions(-) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: I/O completion handlers must use NOFS allocationsChristoph Hellwig
When completing I/O requests we must not allow the memory allocator to recurse into the filesystem, as we might deadlock on waiting for the I/O completion otherwise. The only thing currently allocating normal GFP_KERNEL memory is the allocation of the transaction structure for the unwritten extent conversion. Add a memflags argument to _xfs_trans_alloc to allow controlling the allocator behaviour. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Thomas Neumann <tneumann@users.sourceforge.net> Tested-by: Thomas Neumann <tneumann@users.sourceforge.net> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-06-10xfs: use generic Posix ACL codeChristoph Hellwig
This patch rips out the XFS ACL handling code and uses the generic fs/posix_acl.c code instead. The ondisk format is of course left unchanged. This also introduces the same ACL caching all other Linux filesystems do by adding pointers to the acl and default acl in struct xfs_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2009-06-08xfs: kill xfs_qmopsChristoph Hellwig
Kill the quota ops function vector and replace it with direct calls or stubs in the CONFIG_XFS_QUOTA=n case. Make sure we check XFS_IS_QUOTA_RUNNING in the right spots. We can remove the number of those checks because the XFS_TRANS_DQ_DIRTY flag can't be set otherwise. This brings us back closer to the way this code worked in IRIX and earlier Linux versions, but we keep a lot of the more useful factoring of common code. Eventually we should also kill xfs_qm_bhv.c, but that's left for a later patch. Reduces the size of the source code by about 250 lines and the size of XFS module by about 1.5 kilobytes with quotas enabled: text data bss dec hex filename 615957 2960 3848 622765 980ad fs/xfs/xfs.o 617231 3152 3848 624231 98667 fs/xfs/xfs.o.old Fallout: - xfs_qm_dqattach is split into xfs_qm_dqattach_locked which expects the inode locked and xfs_qm_dqattach which does the locking around it, thus removing XFS_QMOPT_ILOCKED. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2009-04-06xfs: remove xfs_flush_spaceDave Chinner
The only thing we need to do now when we get an ENOSPC condition during delayed allocation reservation is flush all the other inodes with delalloc blocks on them and retry without EOF preallocation. Remove the unneeded mess that is xfs_flush_space() and just call xfs_flush_inodes() directly from xfs_iomap_write_delay(). Also, change the location of the retry label to avoid trying to do EOF preallocation because we don't want to do that at ENOSPC. This enables us to remove the BMAPI_SYNC flag as it is no longer used. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2009-04-06xfs: make inode flush at ENOSPC synchronousDave Chinner
When we are writing to a single file and hit ENOSPC, we trigger a background flush of the inode and try again. Because we hold page locks and the iolock, the flush won't proceed until after we release these locks. This occurs once we've given up and ENOSPC has been reported. Hence if this one is the only dirty inode in the system, we'll get an ENOSPC prematurely. To fix this, remove the async flush from the allocation routines and move it to the top of the write path where we can do a synchronous flush and retry the write again. Only retry once as a second ENOSPC indicates that we really are ENOSPC. This avoids a page cache deadlock when trying to do this flush synchronously in the allocation layer that was identified by Mikulas Patocka. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2009-04-06xfs: use xfs_sync_inodes() for device flushingDave Chinner
Currently xfs_device_flush calls sync_blockdev() which is a no-op for XFS as all it's metadata is held in a different address to the one sync_blockdev() works on. Call xfs_sync_inodes() instead to flush all the delayed allocation blocks out. To do this as efficiently as possible, do it via two passes - one to do an async flush of all the dirty blocks and a second to wait for all the IO to complete. This requires some modification to the xfs-sync_inodes_ag() flush code to do efficiently. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2009-01-16[XFS] Remove the rest of the macro-to-function indirections.Eric Sandeen
Remove the last of the macros-defined-to-static-functions. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-22[XFS] Fix speculative allocation beyond eofLachlan McIlroy
Speculative allocation beyond eof doesn't work properly. It was broken some time ago after a code cleanup that moved what is now xfs_iomap_eof_align_last_fsb() and xfs_iomap_eof_want_preallocate() out of xfs_iomap_write_delay() into separate functions. The code used to use the current file size in various checks but got changed to be max(file_size, i_new_size). Since i_new_size is the result of 'offset + count' then in xfs_iomap_eof_want_preallocate() the check for '(offset + count) <= isize' will always be true. ie if 'offset + count' is > ip->i_size then isize will be i_new_size and equal to 'offset + count'. This change fixes all the places that used to use the current file size. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-07-28[XFS] use minleft when allocating in xfs_bmbt_split()Lachlan McIlroy
The bmap btree split code relies on a previous data extent allocation (from xfs_bmap_btalloc()) to find an AG that has sufficient space to perform a full btree split, when inserting the extent. When converting unwritten extents we don't allocate a data extent so a btree split will be the first allocation. In this case we need to set minleft so the allocator will pick an AG that has space to complete the split(s). SGI-PV: 983338 SGI-Modid: xfs-linux-melb:xfs-kern:31357a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>
2008-04-29[XFS] Fix check for block zero access in xfs_write_iomap_allocate()David Chinner
The check for block zero access should be done on non-realtime inodes. Fix the logic error in xfs_write_iomap_allocate(), and simplify the logic on all checks for block zero access in xfs_iomap.c SGI-PV: 980888 SGI-Modid: xfs-linux-melb:xfs-kern:30998a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-29[XFS] shrink mrlock_tChristoph Hellwig
The writer field is not needed for non_DEBU builds so remove it. While we're at i also clean up the interface for is locked asserts to go through and xfs_iget.c helper with an interface like the xfs_ilock routines to isolated the XFS codebase from mrlock internals. That way we can kill mrlock_t entirely once rw_semaphores grow an islocked facility. Also remove unused flags to the ilock family of functions. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30902a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Catch errors returned from xfs_bmap_last_offset().David Chinner
xfs_bmap_last_offset() can fail and return an error. xfs_iomap_write_allocate() fails to detect and propagate the error. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30802a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] optimize XFS_IS_REALTIME_INODE w/o realtime configEric Sandeen
Use XFS_IS_REALTIME_INODE in more places, and #define it to 0 if CONFIG_XFS_RT is off. This should be safe because mount checks in xfs_rtmount_init: so if we get mounted w/o CONFIG_XFS_RT, no realtime inodes should be encountered after that. Defining XFS_IS_REALTIME_INODE to 0 saves a bit of stack space, presumeably gcc can optimize around the various "if (0)" type checks: xfs_alloc_file_space -8 xfs_bmap_adjacent -16 xfs_bmapi -8 xfs_bmap_rtalloc -16 xfs_bunmapi -28 xfs_free_file_space -64 xfs_imap +8 <-- ? hmm. xfs_iomap_write_direct -12 xfs_qm_dqusage_adjust -4 xfs_qm_vop_chown_reserve -4 SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30014a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] Fix transaction overrun during writeback.David Chinner
Prevent transaction overrun in xfs_iomap_write_allocate() if we race with a truncate that overlaps the delalloc range we were planning to allocate. If we race, we may allocate into a hole and that requires block allocation. At this point in time we don't have a reservation for block allocation (apart from metadata blocks) and so allocating into a hole rather than a delalloc region results in overflowing the transaction block reservation. Fix it by only allowing a single extent to be allocated at a time. SGI-PV: 972757 SGI-Modid: xfs-linux-melb:xfs-kern:30005a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] kill xfs_iocore_tChristoph Hellwig
xfs_iocore_t is a structure embedded in xfs_inode. Except for one field it just duplicates fields already in xfs_inode, and there is nothing this abstraction buys us on XFS/Linux. This patch removes it and shrinks source and binary size of xfs aswell as shrinking the size of xfs_inode by 60/44 bytes in debug/non-debug builds. SGI-PV: 970852 SGI-Modid: xfs-linux-melb:xfs-kern:29754a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07[XFS] kill unnessecary ioops indirectionLachlan McIlroy
Currently there is an indirection called ioops in the XFS data I/O path. Various functions are called by functions pointers, but there is no coherence in what this is for, and of course for XFS itself it's entirely unused. This patch removes it instead and significantly reduces source and binary size of XFS while making maintaince easier. SGI-PV: 970841 SGI-Modid: xfs-linux-melb:xfs-kern:29737a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07[XFS] kill BMAPI_UNWRITTENChristoph Hellwig
There is no reason to go through xfs_iomap for the BMAPI_UNWRITTEN because it has nothing in common with the other cases. Instead check for the shutdown filesystem in xfs_end_bio_unwritten and perform a direct call to xfs_iomap_write_unwritten (which should be renamed to something more sensible one day) SGI-PV: 970241 SGI-Modid: xfs-linux-melb:xfs-kern:29681a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07[XFS] kill BMAPI_DEVICEChristoph Hellwig
There is no reason to go into the iomap machinery just to get the right block device for an inode. Instead look at the realtime flag in the inode and grab the right device from the mount structure. I created a new helper, xfs_find_bdev_for_inode instead of opencoding it because I plan to use it in other places in the future. SGI-PV: 970240 SGI-Modid: xfs-linux-melb:xfs-kern:29680a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16[XFS] Kill unused IOMAP_EOF flagChristoph Hellwig
SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29705a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-07-14[XFS] Cleanup inode extent size hint extractionDavid Chinner
SGI-PV: 966004 SGI-Modid: xfs-linux-melb:xfs-kern:28866a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-07-14[XFS] Prevent ENOSPC from aborting transactions that need to succeedDavid Chinner
During delayed allocation extent conversion or unwritten extent conversion, we need to reserve some blocks for transactions reservations. We need to reserve these blocks in case a btree split occurs and we need to allocate some blocks. Unfortunately, we've only ever reserved the number of data blocks we are allocating, so in both the unwritten and delalloc case we can get ENOSPC to the transaction reservation. This is bad because in both cases we cannot report the failure to the writing application. The fix is two-fold: 1 - leverage the reserved block infrastructure XFS already has to reserve a small pool of blocks by default to allow specially marked transactions to dip into when we are at ENOSPC. Default setting is min(5%, 1024 blocks). 2 - convert critical transaction reservations to be allowed to dip into this pool. Spots changed are delalloc conversion, unwritten extent conversion and growing a filesystem at ENOSPC. This also allows growing the filesytsem to succeed at ENOSPC. SGI-PV: 964468 SGI-Modid: xfs-linux-melb:xfs-kern:28865a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08[XFS] Fix to prevent the notorious 'NULL files' problem after a crash.Lachlan McIlroy
The problem that has been addressed is that of synchronising updates of the file size with writes that extend a file. Without the fix the update of a file's size, as a result of a write beyond eof, is independent of when the cached data is flushed to disk. Often the file size update would be written to the filesystem log before the data is flushed to disk. When a system crashes between these two events and the filesystem log is replayed on mount the file's size will be set but since the contents never made it to disk the file is full of holes. If some of the cached data was flushed to disk then it may just be a section of the file at the end that has holes. There are existing fixes to help alleviate this problem, particularly in the case where a file has been truncated, that force cached data to be flushed to disk when the file is closed. If the system crashes while the file(s) are still open then this flushing will never occur. The fix that we have implemented is to introduce a second file size, called the in-memory file size, that represents the current file size as viewed by the user. The existing file size, called the on-disk file size, is the one that get's written to the filesystem log and we only update it when it is safe to do so. When we write to a file beyond eof we only update the in- memory file size in the write operation. Later when the I/O operation, that flushes the cached data to disk completes, an I/O completion routine will update the on-disk file size. The on-disk file size will be updated to the maximum offset of the I/O or to the value of the in-memory file size if the I/O includes eof. SGI-PV: 958522 SGI-Modid: xfs-linux-melb:xfs-kern:28322a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08[XFS] The last argument "lsn" of xfs_trans_commit() is always called withEric Sandeen
NULL. Patch provided by Eric Sandeen. SGI-PV: 961693 SGI-Modid: xfs-linux-melb:xfs-kern:28199a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>