summaryrefslogtreecommitdiffstats
path: root/fs/xfs/xfs_inode.c
AgeCommit message (Collapse)Author
2008-08-14CRED: Introduce credential access wrappersDavid Howells
The patches that are intended to introduce copy-on-write credentials for 2.6.28 require abstraction of access to some fields of the task structure, particularly for the case of one task accessing another's credentials where RCU will have to be observed. Introduced here are trivial no-op versions of the desired accessors for current and other tasks so that other subsystems can start to be converted over more easily. Wrappers are introduced into a new header (linux/cred.h) for UID/GID, EUID/EGID, SUID/SGID, FSUID/FSGID, cap_effective and current's subscribed user_struct. These wrappers are macros because the ordering between header files mitigates against making them inline functions. linux/cred.h is #included from linux/sched.h. Further, XFS is modified such that it no longer defines and uses parameterised versions of current_fs[ug]id(), thus getting rid of the namespace collision otherwise incurred. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-08-13[XFS] Use KM_NOFS for debug trace buffersLachlan McIlroy
Use KM_NOFS to prevent recursion back into the filesystem which can cause deadlocks. In the case of xfs_iread() we hold the lock on the inode cluster buffer while allocating memory for the trace buffers. If we recurse back into XFS to flush data that may require a transaction to allocate extents which needs log space. This can deadlock with the xfsaild thread which can't push the tail of the log because it is trying to get the inode cluster buffer lock. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31838a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-08-13[XFS] update timestamp in xfs_ialloc manuallyChristoph Hellwig
In xfs_ialloc we just want to set all timestamps to the current time. We don't need to mark the inode dirty like xfs_ichgtime does, and we don't need nor want the opimizations in xfs_ichgtime that I will introduce in the next patch. So just opencode the timestamp update in xfs_ialloc, and remove the new unused XFS_ICHGTIME_ACC case in xfs_ichgtime. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31825a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13[XFS] replace inode flush semaphore with a completionDavid Chinner
Use the new completion flush code to implement the inode flush lock. Removes one of the final users of semaphores in the XFS code base. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31817a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13[XFS] use get_unaligned_* helpersHarvey Harrison
SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31813a Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13[XFS] sanitize xfs_initialize_vnodeChristoph Hellwig
Sanitize setting up the Linux indode. Setting up the xfs_inode <-> inode link is opencoded in xfs_iget_core now because that's the only place it needs to be done, xfs_initialize_vnode is renamed to xfs_setup_inode and loses all superflous paramaters. The check for I_NEW is removed because it always is true and the di_mode check moves into xfs_iget_core because it's only needed there. xfs_set_inodeops and xfs_revalidate_inode are merged into xfs_setup_inode and the whole things is moved into xfs_iops.c where it belongs. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31782a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13[XFS] kill bhv_vnode_tChristoph Hellwig
All remaining bhv_vnode_t instance are in code that's more or less Linux specific. (Well, for xfs_acl.c that could be argued, but that code is on the removal list, too). So just do an s/bhv_vnode_t/struct inode/ over the whole tree. We can clean up variable naming and some useless helpers later. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31781a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13[XFS] remove some easy bhv_vnode_t instancesChristoph Hellwig
In various places we can just move a VFS_I call into the argument list of called functions/macros instead of having a local bhv_vnode_t. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31776a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13[XFS] Use KM_NOFS for incore inode extent tree allocation V2David Chinner
If we allow incore extent tree allocations to recurse into the filesystem under memory pressure, new delayed allocations through xfs_iomap_write_delay() can deadlock on themselves if memory reclaim tries to write back dirty pages from that inode. It will deadlock in xfs_iomap_write_allocate() trying to take the ilock we already hold. This can also show up as complex ABBA deadlocks when multiple threads are triggering memory reclaim when trying to allocate extents. The main cause of this is the fact that delayed allocation is not done in a transaction, so KM_NOFS is not automatically added to the allocations to prevent this recursion. Mark all allocations done for the incore inode extent tree as KM_NOFS to ensure they never recurse back into the filesystem. Version 2: o KM_NOFS implies KM_SLEEP, so just use KM_NOFS SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31726a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13[XFS] Kill shouty XFS_ITOV() macroDavid Chinner
Replace XFS_ITOV() with the new VFS_I() inline. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31724a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13[XFS] kill shouty XFS_ITOV_NULL macroDavid Chinner
Replace XFS_ITOV_NULL() with the new VFS_I() inline. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31722a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-07-28[XFS] fix extent corruption in xfs_iext_irec_compact_full()Lachlan McIlroy
This function is used to compact the indirect extent list by moving extents from one page to the previous to fill them up. After we move some extents to an earlier page we need to shuffle the remaining extents to the start of the page. The actual bug here is the second argument to memmove() needs to index past the extents, that were copied to the previous page, and move the remaining extents. For pages that are already full (ie ext_avail == 0) the compaction code has no net effect so don't do it. SGI-PV: 983337 SGI-Modid: xfs-linux-melb:xfs-kern:31332a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-07-28[XFS] make inode reclaim wait for log I/O to completeLachlan McIlroy
During a forced shutdown a xfs inode can be destroyed before log I/O involving that inode is complete. We need to wait for the inode to be unpinned before tearing it down. Version 2 cleans up the code a bit by relying on xfs_iflush() to do the unpinning and forced shutdown check. SGI-PV: 981240 SGI-Modid: xfs-linux-melb:xfs-kern:31326a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>
2008-07-28[XFS] kill xfs_igrow_start and xfs_igrow_finishChristoph Hellwig
xfs_igrow_start just expands to xfs_zero_eof with two asserts that are useless in the context of the only caller and some rather confusing comments. xfs_igrow_finish is just a few lines of code decorated again with useless asserts and confusing comments. Just kill those two and merge them into xfs_setattr. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31186a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-07-28[XFS] Remove unused arg from kmem_free()Denys Vlasenko
kmem_free() function takes (ptr, size) arguments but doesn't actually use second one. This patch removes size argument from all callsites. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31050a Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-05-23[XFS] Fix inode list allocation size in writeback.David Chinner
We only need to allocate space for the number of inodes in the cluster when writing back inodes, not every byte in the inode cluster. This reduces the amount of memory needing to be allocated to 256 bytes instead of 64k. SGI-PV: 981949 SGI-Modid: xfs-linux-melb:xfs-kern:31182a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-05-23[XFS] Don't allow memory reclaim to wait on the filesystem in inodeDavid Chinner
writeback If we allow memory reclaim to wait on the pages under writeback in inode cluster writeback we could deadlock because we are currently holding the ILOCK on the initial writeback inode which is needed in data I/O completion to change the file size or do unwritten extent conversion before the pages are taken out of writeback state. SGI-PV: 981091 SGI-Modid: xfs-linux-melb:xfs-kern:31015a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-29[XFS] shrink mrlock_tChristoph Hellwig
The writer field is not needed for non_DEBU builds so remove it. While we're at i also clean up the interface for is locked asserts to go through and xfs_iget.c helper with an interface like the xfs_ilock routines to isolated the XFS codebase from mrlock internals. That way we can kill mrlock_t entirely once rw_semaphores grow an islocked facility. Also remove unused flags to the ilock family of functions. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30902a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Ensure the inode is joined in xfs_itruncate_finishDavid Chinner
On success, we still need to join the inode to the current transaction in xfs_itruncate_finish(). Fixes regression from error handling changes. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30845a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] xfs_iflush_fork() never returns an error.David Chinner
xfs_iflush_fork() never returns an error. Mark it void and clean up the code calling it that checks for errors. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30827a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Ensure xfs_bawrite() errors are checked.David Chinner
xfs_bawrite() can return immediate error status on async writes. Unlike xfsbdstrat() we don't ever check the error on the buffer after the call, so we currently do not catch errors at all here. Ensure we catch and propagate or warn to the syslog about up-front async write errors. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30824a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Propagate errors from xfs_trans_commit().David Chinner
xfs_trans_commit() can return errors when there are problems in the transaction subsystem. They are indicative that the entire transaction may be incomplete, and hence the error should be propagated as there is a good possibility that there is something fatally wrong in the filesystem. Catch and propagate or warn about commit errors in the places where they are currently ignored. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30795a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Use xfs_inode_clean() in more placesDavid Chinner
Remove open coded checks for the whether the inode is clean and replace them with an inlined function. SGI-PV: 977461 SGI-Modid: xfs-linux-melb:xfs-kern:30503a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Remove the xfs_icluster structureDavid Chinner
Remove the xfs_icluster structure and replace with a radix tree lookup. We don't need to keep a list of inodes in each cluster around anymore as we can look them up quickly when we need to. The only time we need to do this now is during inode writeback. Factor the inode cluster writeback code out of xfs_iflush and convert it to use radix_tree_gang_lookup() instead of walking a list of inodes built when we first read in the inodes. This remove 3 pointers from each xfs_inode structure and the xfs_icluster structure per inode cluster. Hence we reduce the cache footprint of the xfs_inodes by between 5-10% depending on cluster sparseness. To be truly efficient we need a radix_tree_gang_lookup_range() call to stop searching once we are past the end of the cluster instead of trying to find a full cluster's worth of inodes. Before (ia64): $ cat /sys/slab/xfs_inode/object_size 536 After: $ cat /sys/slab/xfs_inode/object_size 512 SGI-PV: 977460 SGI-Modid: xfs-linux-melb:xfs-kern:30502a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Don't block pdflush when writing back inodesDavid Chinner
When pdflush is writing back inodes, it can get stuck on inode cluster buffers that are currently under I/O. This occurs when we write data to multiple inodes in the same inode cluster at the same time. Effectively, delayed allocation marks the inode dirty during the data writeback. Hence if the inode cluster was flushed during the writeback of the first inode, the writeback of the second inode will block waiting for the inode cluster write to complete before writing it again for the newly dirtied inode. Basically, we want to avoid this from happening so we don't block pdflush and slow down all of writeback. Hence we introduce a non-blocking async inode flush flag that pdflush uses. If this flag is set, we use non-blocking operations (e.g. try locks) whereever we can to avoid blocking or extra I/O being issued. SGI-PV: 970925 SGI-Modid: xfs-linux-melb:xfs-kern:30501a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Factor xfs_itobp() and xfs_inotobp().David Chinner
The only difference between the functions is one passes an inode for the lookup, the other passes an inode number. However, they don't do the same validity checking or set all the same state on the buffer that is returned yet they should. Factor the functions into a common implementation. SGI-PV: 970925 SGI-Modid: xfs-linux-melb:xfs-kern:30500a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-10[XFS] remove shouting-indirection macros from xfs_sb.hEric Sandeen
Remove macro-to-small-function indirection from xfs_sb.h, and remove some which are completely unused. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30528a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] use generic_permissionChristoph Hellwig
Now that all direct caller of xfs_iaccess are gone we can kill xfs_iaccess and xfs_access and just use generic_permission with a check_acl callback. This is required for the per-mount read-only patchset in -mm to work properly with XFS. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30370a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] Remove CFORK macros and use code directly in IFORK and DFORK macros.Christoph Hellwig
Currently XFS_IFORK_* and XFS_DFORK* are implemented by means of XFS_CFORK* macros. But given that XFS_IFORK_* operates on an xfs_inode that embedds and xfs_icdinode_core and XFS_DFORK_* operates on an xfs_dinode that embedds a xfs_dinode_core one will have to do endian swapping while the other doesn't. Instead of having the current mess with the CFORK macros that have byteswapping and non-byteswapping version (which are inconsistantly named while we're at it) just define each family of the macros to stand by itself and simplify the whole matter. A few direct references to the CFORK variants were cleaned up to use IFORK or DFORK to make this possible. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30163a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] Use kernel-supplied "roundup_pow_of_two" for simplicityRobert P. J. Day
SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30098a Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] optimize XFS_IS_REALTIME_INODE w/o realtime configEric Sandeen
Use XFS_IS_REALTIME_INODE in more places, and #define it to 0 if CONFIG_XFS_RT is off. This should be safe because mount checks in xfs_rtmount_init: so if we get mounted w/o CONFIG_XFS_RT, no realtime inodes should be encountered after that. Defining XFS_IS_REALTIME_INODE to 0 saves a bit of stack space, presumeably gcc can optimize around the various "if (0)" type checks: xfs_alloc_file_space -8 xfs_bmap_adjacent -16 xfs_bmapi -8 xfs_bmap_rtalloc -16 xfs_bunmapi -28 xfs_free_file_space -64 xfs_imap +8 <-- ? hmm. xfs_iomap_write_direct -12 xfs_qm_dqusage_adjust -4 xfs_qm_vop_chown_reserve -4 SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30014a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] Fix inode allocation latencyDavid Chinner
The log force added in xfs_iget_core() has been a performance issue since it was introduced for tight loops that allocate then unlink a single file. under heavy writeback, this can introduce unnecessary latency due tothe log I/o getting stuck behind bulk data writes. Fix this latency problem by avoinding the need for the log force by moving the place we mark linux inode dirty to the transaction commit rather than on transaction completion. This also closes a potential hole in the sync code where a linux inode is not dirty between the time it is modified and the time the log buffer has been written to disk. SGI-PV: 972753 SGI-Modid: xfs-linux-melb:xfs-kern:30007a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] Make xfs_bulkstat() to report unlinked but referenced inodesVlad Apostolov
We need xfs_bulkstat() to report inode stat for inodes with link count zero but reference count non zero. The fix here: http://oss.sgi.com/archives/xfs/2007-09/msg00266.html changed this behavior and made xfs_bulkstat() to filter all unlinked inodes including those that are not destroyed yet but held by reference. The attached patch returns back to the original behavior by marking the on-disk inode buffer "dirty" when di_mode is cleared (at that time both inode link and reference counter are zero). SGI-PV: 972004 SGI-Modid: xfs-linux-melb:xfs-kern:29914a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] kill xfs_iocore_tChristoph Hellwig
xfs_iocore_t is a structure embedded in xfs_inode. Except for one field it just duplicates fields already in xfs_inode, and there is nothing this abstraction buys us on XFS/Linux. This patch removes it and shrinks source and binary size of xfs aswell as shrinking the size of xfs_inode by 60/44 bytes in debug/non-debug builds. SGI-PV: 970852 SGI-Modid: xfs-linux-melb:xfs-kern:29754a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07[XFS] Unwrap AIL_LOCKDonald Douwsma
SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29739a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07[XFS] kill unnessecary ioops indirectionLachlan McIlroy
Currently there is an indirection called ioops in the XFS data I/O path. Various functions are called by functions pointers, but there is no coherence in what this is for, and of course for XFS itself it's entirely unused. This patch removes it instead and significantly reduces source and binary size of XFS while making maintaince easier. SGI-PV: 970841 SGI-Modid: xfs-linux-melb:xfs-kern:29737a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07[XFS] clean up vnode/inode tracingLachlan McIlroy
Simplify vnode tracing calls by embedding function name & return addr in the calling macro. Also do a lot of vnode->inode renaming for consistency, while we're at it. SGI-PV: 970335 SGI-Modid: xfs-linux-melb:xfs-kern:29650a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-12-18[XFS] Don't wait for pending I/Os when purging blocks beyond eof.Lachlan McIlroy
On last close of a file we purge blocks beyond eof. The same code is used when we truncate the file size down. In this case we need to wait for any pending I/Os for dirty pages beyond the new eof. For the last close case we are not changing the file size and therefore do not need to wait for any I/Os to complete. This fixes a performance bottleneck where writes into the page cache and cache flushes can become mutually exclusive. SGI-PV: 964002 SGI-Modid: xfs-linux-melb:xfs-kern:30220a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Peter Leckie <pleckie@sgi.com>
2007-10-16[XFS] get_bulkall() could return incorrect inode stateVlad Apostolov
In the following scenario xfs_bulkstat() returns incorrect stale inode state: 1. File_A is created and its inode synced to disk. 2. File_A is unlinked and doesn't exist anymore. 3. Filesystem sync is invoked. 4. File_B is created. File_B happens to reclaim File_A's inode. 5. xfs_bulkstat() is called and detects File_B but reports the incorrect File_A inode state. Explanation for the incorrect inode state is that inodes are not immediately synced on file create for performance reasons. This leaves the on-disk inode buffer uninitialized (or with old state from a previous generation inode) and this is what xfs_bulkstat() would report. The patch marks the on-disk inode buffer "dirty" on unlink. When the inode is reclaimed (by a new file create), xfs_bulkstat() would filter this inode by the "dirty" mark. Once the inode is flushed to disk, the on-disk buffer "dirty" mark is automatically removed and a following xfs_bulkstat() would return the correct inode state. Marking the on-disk inode buffer "dirty" on unlink is achieved by setting the on-disk di_nlink field to 0. Note that the in-core di_nlink has already been set to 0 and a corresponding transaction logged by xfs_droplink(). This is an exception from the rule that any on-disk inode buffer changes has to be followed by a disk write (inode flush). Synchronizing the in-core to on-disk di_nlink values in advance (before the actual inode flush to disk) should be fine in this case because the inode is already unlinked and it would never change its di_nlink again for this inode generation. SGI-PV: 970842 SGI-Modid: xfs-linux-melb:xfs-kern:29757a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Mark Goodwin <markgw@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16[XFS] kill the vfs_flags member in struct bhv_vfsChristoph Hellwig
All flags are added to xfs_mount's m_flag instead. Note that the 32bit inode flag was duplicated in both of them, but only cleared in the mount when it was not nessecary due to the filesystem beeing small enough. Two flags are still required here - one to indicate the mount option setting, and one to indicate if it applies or not. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29507a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16[XFS] call common xfs vfs-level helpers directly and remove vfs operationsChristoph Hellwig
Also remove the now dead behavior code. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29505a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16[XFS] move v_trace from bhv_vnode to xfs_inodeChristoph Hellwig
struct bhv_vnode is on it's way out, so move the trace buffer to the XFS inode. Note that this makes the tracing macros rather misnamed, but this kind of fallout will be fixed up incrementally later on. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29498a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16[XFS] move v_iocount from bhv_vnode to xfs_inodeChristoph Hellwig
struct bhv_vnode is on it's way out, so move the I/O count to the XFS inode. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29497a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16[XFS] kill v_vfsp member from struct bhv_vnodeChristoph Hellwig
We can easily get at the vfsp through the super_block but it will soon be gone anyway. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29494a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16[XFS] call common xfs vnode-level helpers directly and remove vnode operationsChristoph Hellwig
SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29493a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-15[XFS] Radix tree based inode cachingDavid Chinner
One of the perpetual scaling problems XFS has is indexing it's incore inodes. We currently uses hashes and the default hash sizes chosen can only ever be a tradeoff between memory consumption and the maximum realistic size of the cache. As a result, anyone who has millions of inodes cached on a filesystem needs to tunes the size of the cache via the ihashsize mount option to allow decent scalability with inode cache operations. A further problem is the separate inode cluster hash, whose size is based on the ihashsize but is smaller, and so under certain conditions (sparse cluster cache population) this can become a limitation long before the inode hash is causing issues. The following patchset removes the inode hash and cluster hash and replaces them with radix trees to avoid the scalability limitations of the hashes. It also reduces the size of the inodes by 3 pointers.... SGI-PV: 969561 SGI-Modid: xfs-linux-melb:xfs-kern:29481a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-15[XFS] dinode endianess annotationsChristoph Hellwig
Biggest bit is duplicating the dinode structure so we have one annotated for native endianess and one for disk endianess. The other significant change is that xfs_xlate_dinode_core is split into one helper per direction to allow for proper annotations, everything else is trivial. As a sidenode splitting out the incore dinode means we can move it into xfs_inode.h in a later patch and severely improving on the include hell in xfs. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29476a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-15[XFS] move linux/log2.h header to xfs_linux.hEric Sandeen
Generally we try not to directly include linux header files in core xfs code; xfs_linux.h is the spot for that. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29326a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-15[XFS] endianess annotations for xfs_bmbt_rec_tChristoph Hellwig
SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29321a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-15[XFS] split ondisk vs incore versions of xfs_bmbt_rec_tChristoph Hellwig
currently xfs_bmbt_rec_t is used both for ondisk extents as well as host-endian ones. This patch adds a new xfs_bmbt_rec_host_t for the native endian ones and cleans up the fallout. There have been various endianess issues in the tracing / debug printf code that are fixed by this patch. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29318a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>