summaryrefslogtreecommitdiffstats
path: root/fs/xfs
AgeCommit message (Collapse)Author
2009-12-11xfs: remove IO_ISAIOChristoph Hellwig
We set the IO_ISAIO flag for all read/write I/O since early Linux 2.6.x. Remove it as it has lost it's purpose long ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: Wrapped journal record corruption on read at recoveryAndy Poling
Summary of problem: If a journal record wraps at the physical end of the journal, it has to be read in two parts in xlog_do_recovery_pass(): a read at the physical end and a read at the physical beginning. If xlog_bread() has to re-align the first read, the second read request does not take that re-alignment into account. If the first read was re-aligned, the second read over-writes the end of the data from the first read, effectively corrupting it. This can happen either when reading the record header or reading the record data. The first sanity check in xlog_recover_process_data() is to check for a valid clientid, so that is the error reported. Summary of fix: If there was a first read at the physical end, XFS_BUF_PTR() returns where the data was requested to begin. Conversely, because it is the result of xlog_align(), offset indicates where the requested data for the first read actually begins - whether or not xlog_bread() has re-aligned it. Using offset as the base for the calculation of where to place the second read data ensures that it will be correctly placed immediately following the data from the first read instead of sometimes over-writing the end of it. The attached patch has resolved the reported problem of occasional inability to recover the journal (reporting "bad clientid"). Signed-off-by: Andy Poling <andy@realbig.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: cleanup data end I/O handlersChristoph Hellwig
Currently we have different end I/O handlers for read vs the different types of write I/O. But they are all very similar so we could just use one with a few conditionals and reduce code size a lot. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: use WRITE_SYNC_PLUG for synchronous writeoutChristoph Hellwig
The VM and I/O schedulers now expect us to use WRITE_SYNC_PLUG for synchronous writeout. Right now I can't see any changes in performance numbers with this, but we're getting some beating for not using it, and the knowledge definitely could help the block code to make better decisions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: reset the i_iolock lock class in the reclaim pathChristoph Hellwig
The iolock is used for protecting reads, writes and block truncates against each other. We have two classes of callers, the first one is induced by a file operation and requires a reference to the inode be held and not dropped after the operation is done: - xfs_vm_vmap, xfs_vn_fallocate, xfs_read, xfs_write, xfs_splice_read, xfs_splice_write and xfs_setattr are all implementations of VFS methods that require a live inode - xfs_getbmap and xfs_swap_extents are ioctl subcommand for which the same is true - xfs_truncate_file is only called on quota inodes just returned from xfs_iget - xfs_sync_inode_data does the lock just after an igrab() - xfs_filestream_associate and xfs_filestream_new_ag take the iolock on the parent inode of an inode which by VFS rules must be referenced And we have various calls to truncate blocks past EOF or the whole file when dropping the last reference to an inode. Unfortunately lockdep complains when we do memory allocations that can recurse into the filesystem in the first class because the second class happens to take the same lock. To avoid this re-init the iolock in the beginning of xfs_fs_clear_inode to get a new lock class. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: I/O completion handlers must use NOFS allocationsChristoph Hellwig
When completing I/O requests we must not allow the memory allocator to recurse into the filesystem, as we might deadlock on waiting for the I/O completion otherwise. The only thing currently allocating normal GFP_KERNEL memory is the allocation of the transaction structure for the unwritten extent conversion. Add a memflags argument to _xfs_trans_alloc to allow controlling the allocator behaviour. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Thomas Neumann <tneumann@users.sourceforge.net> Tested-by: Thomas Neumann <tneumann@users.sourceforge.net> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: fix mmap_sem/iolock inversion in xfs_free_eofblocksChristoph Hellwig
When xfs_free_eofblocks is called from ->release the VM might already hold the mmap_sem, but in the write path we take the iolock before taking the mmap_sem in the generic write code. Switch xfs_free_eofblocks to only trylock the iolock if called from ->release and skip trimming the prellocated blocks in that case. We'll still free them later on the final iput. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: simplify inode teardownChristoph Hellwig
Currently the reclaim code for the case where we don't reclaim the final reclaim is overly complicated. We know that the inode is clean but instead of just directly reclaiming the clean inode we go through the whole process of marking the inode reclaimable just to directly reclaim it from the calling context. Besides being overly complicated this introduces a race where iget could recycle an inode between marked reclaimable and actually being reclaimed leading to panics. This patch gets rid of the existing reclaim path, and replaces it with a simple call to xfs_ireclaim if the inode was clean. While we're at it we also use the slightly more lax xfs_inode_clean check we'd use later to determine if we need to flush the inode here. Finally get rid of xfs_reclaim function and place the remaining small bits of reclaim code directly into xfs_fs_destroy_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Patrick Schreurs <patrick@news-service.com> Reported-by: Tommy van Leeuwen <tommy@news-service.com> Tested-by: Patrick Schreurs <patrick@news-service.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-11-17xfs: copy li_lsn before dropping AIL lockNathaniel W. Turner
Access to log items on the AIL is generally protected by m_ail_lock; this is particularly needed when we're getting or setting the 64-bit li_lsn on a 32-bit platform. This patch fixes a couple places where we were accessing the log item after dropping the AIL lock on 32-bit machines. This can result in a partially-zeroed log->l_tail_lsn if xfs_trans_ail_delete is racing with xfs_trans_ail_update, and in at least some cases, this can leave the l_tail_lsn with a zero cycle number, which means xlog_space_left will think the log is full (unless CONFIG_XFS_DEBUG is set, in which case we'll trip an ASSERT), leading to processes stuck forever in xlog_grant_log_space. Thanks to Adrian VanderSpek for first spotting the race potential and to Dave Chinner for debug assistance. Signed-off-by: Nathaniel W. Turner <nate@houseofnate.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-11-17XFS bug in log recover with quota (bugzilla id 855)Jan Rekorajski
Hi, I was hit by a bug in linux 2.6.31 when XFS is not able to recover the log after a crash if fs was mounted with quotas. Gory details in XFS bugzilla: http://oss.sgi.com/bugzilla/show_bug.cgi?id=855. It looks like wrong struct is used in buffer length check, and the following patch should fix the problem. xfs_dqblk_t has a size of 104+32 bytes, while xfs_disk_dquot_t is 104 bytes long, and this is exactly what I see in system logs - "XFS: dquot too small (104) in xlog_recover_do_dquot_trans." Signed-off-by: Jan Rekorajski <baggins@sith.mimuw.edu.pl> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-10-31Merge branch 'for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfsLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs: xfs: fix xfs_quota remove error xfs: free temporary cursor in xfs_dialloc
2009-10-30xfs: fix xfs_quota remove errorRyota Yamauchi
The xfs_quota returns ENOSYS when remove command is executed. Reproducable with following steps. # mount -t xfs -o uquota /dev/sda7 /mnt/mp1 # xfs_quota -x -c off -c remove XFS_QUOTARM: Function not implemented. The remove command is allowed during quotaoff, but xfs_fs_set_xstate() checks whether quota is running, and it leads to ENOSYS. To solve this problem, add a check for X_QUOTARM. Signed-off-by: Ryota Yamauchi <r-yamauchi@vf.jp.nec.com> Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-10-30xfs: free temporary cursor in xfs_diallocEric Sandeen
Commit bd169565993b39b9b4b102cdac8b13e0a259ce2f seems to have a slight regression where this code path: if (!--searchdistance) { /* * Not in range - save last search * location and allocate a new inode */ ... goto newino; } doesn't free the temporary cursor (tcur) that got dup'd in this function. This leaks an item in the xfs_btree_cur zone, and it's caught on module unload: =========================================================== BUG xfs_btree_cur: Objects remaining on kmem_cache_close() ----------------------------------------------------------- It seems like maybe a single free at the end of the function might be cleaner, but for now put a del_cursor right in this code block similar to the handling in the rest of the function. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-10-29Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix double IRELE in xfs_dqrele_inode
2009-10-13Merge branch 'master' of ssh://oss.sgi.com/oss/git/xfs/xfs into for-linusAlex Elder
2009-10-13xfs: fix double IRELE in xfs_dqrele_inodeChristoph Hellwig
xfs_dqrele_inode calls xfs_iput to release the ilock and a reference and then also calls IRELE which does a second decrement of the reference count. This leads to a premature freeing of inodes when quotas were turned off while the filesystem was mounted. Thanks to Utako Kusaka for reporting the bug and provinding a good testcase. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Utako Kusaka <u-kusaka@wm.jp.nec.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-10-09Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: stop calling filemap_fdatawait inside ->fsync fix readahead calculations in xfs_dir2_leaf_getdents() xfs: make sure xfs_sync_fsdata covers the log xfs: mark inodes dirty before issuing I/O xfs: cleanup ->sync_fs xfs: fix xfs_quiesce_data xfs: implement ->dirty_inode to fix timestamp handling
2009-10-08Merge branch 'master' into for-linusAlex Elder
2009-10-08xfs: stop calling filemap_fdatawait inside ->fsyncChristoph Hellwig
Now that the VFS actually waits for the data I/O to complete before calling into ->fsync we can stop doing it ourselves. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-10-08fix readahead calculations in xfs_dir2_leaf_getdents()Eric Sandeen
This is for bug #850, http://oss.sgi.com/bugzilla/show_bug.cgi?id=850 XFS file system segfaults , repeatedly and 100% reproducable in 2.6.30 , 2.6.31 The above only showed up on a CONFIG_XFS_DEBUG=y kernel, because xfs_bmapi() ASSERTs that it has been asked for at least one map, and it was getting 0. The root cause is that our guesstimated "bufsize" from xfs_file_readdir was fairly small, and the bufsize -= length; in the loop was going negative - except bufsize is a size_t, so it was wrapping to a very large number. Then when we did ra_want = howmany(bufsize + mp->m_dirblksize, mp->m_sb.sb_blocksize) - 1; with that very large number, the (int) ra_want was coming out negative, and a subsequent compare: if (1 + ra_want > map_blocks ... was coming out -true- (negative int compare w/ uint) and we went back to xfs_bmapi() for more, even though we did not need more, and asked for 0 maps, and hit the ASSERT. We have kind of a type mess here, but just keeping bufsize from going negative is probably sufficient to avoid the problem. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-10-08xfs: make sure xfs_sync_fsdata covers the logDave Chinner
We want to always cover the log after writing out the superblock, and in case of a synchronous writeout make sure we actually wait for the log to be covered. That way a filesystem that has been sync()ed can be considered clean by log recovery. Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-10-08xfs: mark inodes dirty before issuing I/ODave Chinner
To make sure they get properly waited on in sync when I/O is in flight and we latter need to update the inode size. Requires a new helper to check if an ioend structure is beyond the current EOF. Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-10-08xfs: cleanup ->sync_fsChristoph Hellwig
Sort out ->sync_fs to not perform a superblock writeback for the wait = 0 case as that is just an optional first pass and the superblock will be written back properly in the next call with wait = 1. Instead perform an opportunistic quota writeback to have less work later. Also remove the freeze special case as we do a proper wait = 1 call in the freeze code anyway. Also rename the function to xfs_fs_sync_fs to match the normal naming convention, update comments and avoid calling into the laptop_mode logic on an error. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-10-08xfs: fix xfs_quiesce_dataDave Chinner
We need to do a synchronous xfs_sync_fsdata to make sure the superblock actually is on disk when we return. Also remove SYNC_BDFLUSH flag to xfs_sync_inodes because that particular flag is never checked. Move xfs_filestream_flush call later to only release inodes after they have been written out. Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-10-08xfs: implement ->dirty_inode to fix timestamp handlingChristoph Hellwig
This is picking up on Felix's repost of Dave's patch to implement a .dirty_inode method. We really need this notification because the VFS keeps writing directly into the inode structure instead of going through methods to update this state. In addition to the long-known atime issue we now also have a caller in VM code that updates c/mtime that way for shared writeable mmaps. And I found another one that no one has noticed in practice in the FIFO code. So implement ->dirty_inode to set i_update_core whenever the inode gets externally dirtied, and switch the c/mtime handling to the same scheme we already use for atime (always picking up the value from the Linux inode). Note that this patch also removes the xfs_synchronize_atime call in xfs_reclaim it was superflous as we already synchronize the time when writing the inode via the log (xfs_inode_item_format) or the normal buffers (xfs_iflush_int). In addition also remove the I_CLEAR check before copying the Linux timestamps - now that we always have the Linux inode available we can always use the timestamps in it. Also switch to just using file_update_time for regular reads/writes - that will get us all optimization done to it for free and make sure we notice early when it breaks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-09-27const: mark struct vm_struct_operationsAlexey Dobriyan
* mark struct vm_area_struct::vm_ops as const * mark vm_ops in AGP code But leave TTM code alone, something is fishy there with global vm_ops being used. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24Merge branch 'hwpoison' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits) HWPOISON: Enable error_remove_page on btrfs HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs HWPOISON: Add madvise() based injector for hardware poisoned pages v4 HWPOISON: Enable error_remove_page for NFS HWPOISON: Enable .remove_error_page for migration aware file systems HWPOISON: The high level memory error handler in the VM v7 HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process HWPOISON: shmem: call set_page_dirty() with locked page HWPOISON: Define a new error_remove_page address space op for async truncation HWPOISON: Add invalidate_inode_page HWPOISON: Refactor truncate to allow direct truncating of page v2 HWPOISON: check and isolate corrupted free pages v2 HWPOISON: Handle hardware poisoned pages in try_to_unmap HWPOISON: Use bitmask/action code for try_to_unmap behaviour HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2 HWPOISON: Add poison check to page fault handling HWPOISON: Add basic support for poisoned pages in fault handler v3 HWPOISON: Add new SIGBUS error codes for hardware poison signals HWPOISON: Add support for poison swap entries v2 HWPOISON: Export some rmap vma locking to outside world ...
2009-09-24sysctl: remove "struct file *" argument of ->proc_handlerAlexey Dobriyan
It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits) trivial: fix typo in aic7xxx comment trivial: fix comment typo in drivers/ata/pata_hpt37x.c trivial: typo in kernel-parameters.txt trivial: fix typo in tracing documentation trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c trivial: remove unnecessary semicolons trivial: Fix duplicated word "options" in comment trivial: kbuild: remove extraneous blank line after declaration of usage() trivial: improve help text for mm debug config options trivial: doc: hpfall: accept disk device to unload as argument trivial: doc: hpfall: reduce risk that hpfall can do harm trivial: SubmittingPatches: Fix reference to renumbered step trivial: fix typos "man[ae]g?ment" -> "management" trivial: media/video/cx88: add __init/__exit macros to cx88 drivers trivial: fix typo in CONFIG_DEBUG_FS in gcov doc trivial: fix missing printk space in amd_k7_smp_check trivial: fix typo s/ketymap/keymap/ in comment trivial: fix typo "to to" in multiple files trivial: fix typos in comments s/DGBU/DBGU/ ...
2009-09-22const: mark remaining super_operations constAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22const: make struct super_block::s_qcop constAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-21trivial: fix typo "for for" in multiple filesAnand Gadiyar
trivial: fix typo "for for" in multiple files Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-16HWPOISON: Enable .remove_error_page for migration aware file systemsAndi Kleen
Enable removing of corrupted pages through truncation for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs These should cover most server needs. I chose the set of migration aware file systems for this for now, assuming they have been especially audited. But in general it should be safe for all file systems on the data area that support read/write and truncate. Caveat: the hardware error handler does not take i_mutex for now before calling the truncate function. Is that ok? Cc: tytso@mit.edu Cc: hch@infradead.org Cc: mfasheh@suse.com Cc: aia21@cantab.net Cc: hugh.dickins@tiscali.co.uk Cc: swhiteho@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-15Merge branch 'master' of git://oss.sgi.com/xfs/xfs into for-linusAlex Elder
Conflicts: fs/xfs/linux-2.6/xfs_lrw.c
2009-09-15xfs: includecheck fix for fs/xfs/xfs_iops.cJaswinder Singh Rajput
fix the following 'make includecheck' warning: fs/xfs/linux-2.6/xfs_iops.c: xfs_acl.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-09-15xfs: switch to seq_fileAlexey Dobriyan
create_proc_read_entry() is getting deprecated. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-09-14xfs: Convert sync_page_range() to simple filemap_write_and_wait_range()Jan Kara
Christoph Hellwig says that it is enough for XFS to call filemap_write_and_wait_range() instead of sync_page_range() because we do all the metadata syncing when forcing the log. CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-09xfs: use correct log reservation when handling ENOSPC in xfs_createChristoph Hellwig
We added the ENOSPC handling patch in xfs_create just after it got mered with xfs_mkdir. Change the log reservation to the variable for either the create or mkdir value so it does the right thing if get here for creating a directory. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-09-08jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'Linus Torvalds
This avoids an indirect call in the VFS for each path component lookup. Well, at least as long as you own the directory in question, and the ACL check is unnecessary. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-02xfs: xfs_showargs() reports group *and* project quotas enabledAlex Elder
If you enable group or project quotas on an XFS file system, then the mount table presented through /proc/self/mounts erroneously shows that both options are in effect for the file system. The root of the problem is some bad logic in the xfs_showargs() function, which is used to format the file system type-specific options in effect for a file system. The problem originated in this GIT commit: Move platform specific mount option parse out of core XFS code Date: 11/22/07 Author: Dave Chinner SHA1 ID: a67d7c5f5d25d0b13a4dfb182697135b014fa478 For XFS quotas, project and group quota management are mutually exclusive--only one can be in effect at a time. There are two parts to managing quotas: aggregating usage information; and enforcing limits. It is possible to have a quota in effect (aggregating usage) but not enforced. These features are recorded on an XFS mount point using these flags: XFS_PQUOTA_ACCT - Project quotas are aggregated XFS_GQUOTA_ACCT - Group quotas are aggregated XFS_OQUOTA_ENFD - Project/group quotas are enforced The code in error is in fs/xfs/linux-2.6/xfs_super.c: if (mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_OQUOTA_ENFD)) seq_puts(m, "," MNTOPT_PRJQUOTA); else if (mp->m_qflags & XFS_PQUOTA_ACCT) seq_puts(m, "," MNTOPT_PQUOTANOENF); if (mp->m_qflags & (XFS_GQUOTA_ACCT|XFS_OQUOTA_ENFD)) seq_puts(m, "," MNTOPT_GRPQUOTA); else if (mp->m_qflags & XFS_GQUOTA_ACCT) seq_puts(m, "," MNTOPT_GQUOTANOENF); The problem is that XFS_OQUOTA_ENFD will be set in mp->m_qflags if either group or project quotas are enforced, and as a result both MNTOPT_PRJQUOTA and MNTOPT_GRPQUOTA will be shown as mount options. Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com>
2009-09-01xfs: un-static xfs_inobt_lookupChristoph Hellwig
xfs_inobt_lookup is also used in xfs_itable.c, remove the STATIC modifier from it's declaration to fix non-debug builds. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-09-01xfs: actually enable the swapext compat handlerChristoph Hellwig
Fix a small typo in the compat ioctl handler that cause the swapext compat handler to never be called. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Torsten Kaiser <just.for.lkml@googlemail.com> Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-09-01xfs: actually enable the swapext compat handlerChristoph Hellwig
Fix a small typo in the compat ioctl handler that cause the swapext compat handler to never be called. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Torsten Kaiser <just.for.lkml@googlemail.com> Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-09-01xfs: simplify xfs_trans_igetChristoph Hellwig
xfs_trans_iget is a wrapper for xfs_iget that adds the inode to the transaction after it is read. Except when the inode already is in the inode cache, in which case it returns the existing locked inode with increment lock recursion counts. Now, no one in the tree every decrements these lock recursion counts, so any user of this gets a potential double unlock when both the original owner of the inode and the xfs_trans_iget caller unlock it. When looking back in a git bisect in the historic XFS tree there was only one place that decremented these counts, xfs_trans_iput. Introduced in commit ca25df7a840f426eb566d52667b6950b92bb84b5 by Adam Sweeney in 1993, and removed in commit 19f899a3ab155ff6a49c0c79b06f2f61059afaf3 by Steve Lord in 2003. And as long as it didn't slip through git bisects cracks never actually used in that time frame. A quick audit of the callers of xfs_trans_iget shows that no caller really relies on this behaviour fortunately - xfs_ialloc allows this inode from disk so it must not be there before, and all the RT allocator routines only every add each RT bitmap inode once. In addition to removing lots of code and reducing the size of the inode item this patch also avoids the double inode cache lookup in each create/mkdir/mknod transaction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-09-01xfs: merge fsync and O_SYNC handlingChristoph Hellwig
The guarantees for O_SYNC are exactly the same as the ones we need to make for an fsync call (and given that Linux O_SYNC is O_DSYNC the equivalent is fdadatasync, but we treat both the same in XFS), except with a range data writeout. Jan Kara has started unifying these two path for filesystems using the generic helpers, and I've started to look at XFS. The actual transaction commited by xfs_fsync and xfs_write_sync_logforce has a different transaction number, but actually is exactly the same. We'll only use the fsync transaction going forward. One major difference is that xfs_write_sync_logforce never issues a cache flush unless we commit a transaction causing that as a side-effect, which is an obvious bug in the O_SYNC handling. Second all the locking and i_update_size vs i_update_core changes from 978b7237123d007b9fa983af6e0e2fa8f97f9934 never made it to xfs_write_sync_logforce, so we add them back. To make xfs_fsync easily usable from the O_SYNC path, the filemap_fdatawait call is moved up to xfs_file_fsync, so that we don't wait on the whole file after we already waited for our portion in xfs_write. We'll also use a plain call to filemap_write_and_wait_range instead of the previous sync_page_rang which did it in two steps including an half-hearted inode write out that doesn't help us. Once we're done with this also remove the now useless i_update_size tracking. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-09-01xfs: speed up free inode searchDave Chinner
Don't search too far - abort if it is outside a certain radius and simply do a linear search for the first free inode. In AGs with a million inodes this can speed up allocation speed by 3-4x. [hch: ported to the new xfs_ialloc.c world order] Signed-off-by: Dave Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-09-01xfs: rationalize xfs_inobt_lookup*Christoph Hellwig
Currenly we have a xfs_inobt_lookup* variant for each comparism direction, and all these get all three fields of the inobt records passed, while the common case is just looking for the inode number and we have only marginally more callers than xfs_inobt_lookup* variants. So opencode a direct call to xfs_btree_lookup for the single case where we need all fields, and replace xfs_inobt_lookup* with a xfs_inobt_looku that just takes the inode number and the direction for all other callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-09-01xfs: untangle xfs_diallocChristoph Hellwig
Clarify the control flow in xfs_dialloc. Factor out a helper to go to the next node from the current one and improve the control flow by expanding composite if statements and using gotos. The xfs_ialloc_next_rec helper is borrowed from Dave Chinners dynamic allocation policy patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-09-01xfs: factor out debug checks from xfs_dialloc and xfs_difreeDave Chinner
Factor out a common helper from repeated debug checks in xfs_dialloc and xfs_difree. [hch: split out from Dave's dynamic allocation policy patches] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-09-01xfs: improve xfs_inobt_update prototypeChristoph Hellwig
Both callers of xfs_inobt_update have the record in form of a xfs_inobt_rec_incore_t, so just pass a pointer to it instead of the individual variables. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>