summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2009-06-05ext4: Avoid leaking blocks after a block allocation failureAneesh Kumar K.V
We should add inode to the orphan list in the same transaction as block allocation. This ensures that if we crash after a failed block allocation and before we do a vmtruncate we don't leak block (ie block marked as used in bitmap but not claimed by the inode). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> CC: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-04ext4: Change all super.c messages to print the deviceEric Sandeen
This patch changes ext4 super.c to include the device name with all warning/error messages, by using a new utility function ext4_msg. It's a rather large patch, but very mechanic. I left debug printks alone. This is a straightforward port of a patch which Andi Kleen did for ext3. Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-09ext4: Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle()Jan Kara
Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle(). This seems to be a relict from some old days and setting disksize in this function does not make much sense. Currently it was set only by ext4_getblk(). Since the parameter has some effect only if create == 1, it is easy to check by grepping through the sources that the three callers which end up calling ext4_getblk() with create == 1 (ext4_append, ext4_quota_write, ext4_mkdir) do the right thing and set disksize themselves. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-04Revert "block: implement blkdev_readpages"Jens Axboe
This reverts commit db2dbb12dc47a50c7a4c5678f526014063e486f6. It apparently causes problems with partition table read-ahead on archs with large page sizes. Until that problem is diagnosed further, just drop the readpages support on block devices. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-04Btrfs: Fix oops and use after free during space balancingChris Mason
The btrfs allocator uses list_for_each to walk the available block groups when searching for free blocks. It starts off with a hint to help find the best block group for a given allocation. The hint is resolved into a block group, but we don't properly check to make sure the block group we find isn't in the middle of being freed due to filesystem shrinking or balancing. If it is being freed, the list pointers in it are bogus and can't be trusted. But, the code happily goes along and uses them in the list_for_each loop, leading to all kinds of fun. The fix used here is to check to make sure the block group we find really is on the list before we use it. list_del_init is used when removing it from the list, so we can do a proper check. The allocation clustering code has a similar bug where it will trust the block group in the current free space cluster. If our allocation flags have changed (going from single spindle dup to raid1 for example) because the drives in the FS have changed, we're not allowed to use the old block group any more. The fix used here is to check the current cluster against the current allocation flags. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-06-04Btrfs: set device->total_disk_bytes when adding new deviceYan Zheng
It was not being properly initialized, and so the size saved to disk was not correct. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-06-04ext4: Don't look at buffer_heads outside i_size.Aneesh Kumar K.V
Buffer heads outside i_size will be unmapped. So when we are doing "walk_page_buffers" limit ourself to i_size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Josef Bacik <jbacik@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> ----
2009-07-05ext4: Fix goal inum check in the inode allocatorJohann Lombardi
The goal inode is specificed by inode number which belongs to [1; s_inodes_count]. Signed-off-by: Johann Lombardi <johann@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-08ext4: fix no journal corruption with locale-genTheodore Ts'o
If there is no journal, ext4_should_writeback_data() should return TRUE. This will fix ext4_set_aops() to set ext4_da_ops in the case of delayed allocation; otherwise ext4_journaled_aops gets used by default, which doesn't handle delayed allocation properly. The advantage of using ext4_should_writeback_data() approach is that it should handle nobh better as well. Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh Kumar for suggesting this approach. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-05ext4: Calculate required journal credits for inserting an extent properlyAneesh Kumar K.V
When we have space in the extent tree leaf node we should be able to insert the extent with much less journal credits. The code was doing proper calculation but missed a return statement. Reported-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-13ext4: Fix truncation of symlinks after failed writeJan Kara
Contents of long symlinks is written via standard write methods. So when the write fails, we add inode to orphan list. But symlinks don't have .truncate method defined so nobody properly removes them from the on disk orphan list. Fix this by calling ext4_truncate() directly instead of calling vmtruncate() (which is saner anyway since we don't need anything vmtruncate() does except from calling .truncate in these paths). We also add inode to orphan list only if ext4_can_truncate() is true (currently, it can be false for symlinks when there are no blocks allocated) - otherwise orphan list processing will complain and ext4_truncate() will not remove inode from on-disk orphan list. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-13jbd2: Fix a race between checkpointing code and journal_get_write_access()Jan Kara
The following race can happen: CPU1 CPU2 checkpointing code checks the buffer, adds it to an array for writeback do_get_write_access() ... lock_buffer() unlock_buffer() flush_batch() submits the buffer for IO __jbd2_journal_file_buffer() So a buffer under writeout is returned from do_get_write_access(). Since the filesystem code relies on the fact that journaled buffers cannot be written out, it does not take the buffer lock and so it can modify buffer while it is under writeout. That can lead to a filesystem corruption if we crash at the right moment. We fix the problem by clearing the buffer dirty bit under buffer_lock even if the buffer is on BJ_None list. Actually, we clear the dirty bit regardless the list the buffer is in and warn about the fact if the buffer is already journalled. Thanks for spotting the problem goes to dingdinghua <dingdinghua85@gmail.com>. Reported-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-05ext4: Use rcu_barrier() on module unload.Jesper Dangaard Brouer
The ext4 module uses rcu_call() thus it should use rcu_barrier()on module unload. The kmem cache ext4_pspace_cachep is sometimes free'ed using call_rcu() callbacks. Thus, we must wait for completion of call_rcu() before doing kmem_cache_destroy(). Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-13ext4: naturally align struct ext4_allocation_requestEric Sandeen
As Ted noted, the ext4_allocation_request isn't well aligned. Looking at it with pahole we're wasting space on 64-bit arches: struct ext4_allocation_request { struct inode * inode; /* 0 8 */ ext4_lblk_t logical; /* 8 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t goal; /* 16 8 */ ext4_lblk_t lleft; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t pleft; /* 32 8 */ ext4_lblk_t lright; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t pright; /* 48 8 */ unsigned int len; /* 56 4 */ unsigned int flags; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ /* size: 64, cachelines: 1, members: 9 */ /* sum members: 52, holes: 3, sum holes: 12 */ }; Grouping 32-bit members together closes these holes and shrinks the structure by 12 bytes. which is important since ext4 can get on the hairy edge of stack overruns. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-05ext4: mark several more functions in mballoc.c as noinlineEric Sandeen
Ted noticed a stack-deep callchain through writepages->ext4_mb_regular_allocator->ext4_mb_init_cache->submit_bh ... With all the static functions in mballoc.c, gcc helpfully inlines for us, and we get something like this: ext4_mb_regular_allocator (232 bytes stack) ext4_mb_init_cache (232 bytes stack) submit_bh (starts 464 deeper) the 2 ext4 functions here get several others inlined; by telling gcc not to inline them, we can save stack space for when we head off into submit_bh land and associated block layer callchains. The following noinlined functions are only called once, so this won't impact any other callchains: ext4_mb_regular_allocator (104) (was 232) ext4_mb_find_by_goal (56) (noinlined) ext4_mb_init_group (24) (noinlined) ext4_mb_init_cache (136) (was 232) ext4_mb_generate_buddy (88) (noinlined) ext4_mb_generate_from_pa (40) (noinlined) submit_bh ext4_mb_simple_scan_group (24) (noinlined) ext4_mb_scan_aligned (56) (noinlined) ext4_mb_complex_scan_group (40) (noinlined) ext4_mb_try_best_found (24) (noinlined) now when we head off into submit_bh() we're only 264 bytes deeper in stack than when we entered ext4_mb_regular_allocator() (vs. 464 bytes before). Every 200 bytes helps. :) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-05ext4: Fix potential reclaim deadlock when truncating partial blockTheodore Ts'o
The ext4_block_truncate_page() function previously called grab_cache_page(), which called find_or_create_page() with the __GFP_FS flag potentially set. This could cause a deadlock if the system is low on memory and it attempts a memory reclaim, which could potentially call back into ext4. So we need to call find_or_create_page() directly, and remove the __GFP_FP flag to avoid this potential deadlock. Thanks to Roland Dreier for reporting a lockdep warning which showed this problem. [20786.363249] ================================= [20786.363257] [ INFO: inconsistent lock state ] [20786.363265] 2.6.31-2-generic #14~rbd4gitd960eea9 [20786.363270] --------------------------------- [20786.363276] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. [20786.363285] http/8397 [HC0[0]:SC0[0]:HE1:SE1] takes: [20786.363291] (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150 [20786.363314] {IN-RECLAIM_FS-W} state was registered at: [20786.363320] [<ffffffff8108bef6>] mark_irqflags+0xc6/0x1a0 [20786.363334] [<ffffffff8108d347>] __lock_acquire+0x287/0x430 [20786.363345] [<ffffffff8108d595>] lock_acquire+0xa5/0x150 [20786.363355] [<ffffffff812008da>] jbd2_journal_start+0xfa/0x150 [20786.363365] [<ffffffff811d98a8>] ext4_journal_start_sb+0x58/0x90 [20786.363377] [<ffffffff811cce85>] ext4_delete_inode+0xc5/0x2c0 [20786.363389] [<ffffffff81146fa3>] generic_delete_inode+0xd3/0x1a0 [20786.363401] [<ffffffff81147095>] generic_drop_inode+0x25/0x30 [20786.363411] [<ffffffff81145ce2>] iput+0x62/0x70 [20786.363420] [<ffffffff81142878>] dentry_iput+0x98/0x110 [20786.363429] [<ffffffff81142a00>] d_kill+0x50/0x80 [20786.363438] [<ffffffff811444c5>] dput+0x95/0x180 [20786.363447] [<ffffffff8120de4b>] ecryptfs_d_release+0x2b/0x70 [20786.363459] [<ffffffff81142978>] d_free+0x28/0x60 [20786.363468] [<ffffffff81142a18>] d_kill+0x68/0x80 [20786.363477] [<ffffffff81142ad3>] prune_one_dentry+0xa3/0xc0 [20786.363487] [<ffffffff81142d61>] __shrink_dcache_sb+0x271/0x290 [20786.363497] [<ffffffff81142e89>] prune_dcache+0x109/0x1b0 [20786.363506] [<ffffffff81142f6f>] shrink_dcache_memory+0x3f/0x50 [20786.363516] [<ffffffff810f6d3d>] shrink_slab+0x12d/0x190 [20786.363527] [<ffffffff810f97d7>] balance_pgdat+0x4d7/0x640 [20786.363537] [<ffffffff810f9a57>] kswapd+0x117/0x170 [20786.363546] [<ffffffff810773ce>] kthread+0x9e/0xb0 [20786.363558] [<ffffffff8101430a>] child_rip+0xa/0x20 [20786.363569] [<ffffffffffffffff>] 0xffffffffffffffff [20786.363598] irq event stamp: 15997 [20786.363603] hardirqs last enabled at (15997): [<ffffffff81125f9d>] kmem_cache_alloc+0xfd/0x1a0 [20786.363617] hardirqs last disabled at (15996): [<ffffffff81125f01>] kmem_cache_alloc+0x61/0x1a0 [20786.363628] softirqs last enabled at (15966): [<ffffffff810631ea>] __do_softirq+0x14a/0x220 [20786.363641] softirqs last disabled at (15861): [<ffffffff8101440c>] call_softirq+0x1c/0x30 [20786.363651] [20786.363653] other info that might help us debug this: [20786.363660] 3 locks held by http/8397: [20786.363665] #0: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<ffffffff8112ed24>] do_truncate+0x64/0x90 [20786.363685] #1: (&sb->s_type->i_alloc_sem_key#5){+++++.}, at: [<ffffffff81147f90>] notify_change+0x250/0x350 [20786.363707] #2: (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150 [20786.363724] [20786.363726] stack backtrace: [20786.363734] Pid: 8397, comm: http Tainted: G C 2.6.31-2-generic #14~rbd4gitd960eea9 [20786.363741] Call Trace: [20786.363752] [<ffffffff8108ad7c>] print_usage_bug+0x18c/0x1a0 [20786.363763] [<ffffffff8108b0c0>] ? check_usage_backwards+0x0/0xb0 [20786.363773] [<ffffffff8108bad2>] mark_lock_irq+0xf2/0x280 [20786.363783] [<ffffffff8108bd97>] mark_lock+0x137/0x1d0 [20786.363793] [<ffffffff8108c03c>] mark_held_locks+0x6c/0xa0 [20786.363803] [<ffffffff8108c11f>] lockdep_trace_alloc+0xaf/0xe0 [20786.363813] [<ffffffff810efbac>] __alloc_pages_nodemask+0x7c/0x180 [20786.363824] [<ffffffff810e9411>] ? find_get_page+0x91/0xf0 [20786.363835] [<ffffffff8111d3b7>] alloc_pages_current+0x87/0xd0 [20786.363845] [<ffffffff810e9827>] __page_cache_alloc+0x67/0x70 [20786.363856] [<ffffffff810eb7df>] find_or_create_page+0x4f/0xb0 [20786.363867] [<ffffffff811cb3be>] ext4_block_truncate_page+0x3e/0x460 [20786.363876] [<ffffffff812008da>] ? jbd2_journal_start+0xfa/0x150 [20786.363885] [<ffffffff812008bb>] ? jbd2_journal_start+0xdb/0x150 [20786.363895] [<ffffffff811c6415>] ? ext4_meta_trans_blocks+0x75/0xf0 [20786.363905] [<ffffffff811e8d8b>] ext4_ext_truncate+0x1bb/0x1e0 [20786.363916] [<ffffffff811072c5>] ? unmap_mapping_range+0x75/0x290 [20786.363926] [<ffffffff811ccc28>] ext4_truncate+0x498/0x630 [20786.363938] [<ffffffff8129b4ce>] ? _raw_spin_unlock+0x5e/0xb0 [20786.363947] [<ffffffff81107306>] ? unmap_mapping_range+0xb6/0x290 [20786.363957] [<ffffffff8108c3ad>] ? trace_hardirqs_on+0xd/0x10 [20786.363966] [<ffffffff811ffe58>] ? jbd2_journal_stop+0x1f8/0x2e0 [20786.363976] [<ffffffff81107690>] vmtruncate+0xb0/0x110 [20786.363986] [<ffffffff81147c05>] inode_setattr+0x35/0x170 [20786.363995] [<ffffffff811c9906>] ext4_setattr+0x186/0x370 [20786.364005] [<ffffffff81147eab>] notify_change+0x16b/0x350 [20786.364014] [<ffffffff8112ed30>] do_truncate+0x70/0x90 [20786.364021] [<ffffffff8112f48b>] T.657+0xeb/0x110 [20786.364021] [<ffffffff8112f4be>] sys_ftruncate+0xe/0x10 [20786.364021] [<ffffffff81013132>] system_call_fastpath+0x16/0x1b Reported-by: Roland Dreier <roland@digitalvampire.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-20jbd2: Remove GFP_ATOMIC kmalloc from inside spinlock critical regionTheodore Ts'o
Fix jbd2_dev_to_name(), a function used when pretty-printting jbd2 and ext4 tracepoints. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-03ocfs2: Remove redundant gotos in ocfs2_mount_volume()Tao Ma
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-03ocfs2: Add statistics for the checksum and ecc operations.Joel Becker
It would be nice to know how often we get checksum failures. Even better, how many of them we can fix with the single bit ecc. So, we add a statistics structure. The structure can be installed into debugfs wherever the user wants. For ocfs2, we'll put it in the superblock-specific debugfs directory and pass it down from our higher-level functions. The stats are only registered with debugfs when the filesystem supports metadata ecc. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-03ocfs2 patch to track delayed orphan scan timer statisticsSrinivas Eeda
Patch to track delayed orphan scan timer statistics. Modifies ocfs2_osb_dump to print the following: Orphan Scan=> Local: 10 Global: 21 Last Scan: 67 seconds ago Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-03ocfs2: timer to queue scan of all orphan slotsSrinivas Eeda
When a dentry is unlinked, the unlinking node takes an EX on the dentry lock before moving the dentry to the orphan directory. Other nodes that have this dentry in cache have a PR on the same dentry lock. When the EX is requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED during downconvert. The inode is finally deleted when the last node to iput the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set. A problem arises if a node is forced to free dentry locks because of memory pressure. If this happens, the node will no longer get downconvert notifications for the dentries that have been unlinked on another node. If it also happens that node is actively using the corresponding inode and happens to be the one performing the last iput on that inode, it will fail to delete the inode as it will not have the MAYBE_ORPHANED flag set. This patch fixes this shortcoming by introducing a periodic scan of the orphan directories to delete such inodes. Care has been taken to distribute the workload across the cluster so that no one node has to perform the task all the time. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-03ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directoriesJan Kara
We use ordering ip_alloc_sem -> local alloc locks in ocfs2_write_begin(). So change lock ordering in ocfs2_extend_dir() and ocfs2_expand_inline_dir() to also use this lock ordering. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-03ocfs2: Fix possible deadlock in quota recoveryJan Kara
In ocfs2_finish_quota_recovery() we acquired global quota file lock and started recovering local quota file. During this process we need to get quota structures, which calls ocfs2_dquot_acquire() which gets global quota file lock again. This second lock can block in case some other node has requested the quota file lock in the mean time. Fix the problem by moving quota file locking down into the function where it is really needed. Then dqget() or dqput() won't be called with the lock held. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-03ocfs2: Fix possible deadlock with quotas in ocfs2_setattr()Jan Kara
We called vfs_dq_transfer() with global quota file lock held. This can lead to deadlocks as if vfs_dq_transfer() has to allocate new quota structure, it calls ocfs2_dquot_acquire() which tries to get quota file lock again and this can block if another node requested the lock in the mean time. Since we have to call vfs_dq_transfer() with transaction already started and quota file lock ranks above the transaction start, we cannot just rely on ocfs2_dquot_acquire() or ocfs2_dquot_release() on getting the lock if they need it. We fix the problem by acquiring pointers to all quota structures needed by vfs_dq_transfer() already before calling the function. By this we are sure that all quota structures are properly allocated and they can be freed only after we drop references to them. Thus we don't need quota file lock anywhere inside vfs_dq_transfer(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-03ocfs2: Fix lock inversion in ocfs2_local_read_info()Jan Kara
This function is called with dqio_mutex held but it has to acquire lock from global quota file which ranks above this lock. This is not deadlockable lock inversion since this code path is take only during mount when noone else can race with us but let's clean this up to silence lockdep. We just drop the dqio_mutex in the beginning of the function and reacquire it in the end since we don't need it - noone can race with us at this moment. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-03ocfs2: Fix possible deadlock in ocfs2_global_read_dquot()Jan Kara
It is not possible to get a read lock and then try to get the same write lock in one thread as that can block on downconvert being requested by other node leading to deadlock. So first drop the quota lock for reading and only after that get it for writing. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-03ext4: super.c whitespace cleanupAndreas Dilger
Cleanup of whitespace and formatting. Initially driven by confusing indents for the ext4_{block,inode}_bitmap() et. al. helper routines, but figured I'd cleanup some other 80-column wrapping and other indenting problems at the same time. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-09jbd2: Fix minor typos in comments in fs/jbd2/journal.cAlberto Bertogli
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2009-06-04FAT: add 'errors' mount optionDenis Karpov
On severe errors FAT remounts itself in read-only mode. Allow to specify FAT fs desired behavior through 'errors' mount option: panic, continue or remount read-only. `mount -t [fat|vfat] -o errors=[panic,remount-ro,continue] \ <bdev> <mount point>` This is analog to ext2 fs 'errors' mount option. Signed-off-by: Denis Karpov <ext-denis.2.karpov@nokia.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2009-06-03GFS2: Remove unused variableSteven Whitehouse
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-06-02Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: prevent deadlock in xfs_qm_shake() xfs: fix overflow in xfs_growfs_data_private xfs: fix double unlock in xfs_swap_extents()
2009-06-02cifs: fix IPv6 address length checkJeff Layton
For IPv6 the userspace mount helper sends an address in the "ip=" option. This check fails if the length is > 35 characters. I have no idea where the magic 35 character limit came from, but it's clearly not enough for IPv6. Fix it by making it use the INET6_ADDRSTRLEN #define. While we're at it, use the same #define for the address length in SPNEGO upcalls. Reported-by: Charles R. Anderson <cra@wpi.edu> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-06-02UBIFS: allow sync option in rootflagsArtem Bityutskiy
When passing UBIFS parameters via kernel command line, the sync option will be passed to UBIFS as a string, not as an MS_SYNCHRONOUS flag. Teach UBIFS interpreting this flag. Reported-by: Aurélien GÉRÔME <ag@debian.org> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-06-02GFS2: smbd proccess hangs with flock() call.Abhijith Das
GFS2 currently does not support mandatory flocks. An flock() call with LOCK_MAND triggers unexpected behavior because gfs2 is not checking for this lock type. This patch corrects that. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-06-01xfs: prevent deadlock in xfs_qm_shake()Felix Blyakher
It's possible to recurse into filesystem from the memory allocation, which deadlocks in xfs_qm_shake(). Add check for __GFP_FS, and bail out if it is not set. Signed-off-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Hedi Berriche <hedi@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-06-01xfs: fix overflow in xfs_growfs_data_privateEric Sandeen
In the case where growing a filesystem would leave the last AG too small, the fixup code has an overflow in the calculation of the new size with one fewer ag, because "nagcount" is a 32 bit number. If the new filesystem has > 2^32 blocks in it this causes a problem resulting in an EINVAL return from growfs: # xfs_io -f -c "truncate 19998630180864" fsfile # mkfs.xfs -f -bsize=4096 -dagsize=76288719b,size=3905982455b fsfile # mount -o loop fsfile /mnt # xfs_growfs /mnt meta-data=/dev/loop0 isize=256 agcount=52, agsize=76288719 blks = sectsz=512 attr=2 data = bsize=4096 blocks=3905982455, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal bsize=4096 blocks=32768, version=2 = sectsz=512 sunit=0 blks, lazy-count=0 realtime =none extsz=4096 blocks=0, rtextents=0 xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Invalid argument Reported-by: richard.ems@cape-horn-eng.com Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-06-01xfs: fix double unlock in xfs_swap_extents()Felix Blyakher
Regreesion from commit ef8f7fc, which rearranged the code in xfs_swap_extents() leading to double unlock of xfs inode ilock. That resulted in xfs_fsr deadlocking itself on platforms, which don't handle double unlock of rw_semaphore nicely. It caused the count go negative, which represents the write holder, without really having one. ia64 is one of the platforms where deadlock was easily reproduced and the fix was tested. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-06-01NFSv4: kill off complicated macro 'PROC'Yu Zhiguo
J. Bruce Fields wrote: ... > (This is extremely confusing code to track down: note that > proc->pc_decode is set to nfs4svc_decode_compoundargs() by the PROC() > macro at the end of fs/nfsd/nfs4proc.c. Which means, for example, that > grepping for nfs4svc_decode_compoundargs() gets you nowhere. Patches to > kill off that macro would be welcomed....) the macro 'PROC' is complicated and obscure, it had better be killed off in order to make the code more clear. Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-01NFSv4: do exact check about attribute specifiedYu Zhiguo
Server should return NFS4ERR_ATTRNOTSUPP if an attribute specified is not supported in current environment. Operations CREATE, NVERIFY, OPEN, SETATTR and VERIFY should do this check. This bug is found when do newpynfs tests. The names of the tests that failed are following: CR12 NVF7a NVF7b NVF7c NVF7d NVF7f NVF7r NVF7s OPEN15 VF7a VF7b VF7c VF7d VF7f VF7r VF7s Add function do_check_fattr() to do exact check: 1, Check attribute specified is supported by the NFSv4 server or not. 2, Check FATTR4_WORD0_ACL & FATTR4_WORD0_FS_LOCATIONS are supported in current environment or not. 3, Check attribute specified is writable or not. step 1 and 3 are done in function nfsd4_decode_fattr() but removed to this function now. Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-01xfs: prevent deadlock in xfs_qm_shake()Felix Blyakher
It's possible to recurse into filesystem from the memory allocation, which deadlocks in xfs_qm_shake(). Add check for __GFP_FS, and bail out if it is not set. Signed-off-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Hedi Berriche <hedi@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-06-01Merge branch 'linus' into perfcounters/coreIngo Molnar
Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6 based - to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-30Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix bh leak in nilfs_cpfile_delete_checkpoints function
2009-05-30nilfs2: fix bh leak in nilfs_cpfile_delete_checkpoints functionRyusuke Konishi
The nilfs_cpfile_delete_checkpoints() wrongly skips brelse() for the header block of checkpoint file in case of errors. This fixes the leak bug. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-05-29Merge git://git.infradead.org/~dwmw2/mtd-2.6.30Linus Torvalds
* git://git.infradead.org/~dwmw2/mtd-2.6.30: jffs2: Fix corruption when flash erase/write failure mtd: MXC NAND driver fixes (v5)
2009-05-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: Driver Core: do not oops when driver_unregister() is called for unregistered drivers sysfs: file.c: use create_singlethread_workqueue()
2009-05-29Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: svcrdma: dma unmap the correct length for the RPCRDMA header page. nfsd: Revert "svcrpc: take advantage of tcp autotuning" nfsd: fix hung up of nfs client while sync write data to nfs server
2009-05-29flat: fix data sections alignmentOskar Schirmer
The flat loader uses an architecture's flat_stack_align() to align the stack but assumes word-alignment is enough for the data sections. However, on the Xtensa S6000 we have registers up to 128bit width which can be used from userspace and therefor need userspace stack and data-section alignment of at least this size. This patch drops flat_stack_align() and uses the same alignment that is required for slab caches, ARCH_SLAB_MINALIGN, or wordsize if it's not defined by the architecture. It also fixes m32r which was obviously kaput, aligning an uninitialized stack entry instead of the stack pointer. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Oskar Schirmer <os@emlix.com> Cc: David Howells <dhowells@redhat.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Bryan Wu <cooloney@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Johannes Weiner <jw@emlix.com> Acked-by: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-29procfs: make errno values consistent when open pident vs exit(2) race occursKOSAKI Motohiro
proc_pident_instantiate() has following call flow. proc_pident_lookup() proc_pident_instantiate() proc_pid_make_inode() And, proc_pident_lookup() has following error handling. const struct pid_entry *p, *last; error = ERR_PTR(-ENOENT); if (!task) goto out_no_task; Then, proc_pident_instantiate should return ENOENT too when racing against exit(2) occur. EINAL has two bad reason. - it implies caller is wrong. bad the race isn't caller's mistake. - man 2 open don't explain EINVAL. user often don't handle it. Note: Other proc_pid_make_inode() caller already use ENOENT properly. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-29mtd: Handle compat ioctls directly; remove all trace from compat_ioctl.cKevin Cernekee
Remove all references to MTD ioctls from fs/compat_ioctl.c and let them all be handled by mtd_compat_ioctl(). Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-05-29mtd: add OOB ioctls for >4GiB devicesKevin Cernekee
Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>