summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2013-06-14Btrfs: add ioctl to wait for qgroup rescan completionJan Schmidt
btrfs_qgroup_wait_for_completion waits until the currently running qgroup operation completes. It returns immediately when no rescan process is in progress. This is useful to automate things around the rescan process (e.g. testing). Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14Btrfs: introduce qgroup_ulist to avoid frequently allocating/freeing ulistWang Shilong
When doing qgroup accounting, we call ulist_alloc()/ulist_free() every time when we want to walk qgroup tree. By introducing 'qgroup_ulist', we only need to call ulist_alloc()/ulist_free() once. This reduce some sys time to allocate memory, see the measurements below fsstress -p 4 -n 10000 -d $dir With this patch: real 0m50.153s user 0m0.081s sys 0m6.294s real 0m51.113s user 0m0.092s sys 0m6.220s real 0m52.610s user 0m0.096s sys 0m6.125s avg 6.213 ----------------------------------------------------- Without the patch: real 0m54.825s user 0m0.061s sys 0m10.665s real 1m6.401s user 0m0.089s sys 0m11.218s real 1m13.768s user 0m0.087s sys 0m10.665s avg 10.849 we can see the sys time reduce ~43%. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14btrfs: show compiled-in config features at module load timeDavid Sterba
We want to know if there are debugging features compiled in, this may affect performance. The message is printed before the sanity checks. Also kill version.h file that serves no purpose, we don't use any version tag for kernel module. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14btrfs: move ifdef around sanity checks out of init_btrfs_fsDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14btrfs: add prefix to sanity tests messagesDavid Sterba
And change the message level to KERN_INFO. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14btrfs: add debug check for extent_io range alignmentDavid Sterba
The 'end' value must exactly cover the end of the interval, which means one byte less than the expected block alignment, or in case of a file smaller than one block, one byte less than the inode size. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14Btrfs: fix check on same raid type flag twiceHenrik Nordvik
Code checked for raid 5 flag in two else-if branches, so code would never be reached. Probably a copy-paste bug. Signed-off-by: Henrik Nordvik <henrikno@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14GFS2: fix regression in dir_double_exhashBob Peterson
Recent commit e8830d8 introduced a bug in function dir_double_exhash; it was failing to set h in the fall-back case. This patch corrects it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-14GFS2: Add atomic_open supportSteven Whitehouse
I've restricted atomic_open to only operate on regular files, although I still don't understand why atomic_open should not be possible also for directories on GFS2. That can always be added in later though, if it makes sense. The ->atomic_open function can be passed negative dentries, which in most cases means either ENOENT (->lookup) or a call to d_instantiate (->create). In the GFS2 case though, we need to actually perform the look up, since we do not know whether there has been a new inode created on another node. The look up calls d_splice_alias which then tries to rehash the dentry - so the solution here is to simply check for that in d_splice_alias. The same issue is likely to affect any other cluster filesystem implementing ->atomic_open Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields fieldses org> Cc: Jeff Layton <jlayton@redhat.com>
2013-06-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This is an assortment of crash fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: stop all workers before cleaning up roots Btrfs: fix use-after-free bug during umount Btrfs: init relocate extent_io_tree with a mapping btrfs: Drop inode if inode root is NULL Btrfs: don't delete fs_roots until after we cleanup the transaction
2013-06-14f2fs: recover wrong pino after checkpoint during fsyncJaegeuk Kim
If a file is linked, f2fs loose its parent inode number so that fsync calls for the linked file should do checkpoint all the time. But, if we can recover its parent inode number after the checkpoint, we can adjust roll-forward mechanism for the further fsync calls, which is able to improve the fsync performance significatly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14f2fs: optimize do_write_data_page()Haicheng Li
Since "need_inplace_update() == true" is a very rare case, using unlikely() to give compiler a chance to optimize the code. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14f2fs: make locate_dirty_segment() as staticHaicheng Li
It's used only locally and could be static. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14f2fs: remove unnecessary parameter "offset" from __add_sum_entry()Haicheng Li
We can get the value directly from pointer "curseg". Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14f2fs: avoid freqeunt write_inode callsJaegeuk Kim
If update_inode is called, we don't need to do write_inode. So, let's use a *dirty* flag for each inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14f2fs: optimise the truncate_data_blocks_range() rangeNamjae Jeon
The function truncate_data_blocks_range() decrements the valid block count of inode via dec_valid_block_count(). Since this function updates the i_blocks field of inode, we can update this field once we have calculated total the number of blocks to be freed. Therefore we can decrement valid blocks outside of the for loop. if (nr_free) { + dec_valid_block_count(sbi, dn->inode, nr_free); set_page_dirty(dn->node_page); sync_inode_page(dn); } 'nr_free' tells the total number of blocks freed. So, we can just directly pass this value to dec_valid_block_count() and update the i_blocks. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14f2fs: use the F2FS specific flags in f2fs_ioctl()Namjae Jeon
In f2fs_ioctl() function, it is using generic flags. Since F2FS specific flags are defined. So lets use those flags. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-13xfs: ensure btree root split sets blkno correctlyDave Chinner
For CRC enabled filesystems, the BMBT is rooted in an inode, so it passes through a different code path on root splits than the freespace and inode btrees. This is much less traversed by xfstests than the other trees. When testing on a 1k block size filesystem, I've been seeing ASSERT failures in generic/234 like: XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317 which are generally preceded by a lblock check failure. I noticed this in the bmbt stats: $ pminfo -f xfs.btree.block_map xfs.btree.block_map.lookup value 39135 xfs.btree.block_map.compare value 268432 xfs.btree.block_map.insrec value 15786 xfs.btree.block_map.delrec value 13884 xfs.btree.block_map.newroot value 2 xfs.btree.block_map.killroot value 0 ..... Very little coverage of root splits and merges. Indeed, on a 4k filesystem, block_map.newroot and block_map.killroot are both zero. i.e. the code is not exercised at all, and it's the only generic btree infrastructure operation that is not exercised by a default run of xfstests. Turns out that on a 1k filesystem, generic/234 accounts for one of those two root splits, and that is somewhat of a smoking gun. In fact, it's the same problem we saw in the directory/attr code where headers are memcpy()d from one block to another without updating the self describing metadata. Simple fix - when copying the header out of the root block, make sure the block number is updated correctly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-13xfs: fix implicit padding in directory and attr CRC formatsDave Chinner
Michael L. Semon has been testing CRC patches on a 32 bit system and been seeing assert failures in the directory code from xfs/080. Thanks to Michael's heroic efforts with printk debugging, we found that the problem was that the last free space being left in the directory structure was too small to fit a unused tag structure and it was being corrupted and attempting to log a region out of bounds. Hence the assert failure looked something like: ..... #5 calling xfs_dir2_data_log_unused() 36 32 #1 4092 4095 4096 #2 8182 8183 4096 XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568 Where #1 showed the first region of the dup being logged (i.e. the last 4 bytes of a directory buffer) and #2 shows the corrupt values being calculated from the length of the dup entry which overflowed the size of the buffer. It turns out that the problem was not in the logging code, nor in the freespace handling code. It is an initial condition bug that only shows up on 32 bit systems. When a new buffer is initialised, where's the freespace that is set up: [ 172.316249] calling xfs_dir2_leaf_addname() from xfs_dir_createname() [ 172.316346] #9 calling xfs_dir2_data_log_unused() [ 172.316351] #1 calling xfs_trans_log_buf() 60 63 4096 [ 172.316353] #2 calling xfs_trans_log_buf() 4094 4095 4096 Note the offset of the first region being logged? It's 60 bytes into the buffer. Once I saw that, I pretty much knew that the bug was going to be caused by this. Essentially, all direct entries are rounded to 8 bytes in length, and all entries start with an 8 byte alignment. This means that we can decode inplace as variables are naturally aligned. With the directory data supposedly starting on a 8 byte boundary, and all entries padded to 8 bytes, the minimum freespace in a directory block is supposed to be 8 bytes, which is large enough to fit a unused data entry structure (6 bytes in size). The fact we only have 4 bytes of free space indicates a directory data block alignment problem. And what do you know - there's an implicit hole in the directory data block header for the CRC format, which means the header is 60 byte on 32 bit intel systems and 64 bytes on 64 bit systems. Needs padding. And while looking at the structures, I found the same problem in the attr leaf header. Fix them both. Note that this only affects 32 bit systems with CRCs enabled. Everything else is just fine. Note that CRC enabled filesystems created before this fix on such systems will not be readable with this fix applied. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Debugged-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-12ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extentsJie Liu
Return the FIEMAP_EXTENT_UNKNOWN flag as well except the FIEMAP_EXTENT_DELALLOC because the data location of an delayed allocation extent is unknown. Signed-off-by: Jie Liu <jeff.liu@oracle.com>
2013-06-12jbd2: remove debug dependency on debug_fs and update Kconfig help textPaul Gortmaker
Commit b6e96d0067d8 ("jbd2: use module parameters instead of debugfs for jbd_debug") removed any need for a dependency on DEBUG_FS. It also moved the /sys variables out from underneath the typical debugfs mount point. Delete the dependency and update the /sys path to where the debug settings are currently. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12jbd2: use a single printk for jbd_debug()Paul Gortmaker
Since the jbd_debug() is implemented with two separate printk() calls, it can lead to corrupted and misleading debug output like the following (see lines marked with "*"): [ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes [ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104 [ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ [* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: [* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103 [* 290.339382] JBD2: starting commit of transaction 42104 [ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records [ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079 i.e. the debug output from log_wait_commit and journal_commit_transaction have become interleaved. The output should have been: (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103 (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104 It is expected that this is not easy to replicate -- I was only able to cause it on preempt-rt kernels, and even then only under heavy I/O load. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Suggested-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12jbd2: fix duplicate debug label for phase 2Paul Gortmaker
Currently we see this output: $git grep phase fs/jbd2 fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 1\n"); fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n"); fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n"); fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 3\n"); fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 4\n"); [...] There is clearly a duplicate label for phase 2, and they are both active (i.e. not in #if ... #else block). Rename them to be "2a" and "2b" so the debug output is unambiguous. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12jbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space()Paul Gortmaker
While trying to debug an an issue under extreme I/O loading on preempt-rt kernels, the following backtrace was observed via SysRQ output: rm D ffff8802203afbc0 4600 4878 4748 0x00000000 ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80 ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8 ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000 Call Trace: [<ffffffff8172dc34>] schedule+0x24/0x70 [<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140 [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520 [<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0 [<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530 [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110 [<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230 [<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10 [<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160 [<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230 [<ffffffff811435c5>] vfs_rmdir+0xd5/0x140 [<ffffffff8114370f>] do_rmdir+0xdf/0x120 [<ffffffff8105c6b4>] ? task_work_run+0x44/0x80 [<ffffffff81002889>] ? do_notify_resume+0x89/0x100 [<ffffffff817361ae>] ? int_signal+0x12/0x17 [<ffffffff81145d85>] sys_unlinkat+0x25/0x40 [<ffffffff81735f22>] system_call_fastpath+0x16/0x1b What is interesting here, is that we call log_wait_commit, from within wait_for_space, but we are still holding the checkpoint_mutex as it surrounds mostly the whole of wait_for_space. And then, as we are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED bit is set, then we will also try to take the same checkpoint_mutex. It seems that we need to drop the checkpoint_mutex while sitting in jbd2_log_wait_commit, if we want to guarantee that progress can be made by jbd2_journal_commit_transaction(). There does not seem to be anything preempt-rt specific about this, other then perhaps increasing the odds of it happening. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12jbd2: relocate assert after state lock in journal_commit_transaction()Paul Gortmaker
The state lock is taken after we are doing an assert on the state value, not before. So we might in fact be doing an assert on a transient value. Ensure the state check is within the scope of the state lock being taken. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12ext4: Fix fsync error handling after filesystem abortDmitry Monakhov
If filesystem was aborted after inode's write back is complete but before its metadata was updated we may return success results in data loss. In order to handle fs abort correctly we have to check fs state once we discover that it is in MS_RDONLY state Test case: http://patchwork.ozlabs.org/patch/244297 Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12ext4: fix data integrity for ext4_sync_fsDmitry Monakhov
Inode's data or non journaled quota may be written w/o jounral so we _must_ send a barrier at the end of ext4_sync_fs. But it can be skipped if journal commit will do it for us. Also fix data integrity for nojournal mode. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12jbd2: optimize jbd2_journal_force_commitDmitry Monakhov
Current implementation of jbd2_journal_force_commit() is suboptimal because result in empty and useless commits. But callers just want to force and wait any unfinished commits. We already have jbd2_journal_force_commit_nested() which does exactly what we want, except we are guaranteed that we do not hold journal transaction open. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12Merge branch 'akpm' (updates from Andrew Morton)Linus Torvalds
Merge misc fixes from Andrew Morton: "Bunch of fixes and one little addition to math64.h" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits) include/linux/math64.h: add div64_ul() mm: memcontrol: fix lockless reclaim hierarchy iterator frontswap: fix incorrect zeroing and allocation size for frontswap_map kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() mm: migration: add migrate_entry_wait_huge() ocfs2: add missing lockres put in dlm_mig_lockres_handler mm/page_alloc.c: fix watermark check in __zone_watermark_ok() drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info() aio: fix io_destroy() regression by using call_rcu() rtc-at91rm9200: use shadow IMR on at91sam9x5 rtc-at91rm9200: add shadow interrupt mask rtc-at91rm9200: refactor interrupt-register handling rtc-at91rm9200: add configuration support rtc-at91rm9200: add match-table compile guard fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree cciss: fix broken mutex usage in ioctl audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel ...
2013-06-12ocfs2: add missing lockres put in dlm_mig_lockres_handlerXue jiufei
dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path. Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: shencanquan <shencanquan@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12aio: fix io_destroy() regression by using call_rcu()Kent Overstreet
There was a regression introduced by 36f5588905c1 ("aio: refcounting cleanup"), reported by Jens Axboe - the refcounting cleanup switched to using RCU in the shutdown path, but the synchronize_rcu() was done in the context of the io_destroy() syscall greatly increasing the time it could block. This patch switches it to call_rcu() and makes shutdown asynchronous (more asynchronous than it was originally; before the refcount changes io_destroy() would still wait on pending kiocbs). Note that there's a global quota on the max outstanding kiocbs, and that quota must be manipulated synchronously; otherwise io_setup() could return -EAGAIN when there isn't quota available, and userspace won't have any way of waiting until shutdown of the old kioctxs has finished (besides busy looping). So we release our quota before kioctx shutdown has finished, which should be fine since the quota never corresponded to anything real anyways. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Reported-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Tested-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directoryGoldwyn Rodrigues
While removing a non-empty directory, the kernel dumps a message: (rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39 Suppress the error message from being printed in the dmesg so users don't panic. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12ocfs2: ocfs2_prep_new_orphaned_file() should return retXiaowei.Hu
If an error occurs, for example an EIO in __ocfs2_prepare_orphan_dir, ocfs2_prep_new_orphaned_file will release the inode_ac, then when the caller of ocfs2_prep_new_orphaned_file gets a 0 return, it will refer to a NULL ocfs2_alloc_context struct in the following functions. A kernel panic happens. Signed-off-by: "Xiaowei.Hu" <xiaowei.hu@oracle.com> Reviewed-by: shencanquan <shencanquan@huawei.com> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Cc: Joe Jin <joe.jin@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12kmsg: honor dmesg_restrict sysctl on /dev/kmsgKees Cook
The dmesg_restrict sysctl currently covers the syslog method for access dmesg, however /dev/kmsg isn't covered by the same protections. Most people haven't noticed because util-linux dmesg(1) defaults to using the syslog method for access in older versions. With util-linux dmesg(1) defaults to reading directly from /dev/kmsg. To fix /dev/kmsg, let's compare the existing interfaces and what they allow: - /proc/kmsg allows: - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive single-reader interface (SYSLOG_ACTION_READ). - everything, after an open. - syslog syscall allows: - anything, if CAP_SYSLOG. - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if dmesg_restrict==0. - nothing else (EPERM). The use-cases were: - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs. - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the destructive SYSLOG_ACTION_READs. AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't clear the ring buffer. Based on the comments in devkmsg_llseek, it sounds like actions besides reading aren't going to be supported by /dev/kmsg (i.e. SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive syslog syscall actions. To this end, move the check as Josh had done, but also rename the constants to reflect their new uses (SYSLOG_FROM_CALL becomes SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC). SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC allows destructive actions after a capabilities-constrained SYSLOG_ACTION_OPEN check. - /dev/kmsg allows: - open if CAP_SYSLOG or dmesg_restrict==0 - reading/polling, after open Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192 [akpm@linux-foundation.org: use pr_warn_once()] Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Josh Boyer <jwboyer@redhat.com> Cc: Kay Sievers <kay@vrfy.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12ext4: don't use EXT4_FREE_BLOCKS_FORGET unnecessarilyTheodore Ts'o
Commit 18888cf0883c: "ext4: speed up truncate/unlink by not using bforget() unless needed" removed the use of EXT4_FREE_BLOCKS_FORGET in the most important codepath for file systems using extents, but a similar optimization also can be done for file systems using indirect blocks, and for the two special cases in the ext4 extents code. Cc: Andrey Sidorov <qrxd43@motorola.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12ext4: add cond_resched() to ext4_free_blocks() & ext4_mb_regular_allocator()Theodore Ts'o
For a file systems with a very large number of block groups, if all of the block group bitmaps are in memory and the file system is relatively badly fragmented, it's possible ext4_mb_regular_allocator() to take a long time trying to find a good match. This is especially true if the tuning parameter mb_max_to_scan has been sent to a very large number. So add a cond_resched() to avoid soft lockup warnings and to provide better system responsiveness. For ext4_free_blocks(), if we are deleting a large range of blocks, and data=journal is enabled so that EXT4_FREE_BLOCKS_FORGET is passed, the loop to call sb_find_get_block() and to call ext4_forget() can take over 10-15 milliseocnds or more. So it's better to add a cond_resched() here a well. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "There is a pair of fixes for double-frees in the recent bundle for 3.10, a couple of fixes for long-standing bugs (sleep while atomic and an endianness fix), and a locking fix that can be triggered when osds are going down" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix cleanup in rbd_add() rbd: don't destroy ceph_opts in rbd_add() ceph: ceph_pagelist_append might sleep while atomic ceph: add cpu_to_le32() calls when encoding a reconnect capability libceph: must hold mutex for reset_changed_osds()
2013-06-12f2fs: sync dir->i_size with its block allocationJaegeuk Kim
If new dentry block is allocated and its i_size is updated, we should update its inode block together in order to sync i_size and its block allocation. Otherwise, we can loose additional dentry block due to the unconsistent i_size. Errorneous Scenario ------------------- In the recovery routine, - recovery_dentry | - __f2fs_add_link | | - get_new_data_page | | | - i_size_write(new_i_size) | | | - mark_inode_dirty_sync(dir) | | - update_parent_metadata | | | - mark_inode_dirty(dir) | - write_checkpoint - sync_dirty_dir_inodes - filemap_flush(dentry_blocks) - f2fs_write_data_page - skip to write the last dentry block due to index < i_size In the above flow, new_i_size is not updated to its inode block so that the last dentry block will be lost accordingly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-11GFS2: Only do one directory search on createSteven Whitehouse
Creation of a new inode requires a directory search in order to ensure that we are not trying to create an inode with the same name as an existing one. This was hidden away inside the create_ok() function. In the case that there was an existing inode, and a lookup can be substituted for a create (which is the case with regular files when the O_EXCL flag is not in use) then we were doing a second lookup in order to return the inode. This patch merges these two lookups into one. This can be done by passing a flag to gfs2_dir_search() to tell it to just return -EEXIST in the cases where we don't actually want to look up the inode. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-11f2fs: fix i_blocks translation on various types of filesJaegeuk Kim
Basically an inode manages the number of allocated blocks with inode->i_blocks which is represented in a unit of sectors, not file system blocks. But, f2fs has used i_blocks in a unit of file system blocks, and f2fs_getattr translates it to the number of sectors when fstat is called. However, previously f2fs_file_inode_operations only has this, so this patch adds it to all the types of inode_operations. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-11f2fs: set sb->s_fs_info before calling parse_options()Gu Zheng
In f2fs_fill_super(), set sb->s_fs_info before calling parse_options(), then we can get f2fs_sb_info via F2FS_SB(sb) in parse_options(). So that the second argument "sbi" of func parse_options() is no longer needed. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-11f2fs: support xattr security labelsJaegeuk Kim
This patch adds the support of security labels for f2fs, which will be used by Linus Security Models (LSMs). Quote from http://en.wikipedia.org/wiki/Linux_Security_Modules: "Linux Security Modules (LSM) is a framework that allows the Linux kernel to support a variety of computer security models while avoiding favoritism toward any single security implementation. The framework is licensed under the terms of the GNU General Public License and is standard part of the Linux kernel since Linux 2.6. AppArmor, SELinux, Smack and TOMOYO Linux are the currently accepted modules in the official kernel.". Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-08hpfs: fix warnings when the filesystem fills upMikulas Patocka
This patch fixes warnings due to missing lock on write error path. WARNING: at fs/hpfs/hpfs_fn.h:353 hpfs_truncate+0x75/0x80 [hpfs]() Hardware name: empty Pid: 26563, comm: dd Tainted: P O 3.9.4 #12 Call Trace: hpfs_truncate+0x75/0x80 [hpfs] hpfs_write_begin+0x84/0x90 [hpfs] _hpfs_bmap+0x10/0x10 [hpfs] generic_file_buffered_write+0x121/0x2c0 __generic_file_aio_write+0x1c7/0x3f0 generic_file_aio_write+0x7c/0x100 do_sync_write+0x98/0xd0 hpfs_file_write+0xd/0x50 [hpfs] vfs_write+0xa2/0x160 sys_write+0x51/0xa0 page_fault+0x22/0x30 system_call_fastpath+0x1a/0x1f Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: stable@kernel.org # 2.6.39+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-08Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: - Trivial: unused variable removal - Posix-timers: Add the clock ID to the new proc interface to make it useful. The interface is new and should be functional when we reach the final 3.10 release. - Cure a false positive warning in the tick code introduced by the overhaul in 3.10 - Fix for a persistent clock detection regression introduced in this cycle * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Correct run-time detection of persistent_clock. ntp: Remove unused variable flags in __hardpps posix-timers: Show clock ID in proc file tick: Cure broadcast false positive pending bit warning
2013-06-08NFS: Add in v4.2 callback operationBryan Schumaker
NFS v4.2 adds a CB_OFFLOAD operation used by COPY and WRITE_PLUS. Since neither of these operations have been implemented yet, simply return NFS4ERR_NOTSUPP. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08NFS: Make callbacks minor version genericBryan Schumaker
I found a few places that hardcode the minor version number rather than making it dependent on the protocol the callback came in over. This patch makes it easier to add new minor versions in the future. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08Kconfig: Add Kconfig entry for Labeled NFS V4 clientSteve Dickson
This patch adds the NFS_V4_SECURITY_LABEL entry which enables security label support for the NFSv4 client Signed-off-by: Steve Dickson <steved@redhat.com> [trond: Make this non-interactive] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08NFS: Extend NFS xattr handlers to accept the security namespaceDavid Quigley
The existing NFSv4 xattr handlers do not accept xattr calls to the security namespace. This patch extends these handlers to accept xattrs from the security namespace in addition to the default NFSv4 ACL namespace. Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08NFS: Client implementation of Labeled-NFSDavid Quigley
This patch implements the client transport and handling support for labeled NFS. The patch adds two functions to encode and decode the security label recommended attribute which makes use of the LSM hooks added earlier. It also adds code to grab the label from the file attribute structures and encode the label to be sent back to the server. Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08NFS: Add label lifecycle managementDavid Quigley
This patch adds the lifecycle management for the security label structure introduced in an earlier patch. The label is not used yet but allocations and freeing of the structure is handled. Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>