From cb1c2e51c5a72f093b5af384b11d2f1c2abd6c13 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Sun, 13 Dec 2009 23:32:06 +0100 Subject: reiserfs: Fix reiserfs lock and journal lock inversion dependency When we were using the bkl, we didn't care about dependencies against other locks, but the mutex conversion created new ones, which is why we have reiserfs_mutex_lock_safe(), which unlocks the reiserfs lock before acquiring another mutex. But this trick actually fails if we have acquired the reiserfs lock recursively, as we try to unlock it to acquire the new mutex without inverted dependency, but we eventually only decrease its depth. This happens in the case of a nested inode creation/deletion. Say we have no space left on the device, we create an inode and tak the lock but fail to create its entry, then we release the inode using iput(), which calls reiserfs_delete_inode() that takes the reiserfs lock recursively. The path eventually ends up in journal_begin() where we try to take the journal safely but we fail because of the reiserfs lock recursion: [ INFO: possible circular locking dependency detected ] 2.6.32-06486-g053fe57 #2 ------------------------------------------------------- vi/23454 is trying to acquire lock: (&journal->j_mutex){+.+...}, at: [] do_journal_begin_r+0x64/0x2f0 but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [] validate_chain+0xa23/0xf70 [] __lock_acquire+0x4e5/0xa70 [] lock_acquire+0x7a/0xa0 [] mutex_lock_nested+0x5f/0x2b0 [] reiserfs_write_lock+0x28/0x40 [] do_journal_begin_r+0x6b/0x2f0 [] journal_begin+0x7f/0x120 [] reiserfs_remount+0x212/0x4d0 [] do_remount_sb+0x67/0x140 [] do_mount+0x436/0x6b0 [] sys_mount+0x66/0xa0 [] sysenter_do_call+0x12/0x36 -> #0 (&journal->j_mutex){+.+...}: [] validate_chain+0xf68/0xf70 [] __lock_acquire+0x4e5/0xa70 [] lock_acquire+0x7a/0xa0 [] mutex_lock_nested+0x5f/0x2b0 [] do_journal_begin_r+0x64/0x2f0 [] journal_begin+0x7f/0x120 [] reiserfs_delete_inode+0x9f/0x140 [] generic_delete_inode+0x9c/0x150 [] generic_drop_inode+0x3d/0x60 [] iput+0x47/0x50 [] reiserfs_create+0x16c/0x1c0 [] vfs_create+0xc1/0x130 [] do_filp_open+0x81c/0x920 [] do_sys_open+0x4f/0x110 [] sys_open+0x29/0x40 [] sysenter_do_call+0x12/0x36 other info that might help us debug this: 2 locks held by vi/23454: #0: (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [] do_filp_open+0x27e/0x920 #1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 stack backtrace: Pid: 23454, comm: vi Not tainted 2.6.32-06486-g053fe57 #2 Call Trace: [] ? printk+0x18/0x1e [] print_circular_bug+0xc0/0xd0 [] validate_chain+0xf68/0xf70 [] ? trace_hardirqs_off+0xb/0x10 [] __lock_acquire+0x4e5/0xa70 [] lock_acquire+0x7a/0xa0 [] ? do_journal_begin_r+0x64/0x2f0 [] mutex_lock_nested+0x5f/0x2b0 [] ? do_journal_begin_r+0x64/0x2f0 [] ? do_journal_begin_r+0x64/0x2f0 [] ? delete_one_xattr+0x0/0x1c0 [] do_journal_begin_r+0x64/0x2f0 [] journal_begin+0x7f/0x120 [] ? reiserfs_delete_xattrs+0x15/0x50 [] reiserfs_delete_inode+0x9f/0x140 [] ? generic_delete_inode+0x5f/0x150 [] ? reiserfs_delete_inode+0x0/0x140 [] generic_delete_inode+0x9c/0x150 [] generic_drop_inode+0x3d/0x60 [] iput+0x47/0x50 [] reiserfs_create+0x16c/0x1c0 [] ? inode_permission+0x7d/0xa0 [] vfs_create+0xc1/0x130 [] ? reiserfs_create+0x0/0x1c0 [] do_filp_open+0x81c/0x920 [] ? trace_hardirqs_off+0xb/0x10 [] ? _spin_unlock+0x1d/0x20 [] ? alloc_fd+0xba/0xf0 [] do_sys_open+0x4f/0x110 [] sys_open+0x29/0x40 [] sysenter_do_call+0x12/0x36 To fix this, use reiserfs_lock_once() from reiserfs_delete_inode() which prevents from adding reiserfs lock recursion. Reported-by: Alexander Beregalov Signed-off-by: Frederic Weisbecker Cc: Chris Mason Cc: Ingo Molnar Cc: Thomas Gleixner --- fs/reiserfs/inode.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'fs/reiserfs/inode.c') diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c index 3a28e7751b3..bd615dfe4ec 100644 --- a/fs/reiserfs/inode.c +++ b/fs/reiserfs/inode.c @@ -31,11 +31,12 @@ void reiserfs_delete_inode(struct inode *inode) JOURNAL_PER_BALANCE_CNT * 2 + 2 * REISERFS_QUOTA_INIT_BLOCKS(inode->i_sb); struct reiserfs_transaction_handle th; + int depth; int err; truncate_inode_pages(&inode->i_data, 0); - reiserfs_write_lock(inode->i_sb); + depth = reiserfs_write_lock_once(inode->i_sb); /* The = 0 happens when we abort creating a new inode for some reason like lack of space.. */ if (!(inode->i_state & I_NEW) && INODE_PKEY(inode)->k_objectid != 0) { /* also handles bad_inode case */ @@ -74,7 +75,7 @@ void reiserfs_delete_inode(struct inode *inode) out: clear_inode(inode); /* note this must go after the journal_end to prevent deadlock */ inode->i_blocks = 0; - reiserfs_write_unlock(inode->i_sb); + reiserfs_write_unlock_once(inode->i_sb, depth); } static void _make_cpu_key(struct cpu_key *key, int version, __u32 dirid, -- cgit v1.2.3-70-g09d2 From 5fe1533fda8ae005541bd418a7a8bc4fa0cda522 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Mon, 4 Jan 2010 22:04:01 +0100 Subject: reiserfs: Fix recursive lock on lchown On chown, reiserfs will call reiserfs_setattr() to change the owner of the given inode, but it may also recursively call reiserfs_setattr() to propagate the owner change to the private xattr files for this inode. Hence, the reiserfs lock may be acquired twice which is not wanted as reiserfs_setattr() calls journal_begin() that is going to try to relax the lock in order to safely acquire the journal mutex. Using reiserfs_write_lock_once() from reiserfs_setattr() solves the problem. This fixes the following warning, that precedes a lockdep report. WARNING: at fs/reiserfs/lock.c:95 reiserfs_lock_check_recursive+0x3f/0x50() Hardware name: MS-7418 Unwanted recursive reiserfs lock! Pid: 4189, comm: fsstress Not tainted 2.6.33-rc2-tip-atom+ #195 Call Trace: [] ? reiserfs_lock_check_recursive+0x3f/0x50 [] ? reiserfs_lock_check_recursive+0x3f/0x50 [] warn_slowpath_common+0x6c/0xc0 [] ? reiserfs_lock_check_recursive+0x3f/0x50 [] warn_slowpath_fmt+0x2b/0x30 [] reiserfs_lock_check_recursive+0x3f/0x50 [] do_journal_begin_r+0x83/0x350 [] journal_begin+0x7d/0x140 [] ? in_group_p+0x2a/0x30 [] ? inode_change_ok+0x91/0x140 [] reiserfs_setattr+0x15d/0x2e0 [] ? dput+0xe3/0x140 [] ? _raw_spin_unlock+0x2c/0x50 [] chown_one_xattr+0xd/0x10 [] reiserfs_for_each_xattr+0x113/0x2c0 [] ? chown_one_xattr+0x0/0x10 [] ? mutex_lock_nested+0x2a9/0x350 [] reiserfs_chown_xattrs+0x1f/0x60 [] ? in_group_p+0x2a/0x30 [] ? inode_change_ok+0x91/0x140 [] reiserfs_setattr+0x126/0x2e0 [] ? reiserfs_getxattr+0x0/0x90 [] ? cap_inode_need_killpriv+0x37/0x50 [] notify_change+0x151/0x330 [] chown_common+0x6f/0x90 [] sys_lchown+0x6d/0x80 [] sysenter_do_call+0x12/0x32 ---[ end trace 7c2b77224c1442fc ]--- Signed-off-by: Frederic Weisbecker Cc: Christian Kujau Cc: Alexander Beregalov Cc: Chris Mason Cc: Ingo Molnar --- fs/reiserfs/inode.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'fs/reiserfs/inode.c') diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c index bd615dfe4ec..47dbfb18877 100644 --- a/fs/reiserfs/inode.c +++ b/fs/reiserfs/inode.c @@ -3052,13 +3052,14 @@ static ssize_t reiserfs_direct_IO(int rw, struct kiocb *iocb, int reiserfs_setattr(struct dentry *dentry, struct iattr *attr) { struct inode *inode = dentry->d_inode; - int error; unsigned int ia_valid; + int depth; + int error; /* must be turned off for recursive notify_change calls */ ia_valid = attr->ia_valid &= ~(ATTR_KILL_SUID|ATTR_KILL_SGID); - reiserfs_write_lock(inode->i_sb); + depth = reiserfs_write_lock_once(inode->i_sb); if (attr->ia_valid & ATTR_SIZE) { /* version 2 items will be caught by the s_maxbytes check ** done for us in vmtruncate @@ -3149,7 +3150,8 @@ int reiserfs_setattr(struct dentry *dentry, struct iattr *attr) } out: - reiserfs_write_unlock(inode->i_sb); + reiserfs_write_unlock_once(inode->i_sb, depth); + return error; } -- cgit v1.2.3-70-g09d2 From 108d3943c021f0b66e860ba98ded40b82b677bd7 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Tue, 5 Jan 2010 00:15:38 +0100 Subject: reiserfs: Relax the lock before truncating pages While truncating a file, reiserfs_setattr() calls inode_setattr() that will truncate the mapping for the given inode, but for that it needs the pages locks. In order to release these, the owners need the reiserfs lock to complete their jobs. But they can't, as we don't release it before calling inode_setattr(). We need to do that to fix the following softlockups: INFO: task flush-8:0:2149 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. flush-8:0 D f51af998 0 2149 2 0x00000000 f51af9ac 00000092 00000002 f51af998 c2803304 00000000 c1894ad0 010f3000 f51af9cc c1462604 c189ef80 f51af974 c1710304 f715b450 f715b5ec c2807c40 00000000 0005bb00 c2803320 c102c55b c1710304 c2807c50 c2803304 00000246 Call Trace: [] ? schedule+0x434/0xb20 [] ? resched_task+0x4b/0x70 [] ? mark_held_locks+0x62/0x80 [] ? mutex_lock_nested+0x1fd/0x350 [] mutex_lock_nested+0x169/0x350 [] ? reiserfs_write_lock+0x2e/0x40 [] reiserfs_write_lock+0x2e/0x40 [] do_journal_end+0xc2/0xe70 [] journal_end+0xb2/0x120 [] ? pathrelse+0x33/0xb0 [] reiserfs_end_persistent_transaction+0x64/0x70 [] reiserfs_get_block+0x12ba/0x15f0 [] ? mark_held_locks+0x62/0x80 [] reiserfs_writepage+0xa74/0xe80 [] ? _raw_spin_unlock_irq+0x27/0x50 [] ? radix_tree_gang_lookup_tag_slot+0x95/0xc0 [] ? find_get_pages_tag+0x127/0x1a0 [] ? mark_held_locks+0x62/0x80 [] ? trace_hardirqs_on_caller+0x124/0x170 [] __writepage+0x10/0x40 [] write_cache_pages+0x16b/0x320 [] ? __writepage+0x0/0x40 [] generic_writepages+0x28/0x40 [] do_writepages+0x35/0x40 [] writeback_single_inode+0xc7/0x330 [] writeback_inodes_wb+0x2c2/0x490 [] wb_writeback+0x106/0x1b0 [] wb_do_writeback+0x106/0x1e0 [] ? wb_do_writeback+0x28/0x1e0 [] bdi_writeback_task+0x3a/0xb0 [] bdi_start_fn+0x63/0xc0 [] ? bdi_start_fn+0x0/0xc0 [] kthread+0x74/0x80 [] ? kthread+0x0/0x80 [] kernel_thread_helper+0x6/0x10 3 locks held by flush-8:0/2149: #0: (&type->s_umount_key#30){+++++.}, at: [] writeback_inodes_wb+0x27f/0x490 #1: (&journal->j_mutex){+.+...}, at: [] do_journal_end+0xba/0xe70 #2: (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x2e/0x40 INFO: task fstest:3813 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. fstest D 00000002 0 3813 3812 0x00000000 f5103c94 00000082 f5103c40 00000002 f5ad5450 00000007 f5103c28 011f3000 00000006 f5ad5450 c10bb005 00000480 c1710304 f5ad5450 f5ad55ec c2907c40 00000001 f5ad5450 f5103c74 00000046 00000002 f5ad5450 00000007 f5103c6c Call Trace: [] ? free_hot_cold_page+0x1d5/0x280 [] io_schedule+0x74/0xc0 [] sync_page+0x35/0x60 [] __wait_on_bit_lock+0x4a/0x90 [] ? sync_page+0x0/0x60 [] __lock_page+0x85/0x90 [] ? wake_bit_function+0x0/0x60 [] truncate_inode_pages_range+0x1e4/0x2d0 [] truncate_inode_pages+0x1f/0x30 [] truncate_pagecache+0x5f/0xa0 [] vmtruncate+0x5a/0x70 [] inode_setattr+0x5d/0x190 [] reiserfs_setattr+0x1f7/0x2f0 [] ? down_write+0x49/0x70 [] notify_change+0x151/0x330 [] do_truncate+0x6d/0xa0 [] do_filp_open+0x9a2/0xcf0 [] ? _raw_spin_unlock+0x2c/0x50 [] ? alloc_fd+0xe0/0x100 [] do_sys_open+0x6d/0x130 [] ? sysenter_exit+0xf/0x16 [] sys_open+0x2e/0x40 [] sysenter_do_call+0x12/0x32 3 locks held by fstest/3813: #0: (&sb->s_type->i_mutex_key#4){+.+.+.}, at: [] do_truncate+0x63/0xa0 #1: (&sb->s_type->i_alloc_sem_key#3){+.+.+.}, at: [] notify_change+0x257/0x330 #2: (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock_once+0x2e/0x50 Signed-off-by: Frederic Weisbecker Cc: Christian Kujau Cc: Alexander Beregalov Cc: Chris Mason Cc: Ingo Molnar --- fs/reiserfs/inode.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'fs/reiserfs/inode.c') diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c index 47dbfb18877..c876341ea73 100644 --- a/fs/reiserfs/inode.c +++ b/fs/reiserfs/inode.c @@ -3140,8 +3140,17 @@ int reiserfs_setattr(struct dentry *dentry, struct iattr *attr) journal_end(&th, inode->i_sb, jbegin_count); } } - if (!error) + if (!error) { + /* + * Relax the lock here, as it might truncate the + * inode pages and wait for inode pages locks. + * To release such page lock, the owner needs the + * reiserfs lock + */ + reiserfs_write_unlock_once(inode->i_sb, depth); error = inode_setattr(inode, attr); + depth = reiserfs_write_lock_once(inode->i_sb); + } } if (!error && reiserfs_posixacl(inode->i_sb)) { -- cgit v1.2.3-70-g09d2