summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2013-03-13ext2: Fix BUG_ON in evict() on inode deletionJan Kara
Commit 8e3dffc6 introduced a regression where deleting inode with large extended attributes leads to triggering BUG_ON(inode->i_state != (I_FREEING | I_CLEAR)) in fs/inode.c:evict(). That happens because freeing of xattr block dirtied the inode and it happened after clear_inode() has been called. Fix the issue by moving removal of xattr block into ext2_evict_inode() before clear_inode() call close to a place where data blocks are truncated. That is also more logical place and removes surprising requirement that ext2_free_blocks() mustn't dirty the inode. Reported-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-12fs: Readd the fs module aliases.Eric W. Biederman
I had assumed that the only use of module aliases for filesystems prior to "fs: Limit sys_mount to only request filesystem modules." was in request_module. It turns out I was wrong. At least mkinitcpio in Arch linux uses these aliases. So readd the preexising aliases, to keep from breaking userspace. Userspace eventually will have to follow and use the same aliases the kernel does. So at some point we may be delete these aliases without problems. However that day is not today. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-12Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and ↵Mathieu Desnoyers
security keys Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to compat_process_vm_rw() shows that the compatibility code requires an explicit "access_ok()" check before calling compat_rw_copy_check_uvector(). The same difference seems to appear when we compare fs/read_write.c:do_readv_writev() to fs/compat.c:compat_do_readv_writev(). This subtle difference between the compat and non-compat requirements should probably be debated, as it seems to be error-prone. In fact, there are two others sites that use this function in the Linux kernel, and they both seem to get it wrong: Now shifting our attention to fs/aio.c, we see that aio_setup_iocb() also ends up calling compat_rw_copy_check_uvector() through aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to be missing. Same situation for security/keys/compat.c:compat_keyctl_instantiate_key_iov(). I propose that we add the access_ok() check directly into compat_rw_copy_check_uvector(), so callers don't have to worry about it, and it therefore makes the compat call code similar to its non-compat counterpart. Place the access_ok() check in the same location where copy_from_user() can trigger a -EFAULT error in the non-compat code, so the ABI behaviors are alike on both compat and non-compat. While we are here, fix compat_do_readv_writev() so it checks for compat_rw_copy_check_uvector() negative return values. And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error handling. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-12ext4: use s_extent_max_zeroout_kb value as number of kbLukas Czerner
Currently when converting extent to initialized, we have to decide whether to zeroout part/all of the uninitialized extent in order to avoid extent tree growing rapidly. The decision is made by comparing the size of the extent with the configurable value s_extent_max_zeroout_kb which is in kibibytes units. However when converting it to number of blocks we currently use it as it was in bytes. This is obviously bug and it will result in ext4 _never_ zeroout extents, but rather always split and convert parts to initialized while leaving the rest uninitialized in default setting. Fix this by using s_extent_max_zeroout_kb as kibibytes. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-03-12vfs: fix pipe counter breakageAl Viro
If you open a pipe for neither read nor write, the pipe code will not add any usage counters to the pipe, causing the 'struct pipe_inode_info" to be potentially released early. That doesn't normally matter, since you cannot actually use the pipe, but the pipe release code - particularly fasync handling - still expects the actual pipe infrastructure to all be there. And rather than adding NULL pointer checks, let's just disallow this case, the same way we already do for the named pipe ("fifo") case. This is ancient going back to pre-2.4 days, and until trinity, nobody naver noticed. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-11ext4: use atomic64_t for the per-flexbg free_clusters countTheodore Ts'o
A user who was using a 8TB+ file system and with a very large flexbg size (> 65536) could cause the atomic_t used in the struct flex_groups to overflow. This was detected by PaX security patchset: http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551 This bug was introduced in commit 9f24e4208f7e, so it's been around since 2.6.30. :-( Fix this by using an atomic64_t for struct orlav_stats's free_clusters. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org
2013-03-11reiserfs: Use kstrdup instead of kmalloc/strcpyIonut-Gabriel Radu
Signed-off-by: Ionut-Gabriel Radu <ihonius@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11ext3: Fix format string issuesLars-Peter Clausen
ext3_msg() takes the printk prefix as the second parameter and the format string as the third parameter. Two callers of ext3_msg omit the prefix and pass the format string as the second parameter and the first parameter to the format string as the third parameter. In both cases this string comes from an arbitrary source. Which means the string may contain format string characters, which will lead to undefined and potentially harmful behavior. The issue was introduced in commit 4cf46b67eb("ext3: Unify log messages in ext3") and is fixed by this patch. CC: stable@vger.kernel.org Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11quota: add missing use of dq_data_lock in __dquot_initializeJeff Mahoney
The bulk of __dquot_initialize runs under the dqptr_sem which protects the inode->i_dquot pointers. It doesn't protect the dereferenced contents, though. Those are protected by the dq_data_lock, which is missing around the dquot_resv_space call. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11jbd2: fix use after free in jbd2_journal_dirty_metadata()Jan Kara
jbd2_journal_dirty_metadata() didn't get a reference to journal_head it was working with. This is OK in most of the cases since the journal head should be attached to a transaction but in rare occasions when we are journalling data, __ext4_journalled_writepage() can race with jbd2_journal_invalidatepage() stripping buffers from a page and thus journal head can be freed under hands of jbd2_journal_dirty_metadata(). Fix the problem by getting own journal head reference in jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers() which can possibly have the same issue). Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-03-11fs: Limit sys_mount to only request filesystem modules. (Part 3)Eric W. Biederman
Somehow I failed to add the MODULE_ALIAS_FS for cifs, hostfs, hpfs, squashfs, and udf despite what I thought were my careful checks :( Add them now. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-11hostfs: fix a not needed double checkMarco Stornelli
With the commit 3be2be0a32c18b0fd6d623cda63174a332ca0de1 we removed vmtruncate, but actaully there is no need to call inode_newsize_ok() because the checks are already done in inode_change_ok() at the begin of the function. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2013-03-10ext4: reserve metadata block for every delayed writeLukas Czerner
Currently we only reserve space (data+metadata) in delayed allocation if we're allocating from new cluster (which is always in non-bigalloc file system) which is ok for data blocks, because we reserve the whole cluster. However we have to reserve metadata for every delayed block we're going to write because every block could potentially require metedata block when we need to grow the extent tree. Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2013-03-10ext4: update reserved space after the 'correction'Lukas Czerner
Currently in ext4_ext_map_blocks() in delayed allocation writeback we would update the reservation and after that check whether we claimed cluster outside of the range of the allocation and if so, we'll give the block back to the reservation pool. However this also means that if the number of reserved data block dropped to zero before the correction, we would release all the metadata reservation as well, however we might still need it because the we're not done with the delayed allocation and there might be more blocks to come. This will result in error messages such as: EXT4-fs warning (device sdb): ext4_da_update_reserve_space:361: ino 12, allocated 1 with only 0 reserved metadata blocks (releasing 1 blocks with reserved 1 data blocks) This will only happen on bigalloc file system and it can be easily reproduced using fiemap-tester from xfstests like this: ./src/fiemap-tester -m DHDHDHDHD -S -p0 /mnt/test/file Or using xfstests such as 225. Fix this by doing the correction first and updating the reservation after that so that we do not accidentally decrease i_reserved_data_blocks to zero. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-10ext4: do not use yield()Lukas Czerner
Using yield() is strongly discouraged (see sched/core.c) especially since we can just use cond_resched(). Replace all use of yield() with cond_resched(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-10ext4: remove unused variable in ext4_free_blocks()Lukas Czerner
Remove unused variable 'freed' in ext4_free_blocks(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-10ext4: fix WARN_ON from ext4_releasepage()Jan Kara
ext4_releasepage() warns when it is passed a page with PageChecked set. However this can correctly happen when invalidate_inode_pages2_range() invalidates pages - and we should fail the release in that case. Since the page was dirty anyway, it won't be discarded and no harm has happened but it's good to be safe. Also remove bogus page_has_buffers() check - we are guaranteed page has buffers in this function. Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Tested-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-10ext4: fix the wrong number of the allocated blocks in ext4_split_extent()Zheng Liu
This commit fixes a wrong return value of the number of the allocated blocks in ext4_split_extent. When the length of blocks we want to allocate is greater than the length of the current extent, we return a wrong number. Let's see what happens in the following case when we call ext4_split_extent(). map: [48, 72] ex: [32, 64, u] 'ex' will be split into two parts: ex1: [32, 47, u] ex2: [48, 64, w] 'map->m_len' is returned from this function, and the value is 24. But the real length is 16. So it should be fixed. Meanwhile in this commit we use right length of the allocated blocks when get_reserved_cluster_alloc in ext4_ext_handle_uninitialized_extents is called. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: stable@vger.kernel.org
2013-03-10ext4: update extent status tree after an extent is zeroed outZheng Liu
When we try to split an extent, this extent could be zeroed out and mark as initialized. But we don't know this in ext4_map_blocks because it only returns a length of allocated extent. Meanwhile we will mark this extent as uninitialized because we only check m_flags. This commit update extent status tree when we try to split an unwritten extent. We don't need to worry about the status of this extent because we always mark it as initialized. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-03-10ext4: fix wrong m_len value after unwritten extent conversionZheng Liu
The ext4_ext_handle_uninitialized_extents() function was assuming the return value of ext4_ext_map_blocks() is equal to map->m_len. This incorrect assumption was harmless until we started use status tree as a extent cache because we need to update status tree according to 'm_len' value. Meanwhile this commit marks EXT4_MAP_MAPPED flag after unwritten extent conversion. It shouldn't cause a bug because we update status tree according to checking EXT4_MAP_UNWRITTEN flag. But it should be fixed. After applied this commit, the following error message from self-testing infrastructure disappears. ... kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len 3 in ext4_map_blocks (allocation) ... Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-03-10ext4: add self-testing infrastructure to do a sanity checkDmitry Monakhov
This commit adds a self-testing infrastructure like extent tree does to do a sanity check for extent status tree. After status tree is as a extent cache, we'd better to make sure that it caches right result. After applied this commit, we will get a lot of messages when we run xfstests as below. ... kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len 3 in ext4_map_blocks (allocation) ... kernel: ES cache assertation failed for inode: 230 es_cached ex [974/2/4781/20] != found ex [974/1/4781/1000] ... kernel: ES insert assertation failed for inode: 635 ex_status [0/45/21388/w] != es_status [44/1/21432/u] ... Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-10ext4: avoid a potential overflow in ext4_es_can_be_merged()Zheng Liu
Check the length of an extent to avoid a potential overflow in ext4_es_can_be_merged(). Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-03-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace bugfixes from Eric Biederman: "This is three simple fixes against 3.9-rc1. I have tested each of these fixes and verified they work correctly. The userns oops in key_change_session_keyring and the BUG_ON triggered by proc_ns_follow_link were found by Dave Jones. I am including the enhancement for mount to only trigger requests of filesystem modules here instead of delaying this for the 3.10 merge window because it is both trivial and the kind of change that tends to bit-rot if left untouched for two months." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Use nd_jump_link in proc_ns_follow_link fs: Limit sys_mount to only request filesystem modules (Part 2). fs: Limit sys_mount to only request filesystem modules. userns: Stop oopsing in key_change_session_keyring
2013-03-09proc: Use nd_jump_link in proc_ns_follow_linkEric W. Biederman
Update proc_ns_follow_link to use nd_jump_link instead of just manually updating nd.path.dentry. This fixes the BUG_ON(nd->inode != parent->d_inode) reported by Dave Jones and reproduced trivially with mkdir /proc/self/ns/uts/a. Sigh it looks like the VFS change to require use of nd_jump_link happend while proc_ns_follow_link was baking and since the common case of proc_ns_follow_link continued to work without problems the need for making this change was overlooked. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "These are scattered fixes and one performance improvement. The biggest functional change is in how we throttle metadata changes. The new code bumps our average file creation rate up by ~13% in fs_mark, and lowers CPU usage. Stefan bisected out a regression in our allocation code that made balance loop on extents larger than 256MB." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: improve the delayed inode throttling Btrfs: fix a mismerge in btrfs_balance() Btrfs: enforce min_bytes parameter during extent allocation Btrfs: allow running defrag in parallel to administrative tasks Btrfs: avoid deadlock on transaction waiting list Btrfs: do not BUG_ON on aborted situation Btrfs: do not BUG_ON in prepare_to_reloc Btrfs: free all recorded tree blocks on error Btrfs: build up error handling for merge_reloc_roots Btrfs: check for NULL pointer in updating reloc roots Btrfs: fix unclosed transaction handler when the async transaction commitment fails Btrfs: fix wrong handle at error path of create_snapshot() when the commit fails Btrfs: use set_nlink if our i_nlink is 0
2013-03-08Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "A small set of cifs fixes which includes one for a recent regression in the write path (pointed out by Anton), some fixes for rename problems and as promised for 3.9 removing the obsolete sockopt mount option (and the accompanying deprecation warning)." * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix missing of oplock_read value in smb30_values structure cifs: don't try to unlock pagecache page after releasing it cifs: remove the sockopt= mount option cifs: Check server capability before attempting silly rename cifs: Fix bug when checking error condition in cifs_rename_pending_delete()
2013-03-08vfs: don't BUG_ON() if following a /proc fd pseudo-symlink results in a symlinkLinus Torvalds
It's "normal" - it can happen if the file descriptor you followed was opened with O_NOFOLLOW. Reported-by: Dave Jones <davej@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-07Merge tag 'ecryptfs-3.9-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fixes from Tyler Hicks: "Minor code cleanups and new Kconfig option to disable /dev/ecryptfs The code cleanups fix up W=1 compiler warnings and some unnecessary checks. The new Kconfig option, defaulting to N, allows the rarely used eCryptfs kernel to userspace communication channel to be compiled out. This may be the first step in it being eventually removed." Hmm. I'm not sure whether these should be called "fixes", and it probably should have gone in the merge window. But I'll let it slide. * tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: allow userspace messaging to be disabled eCryptfs: Fix redundant error check on ecryptfs_find_daemon_by_euid() ecryptfs: ecryptfs_msg_ctx_alloc_to_free(): remove kfree() redundant null check eCryptfs: decrypt_pki_encrypted_session_key(): remove kfree() redundant null check eCryptfs: remove unneeded checks in virt_to_scatterlist() eCryptfs: Fix -Wmissing-prototypes warnings eCryptfs: Fix -Wunused-but-set-variable warnings eCryptfs: initialize payload_len in keystore.c
2013-03-07Btrfs: improve the delayed inode throttlingChris Mason
The delayed inode code batches up changes to the btree in hopes of doing them in bulk. As the changes build up, processes kick off worker threads and wait for them to make progress. The current code kicks off an async work queue item for each delayed node, which creates a lot of churn. It also uses a fixed 1 HZ waiting period for the throttle, which allows us to build a lot of pending work and can slow down the commit. This changes us to watch a sequence counter as it is bumped during the operations. We kick off fewer work items and have each work item do more work. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-07fs: Limit sys_mount to only request filesystem modules (Part 2).Eric W. Biederman
Add missing MODULE_ALIAS_FS("ocfs2") how did I miss that? Remove unnecessary MODULE_ALIAS_FS("devpts") devpts can not be modular. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-06Btrfs: fix a mismerge in btrfs_balance()Ilya Dryomov
Raid56 merge (merge commit e942f88) had mistakenly removed a call to __cancel_balance(), which resulted in balance not cleaning up after itself after a successful finish. (Cleanup includes switching the state, removing the balance item and releasing mut_ex_op testnset lock.) Bring it back. Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-06CIFS: Fix missing of oplock_read value in smb30_values structurePavel Shilovsky
Cc: stable@vger.kernel.org Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-06cifs: don't try to unlock pagecache page after releasing itJeff Layton
We had a recent fix to fix the release of pagecache pages when cifs_writev_requeue writes fail. Unfortunately, it releases the page before trying to unlock it. At that point, the page might be gone by the time the unlock comes in. Unlock the page first before checking the value of "rc", and only then end writeback and release the pages. The page lock isn't required for any of those operations so this should be safe. Reported-by: Anton Altaparmakov <aia21@cam.ac.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-06cifs: remove the sockopt= mount optionJeff Layton
...as promised for 3.9. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-06Merge branch 'master' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9
2013-03-06cifs: Check server capability before attempting silly renameSachin Prabhu
cifs_rename_pending_delete() attempts to silly rename file using CIFSSMBRenameOpenFile(). This uses the SET_FILE_INFORMATION TRANS2 command with information level set to the passthru info-level SMB_SET_FILE_RENAME_INFORMATION. We need to check to make sure that the server support passthru info-levels before attempting the silly rename or else we will fail to rename the file. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-06cifs: Fix bug when checking error condition in cifs_rename_pending_delete()Sachin Prabhu
Fix check for error condition after setting attributes with CIFSSMBSetFileInfo(). Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-05Btrfs: enforce min_bytes parameter during extent allocationChris Mason
Commit 24542bf7ea5e4fdfdb5157ff544c093fa4dcb536 changed preallocation of extents to cap the max size we try to allocate. It's a valid change, but the extent reservation code is also used by balance, and that can't tolerate a smaller extent being allocated. __btrfs_prealloc_file_range already has a min_size parameter, which is used by relocation to request a specific extent size. This commit adds an extra check to enforce that minimum extent size. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>
2013-03-04Btrfs: allow running defrag in parallel to administrative tasksStefan Behrens
Commit 5ac00add added a testnset mutex and code that disallows running administrative tasks in parallel. It is prevented that the device add/delete/balance/replace/resize operations are started in parallel. By mistake, the defragmentation operation was included in the check for mutually exclusiveness as well. This is fixed with this commit. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: avoid deadlock on transaction waiting listLiu Bo
Only let one trans handle to wait for other handles, otherwise we will get ABBA issues. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: do not BUG_ON on aborted situationLiu Bo
Btrfs balance can easily hit BUG_ON in these places, but we want to it bail out gracefully after we force the whole filesystem to readonly. So we use btrfs_std_error hook in place of BUG_ON. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: do not BUG_ON in prepare_to_relocLiu Bo
We can bail out from here gracefully instead of a cold BUG_ON. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: free all recorded tree blocks on errorLiu Bo
We've missed the 'free blocks' part on ENOMEM error. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: build up error handling for merge_reloc_rootsLiu Bo
We first use btrfs_std_error hook to replace with BUG_ON, and we also need to cleanup what is left, including reloc roots rbtree and reloc roots list. Here we use a helper function to cleanup both rbtree and list, and since this function can also be used in the balance recover path, we also make the change as well to keep code simple. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: check for NULL pointer in updating reloc rootsLiu Bo
Add a check for NULL pointer to avoid invalid reference. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: fix unclosed transaction handler when the async transaction ↵Miao Xie
commitment fails If the async transaction commitment failed, we need close the current transaction handler, or the current transaction will be blocked to commit because of this orphan handler. We fix the problem by doing sync transaction commitment, that is to invoke btrfs_commit_transaction(). Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: fix wrong handle at error path of create_snapshot() when the commit failsMiao Xie
There are several bugs at error path of create_snapshot() when the transaction commitment failed. - access the freed transaction handler. At the end of the transaction commitment, the transaction handler was freed, so we should not access it after the transaction commitment. - we were not aware of the error which happened during the snapshot creation if we submitted a async transaction commitment. - pending snapshot access vs pending snapshot free. when something wrong happened after we submitted a async transaction commitment, the transaction committer would cleanup the pending snapshots and free them. But the snapshot creators were not aware of it, they would access the freed pending snapshots. This patch fixes the above problems by: - remove the dangerous code that accessed the freed handler - assign ->error if the error happens during the snapshot creation - the transaction committer doesn't free the pending snapshots, just assigns the error number and evicts them before we unblock the transaction. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: use set_nlink if our i_nlink is 0Josef Bacik
We need to inc the nlink of deleted entries when running replay so we can do the unlink on the fs_root and get everything cleaned up and then have the orphan cleanup do the right thing. The problem is inc_nlink complains about this, even thought it still does the right thing. So use set_nlink() if our i_nlink is 0 to keep users from seeing the warnings during log replay. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-03eCryptfs: allow userspace messaging to be disabledKees Cook
When the userspace messaging (for the less common case of userspace key wrap/unwrap via ecryptfsd) is not needed, allow eCryptfs to build with it removed. This saves on kernel code size and reduces potential attack surface by removing the /dev/ecryptfs node. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2013-03-04ext4: invalidate extent status tree during extent migrationDmitry Monakhov
mext_replace_branches() will change inode's extents layout so we have to drop corresponding cache. TESTCASE: 301'th xfstest was not yet accepted to official xfstest's branch and can be found here: https://github.com/dmonakhov/xfstests/commit/7b7efeee30a41109201e2040034e71db9b66ddc0 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>