summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2014-07-24CAPABILITIES: remove undefined caps from all processesEric Paris
This is effectively a revert of 7b9a7ec565505699f503b4fcf61500dceb36e744 plus fixing it a different way... We found, when trying to run an application from an application which had dropped privs that the kernel does security checks on undefined capability bits. This was ESPECIALLY difficult to debug as those undefined bits are hidden from /proc/$PID/status. Consider a root application which drops all capabilities from ALL 4 capability sets. We assume, since the application is going to set eff/perm/inh from an array that it will clear not only the defined caps less than CAP_LAST_CAP, but also the higher 28ish bits which are undefined future capabilities. The BSET gets cleared differently. Instead it is cleared one bit at a time. The problem here is that in security/commoncap.c::cap_task_prctl() we actually check the validity of a capability being read. So any task which attempts to 'read all things set in bset' followed by 'unset all things set in bset' will not even attempt to unset the undefined bits higher than CAP_LAST_CAP. So the 'parent' will look something like: CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: ffffffc000000000 All of this 'should' be fine. Given that these are undefined bits that aren't supposed to have anything to do with permissions. But they do... So lets now consider a task which cleared the eff/perm/inh completely and cleared all of the valid caps in the bset (but not the invalid caps it couldn't read out of the kernel). We know that this is exactly what the libcap-ng library does and what the go capabilities library does. They both leave you in that above situation if you try to clear all of you capapabilities from all 4 sets. If that root task calls execve() the child task will pick up all caps not blocked by the bset. The bset however does not block bits higher than CAP_LAST_CAP. So now the child task has bits in eff which are not in the parent. These are 'meaningless' undefined bits, but still bits which the parent doesn't have. The problem is now in cred_cap_issubset() (or any operation which does a subset test) as the child, while a subset for valid cap bits, is not a subset for invalid cap bits! So now we set durring commit creds that the child is not dumpable. Given it is 'more priv' than its parent. It also means the parent cannot ptrace the child and other stupidity. The solution here: 1) stop hiding capability bits in status This makes debugging easier! 2) stop giving any task undefined capability bits. it's simple, it you don't put those invalid bits in CAP_FULL_SET you won't get them in init and you won't get them in any other task either. This fixes the cap_issubset() tests and resulting fallout (which made the init task in a docker container untraceable among other things) 3) mask out undefined bits when sys_capset() is called as it might use ~0, ~0 to denote 'all capabilities' for backward/forward compatibility. This lets 'capsh --caps="all=eip" -- -c /bin/bash' run. 4) mask out undefined bit when we read a file capability off of disk as again likely all bits are set in the xattr for forward/backward compatibility. This lets 'setcap all+pe /bin/bash; /bin/bash' run Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Andrew G. Morgan <morgan@kernel.org> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Steve Grubb <sgrubb@redhat.com> Cc: Dan Walsh <dwalsh@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
2014-07-24Merge tag 'keys-next-20140722' of ↵James Morris
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next
2014-07-22Merge branch 'keys-fixes' into keys-nextDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com>
2014-07-22KEYS: user: Use key preparsingDavid Howells
Make use of key preparsing in user-defined and logon keys so that quota size determination can take place prior to keyring locking when a key is being added. Also the idmapper key types need to change to match as they use the user-defined key type routines. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Jeff Layton <jlayton@primarydata.com>
2014-07-18seccomp: implement SECCOMP_FILTER_FLAG_TSYNCKees Cook
Applying restrictive seccomp filter programs to large or diverse codebases often requires handling threads which may be started early in the process lifetime (e.g., by code that is linked in). While it is possible to apply permissive programs prior to process start up, it is difficult to further restrict the kernel ABI to those threads after that point. This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for synchronizing thread group seccomp filters at filter installation time. When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, filter) an attempt will be made to synchronize all threads in current's threadgroup to its new seccomp filter program. This is possible iff all threads are using a filter that is an ancestor to the filter current is attempting to synchronize to. NULL filters (where the task is running as SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS, ...) has been set on the calling thread, no_new_privs will be set for all synchronized threads too. On success, 0 is returned. On failure, the pid of one of the failing threads will be returned and no filters will have been applied. The race conditions against another thread are: - requesting TSYNC (already handled by sighand lock) - performing a clone (already handled by sighand lock) - changing its filter (already handled by sighand lock) - calling exec (handled by cred_guard_mutex) The clone case is assisted by the fact that new threads will have their seccomp state duplicated from their parent before appearing on the tasklist. Holding cred_guard_mutex means that seccomp filters cannot be assigned while in the middle of another thread's exec (potentially bypassing no_new_privs or similar). The call to de_thread() may kill threads waiting for the mutex. Changes across threads to the filter pointer includes a barrier. Based on patches by Will Drewry. Suggested-by: Julien Tinnes <jln@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-07-18sched: move no_new_privs into new atomic flagsKees Cook
Since seccomp transitions between threads requires updates to the no_new_privs flag to be atomic, the flag must be part of an atomic flag set. This moves the nnp flag into a separate task field, and introduces accessors. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-07-17KEYS: Allow special keys (eg. DNS results) to be invalidated by CAP_SYS_ADMINDavid Howells
Special kernel keys, such as those used to hold DNS results for AFS, CIFS and NFS and those used to hold idmapper results for NFS, used to be 'invalidateable' with key_revoke(). However, since the default permissions for keys were reduced: Commit: 96b5c8fea6c0861621051290d705ec2e971963f1 KEYS: Reduce initial permissions on keys it has become impossible to do this. Add a key flag (KEY_FLAG_ROOT_CAN_INVAL) that will permit a key to be invalidated by root. This should not be used for system keyrings as the garbage collector will try and remove any invalidate key. For system keyrings, KEY_FLAG_ROOT_CAN_CLEAR can be used instead. After this, from userspace, keyctl_invalidate() and "keyctl invalidate" can be used by any possessor of CAP_SYS_ADMIN (typically root) to invalidate DNS and idmapper keys. Invalidated keys are immediately garbage collected and will be immediately rerequested if needed again. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Steve Dickson <steved@redhat.com>
2014-07-13Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "More bug fixes for ext4 -- most importantly, a fix for a bug introduced in 3.15 that can end up triggering a file system corruption error after a journal replay. It shouldn't lead to any actual data corruption, but it is scary and can force file systems to be remounted read-only, etc" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix potential null pointer dereference in ext4_free_inode ext4: fix a potential deadlock in __ext4_es_shrink() ext4: revert commit which was causing fs corruption after journal replays ext4: disable synchronous transaction batching if max_batch_time==0 ext4: clarify ext4_error message in ext4_mb_generate_buddy_error() ext4: clarify error count warning messages ext4: fix unjournalled bg descriptor while initializing inode bitmap
2014-07-12ext4: fix potential null pointer dereference in ext4_free_inodeNamjae Jeon
Fix potential null pointer dereferencing problem caused by e43bb4e612 ("ext4: decrement free clusters/inodes counters when block group declared bad") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
2014-07-12ext4: fix a potential deadlock in __ext4_es_shrink()Theodore Ts'o
This fixes the following lockdep complaint: [ INFO: possible circular locking dependency detected ] 3.16.0-rc2-mm1+ #7 Tainted: G O ------------------------------------------------------- kworker/u24:0/4356 is trying to acquire lock: (&(&sbi->s_es_lru_lock)->rlock){+.+.-.}, at: [<ffffffff81285fff>] __ext4_es_shrink+0x4f/0x2e0 but task is already holding lock: (&ei->i_es_lock){++++-.}, at: [<ffffffff81286961>] ext4_es_insert_extent+0x71/0x180 which lock already depends on the new lock. Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ei->i_es_lock); lock(&(&sbi->s_es_lru_lock)->rlock); lock(&ei->i_es_lock); lock(&(&sbi->s_es_lru_lock)->rlock); *** DEADLOCK *** 6 locks held by kworker/u24:0/4356: #0: ("writeback"){.+.+.+}, at: [<ffffffff81071d00>] process_one_work+0x180/0x560 #1: ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff81071d00>] process_one_work+0x180/0x560 #2: (&type->s_umount_key#22){++++++}, at: [<ffffffff811a9c74>] grab_super_passive+0x44/0x90 #3: (jbd2_handle){+.+...}, at: [<ffffffff812979f9>] start_this_handle+0x189/0x5f0 #4: (&ei->i_data_sem){++++..}, at: [<ffffffff81247062>] ext4_map_blocks+0x132/0x550 #5: (&ei->i_es_lock){++++-.}, at: [<ffffffff81286961>] ext4_es_insert_extent+0x71/0x180 stack backtrace: CPU: 0 PID: 4356 Comm: kworker/u24:0 Tainted: G O 3.16.0-rc2-mm1+ #7 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: writeback bdi_writeback_workfn (flush-253:0) ffffffff8213dce0 ffff880014b07538 ffffffff815df0bb 0000000000000007 ffffffff8213e040 ffff880014b07588 ffffffff815db3dd ffff880014b07568 ffff880014b07610 ffff88003b868930 ffff88003b868908 ffff88003b868930 Call Trace: [<ffffffff815df0bb>] dump_stack+0x4e/0x68 [<ffffffff815db3dd>] print_circular_bug+0x1fb/0x20c [<ffffffff810a7a3e>] __lock_acquire+0x163e/0x1d00 [<ffffffff815e89dc>] ? retint_restore_args+0xe/0xe [<ffffffff815ddc7b>] ? __slab_alloc+0x4a8/0x4ce [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0 [<ffffffff810a8707>] lock_acquire+0x87/0x120 [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0 [<ffffffff8128592d>] ? ext4_es_free_extent+0x5d/0x70 [<ffffffff815e6f09>] _raw_spin_lock+0x39/0x50 [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0 [<ffffffff8119760b>] ? kmem_cache_alloc+0x18b/0x1a0 [<ffffffff81285fff>] __ext4_es_shrink+0x4f/0x2e0 [<ffffffff812869b8>] ext4_es_insert_extent+0xc8/0x180 [<ffffffff812470f4>] ext4_map_blocks+0x1c4/0x550 [<ffffffff8124c4c4>] ext4_writepages+0x6d4/0xd00 ... Reported-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Minchan Kim <minchan@kernel.org> Cc: stable@vger.kernel.org Cc: Zheng Liu <gnehzuil.liu@gmail.com>
2014-07-11Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd bugfix from Bruce Fields: "Another xdr encoding regression that may cause incorrect encoding on failures of certain readdirs" * 'for-3.16' of git://linux-nfs.org/~bfields/linux: nfsd: Fix bad reserving space for encoding rdattr_error
2014-07-11ext4: revert commit which was causing fs corruption after journal replaysTheodore Ts'o
Commit 007649375f6af2 ("ext4: initialize multi-block allocator before checking block descriptors") causes the block group descriptor's count of the number of free blocks to become inconsistent with the number of free blocks in the allocation bitmap. This is a harmless form of fs corruption, but it causes the kernel to potentially remount the file system read-only, or to panic, depending on the file systems's error behavior. Thanks to Eric Whitney for his tireless work to reproduce and to find the guilty commit. Fixes: 007649375f6af2 ("ext4: initialize multi-block allocator before checking block descriptors" Cc: stable@vger.kernel.org # 3.15 Reported-by: David Jander <david@protonic.nl> Reported-by: Matteo Croce <technoboy85@gmail.com> Tested-by: Eric Whitney <enwlinux@gmail.com> Suggested-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-07-10Merge branch 'for-3.16-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Mostly fixes for the fallouts from the recent cgroup core changes. The decoupled nature of cgroup dynamic hierarchy management (hierarchies are created dynamically on mount but may or may not be reused once unmounted depending on remaining usages) led to more ugliness being added to kernfs. Hopefully, this is the last of it" * 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: break kernfs active protection in cpuset_write_resmask() cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() kernfs: introduce kernfs_pin_sb() cgroup: fix mount failure in a corner case cpuset,mempolicy: fix sleeping function called from invalid context cgroup: fix broken css_has_online_children()
2014-07-09Merge tag 'f2fs-fixes-3.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs bugfixes from Jaegeuk Kim: "This includes a couple of bug fixes found by xfstests. In addition, one critical bug was reported by Brian Chadwick, which is falling into the infinite loop in balance_dirty_pages. And it turned out due to the IO merging policy in f2fs, which was newly merged in 3.16. - fix normal and recovery path for fallocated regions - fix error case mishandling - recover renamed fsync inodes correctly - fix to get out of infinite loops in balance_dirty_pages - fix kernel NULL pointer error" * tag 'f2fs-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: avoid to access NULL pointer in issue_flush_thread f2fs: check bdi->dirty_exceeded when trying to skip data writes f2fs: do checkpoint for the renamed inode f2fs: release new entry page correctly in error path of f2fs_rename f2fs: fix error path in init_inode_metadata f2fs: check lower bound nid value in check_nid_range f2fs: remove unused variables in f2fs_sm_info f2fs: fix not to allocate unnecessary blocks during fallocate f2fs: recover fallocated data and its i_size together f2fs: fix to report newly allocate region as extent
2014-07-09f2fs: avoid to access NULL pointer in issue_flush_threadChao Yu
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=75861 Denis 2014-05-10 11:28:59 UTC reported: "F2FS-fs (mmcblk0p28): mounting.. Unable to handle kernel NULL pointer dereference at virtual address 00000018 ... [<c0a2f678>] (_raw_spin_lock+0x3c/0x70) from [<c03a0330>] (issue_flush_thread+0x50/0x17c) [<c03a0330>] (issue_flush_thread+0x50/0x17c) from [<c01b4064>] (kthread+0x98/0xa4) [<c01b4064>] (kthread+0x98/0xa4) from [<c0108060>] (kernel_thread_exit+0x0/0x8)" This patch assign cmd_control_info in sm_info before issue_flush_thread is being created, so this make sure that issue flush thread will have no chance to access invalid info in fcc. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: check bdi->dirty_exceeded when trying to skip data writesJaegeuk Kim
If we don't check the current backing device status, balance_dirty_pages can fall into infinite pausing routine. This can be occurred when a lot of directories make a small number of dirty dentry pages including files. Reported-by: Brian Chadwick <brianchad@westnet.com.au> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: do checkpoint for the renamed inodeJaegeuk Kim
If an inode is renamed, it should be registered as file_lost_pino to conduct checkpoint at f2fs_sync_file. Otherwise, the inode cannot be recovered due to no dent_mark in the following scenario. Note that, this scenario is from xfstests/322. 1. create "a" 2. fsync "a" 3. rename "a" to "b" 4. fsync "b" 5. Sudden power-cut After recovery is done, "b" should be seen. However, the result shows "a", since the recovery procedure does not enter recover_dentry due to no dent_mark. The reason is like below. - The nid of "a" is checkpointed during #2, f2fs_sync_file. - The inode page for "b" produced by #3 is written without dent_mark by sync_node_pages. So, this patch fixes this bug by assinging file_lost_pino to the "a"'s inode. If the pino is lost, f2fs_sync_file conducts checkpoint, and then recovers the latest pino and its dentry information for further recovery. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: release new entry page correctly in error path of f2fs_renameChao Yu
This patch correct releasing code of new_page to avoid BUG_ON in error patch of f2fs_rename. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: fix error path in init_inode_metadataChao Yu
If we fail in this path: ->init_inode_metadata ->make_empty_dir ->get_new_data_page ->grab_cache_page return -ENOMEM We will bug on in error path of init_inode_metadata when call remove_inode_page because i_block = 2 (one inode block will be released later & one dentry block). We should release the dentry block in init_inode_metadata to avoid this BUG_ON, and avoid leak of dentry block resource, because we never have second chance to release that block in ->evict_inode as in upper error path we make this inode 'bad'. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: check lower bound nid value in check_nid_rangeChao Yu
This patch add lower bound verification for nid in check_nid_range, so nids reserved like 0, node, meta passed by caller could be checked there. And then check_nid_range could be used in f2fs_nfs_get_inode for simplifying code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09f2fs: remove unused variables in f2fs_sm_infoChao Yu
Remove unused variables in struct f2fs_sm_info. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-07nfsd: Fix bad reserving space for encoding rdattr_errorKinglong Mee
Introduced by commit 561f0ed498 (nfsd4: allow large readdirs). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-05ext4: disable synchronous transaction batching if max_batch_time==0Eric Sandeen
The mount manpage says of the max_batch_time option, This optimization can be turned off entirely by setting max_batch_time to 0. But the code doesn't do that. So fix the code to do that. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-07-05ext4: clarify ext4_error message in ext4_mb_generate_buddy_error()Theodore Ts'o
We are spending a lot of time explaining to users what this error means. Let's try to improve the message to avoid this problem. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-07-05ext4: clarify error count warning messagesTheodore Ts'o
Make it clear that values printed are times, and that it is error since last fsck. Also add note about fsck version required. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org
2014-07-05ext4: fix unjournalled bg descriptor while initializing inode bitmapTheodore Ts'o
The first time that we allocate from an uninitialized inode allocation bitmap, if the block allocation bitmap is also uninitalized, we need to get write access to the block group descriptor before we start modifying the block group descriptor flags and updating the free block count, etc. Otherwise, there is the potential of a bad journal checksum (if journal checksums are enabled), and of the file system becoming inconsistent if we crash at exactly the wrong time. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-07-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We've queued up a few fixes in my for-linus branch" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix crash when starting transaction Btrfs: fix btrfs_print_leaf for skinny metadata Btrfs: fix race of using total_bytes_pinned btrfs: use E2BIG instead of EIO if compression does not help btrfs: remove stale comment from btrfs_flush_all_pending_stuffs Btrfs: fix use-after-free when cloning a trailing file hole btrfs: fix null pointer dereference in btrfs_show_devname when name is null btrfs: fix null pointer dereference in clone_fs_devices when name is null btrfs: fix nossd and ssd_spread mount option regression Btrfs: fix race between balance recovery and root deletion Btrfs: atomically set inode->i_flags in btrfs_update_iflags btrfs: only unlock block in verify_parent_transid if we locked it Btrfs: assert send doesn't attempt to start transactions btrfs compression: reuse recently used workspace Btrfs: fix crash when mounting raid5 btrfs with missing disks btrfs: create sprout should rename fsid on the sysfs as well btrfs: dev replace should replace the sysfs entry btrfs: dev add should add its sysfs entry btrfs: dev delete should remove sysfs entry btrfs: rename add_device_membership to btrfs_kobj_add_device
2014-07-03Merge tag 'driver-core-3.16-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Well, one drivercore fix for kernfs to resolve a reported issue with sysfs files being updated from atomic contexts, and another lz4 bugfix for testing potential buffer overflows" * tag 'driver-core-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: lz4: add overrun checks to lz4_uncompress_unknownoutputsize() kernfs: kernfs_notify() must be useable from non-sleepable contexts
2014-07-03Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd bugfixes from Bruce Fields: "By coincidence, two NFSv4 symlink bugs, one introduced in the 3.16 xdr encoding rewrite, the other a decoding bug that I think we've had since the start but that just doesn't trigger very often" * 'for-3.16' of git://linux-nfs.org/~bfields/linux: nfs: fix nfs4d readlink truncated packet nfsd: fix rare symlink decoding bug
2014-07-03fs/seq_file: fallback to vmalloc allocationHeiko Carstens
There are a couple of seq_files which use the single_open() interface. This interface requires that the whole output must fit into a single buffer. E.g. for /proc/stat allocation failures have been observed because an order-4 memory allocation failed due to memory fragmentation. In such situations reading /proc/stat is not possible anymore. Therefore change the seq_file code to fallback to vmalloc allocations which will usually result in a couple of order-0 allocations and hence also work if memory is fragmented. For reference a call trace where reading from /proc/stat failed: sadc: page allocation failure: order:4, mode:0x1040d0 CPU: 1 PID: 192063 Comm: sadc Not tainted 3.10.0-123.el7.s390x #1 [...] Call Trace: show_stack+0x6c/0xe8 warn_alloc_failed+0xd6/0x138 __alloc_pages_nodemask+0x9da/0xb68 __get_free_pages+0x2e/0x58 kmalloc_order_trace+0x44/0xc0 stat_open+0x5a/0xd8 proc_reg_open+0x8a/0x140 do_dentry_open+0x1bc/0x2c8 finish_open+0x46/0x60 do_last+0x382/0x10d0 path_openat+0xc8/0x4f8 do_filp_open+0x46/0xa8 do_sys_open+0x114/0x1f0 sysc_tracego+0x14/0x1a Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: David Rientjes <rientjes@google.com> Cc: Ian Kent <raven@themaw.net> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com> Cc: Andrea Righi <andrea@betterlinux.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03/proc/stat: convert to single_open_size()Heiko Carstens
These two patches are supposed to "fix" failed order-4 memory allocations which have been observed when reading /proc/stat. The problem has been observed on s390 as well as on x86. To address the problem change the seq_file memory allocations to fallback to use vmalloc, so that allocations also work if memory is fragmented. This approach seems to be simpler and less intrusive than changing /proc/stat to use an interator. Also it "fixes" other users as well, which use seq_file's single_open() interface. This patch (of 2): Use seq_file's single_open_size() to preallocate a buffer that is large enough to hold the whole output, instead of open coding it. Also calculate the requested size using the number of online cpus instead of possible cpus, since the size of the output only depends on the number of online cpus. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Ian Kent <raven@themaw.net> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com> Cc: Andrea Righi <andrea@betterlinux.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03autofs4: fix false positive compile errorIan Kent
On strict build environments we can see: fs/autofs4/inode.c: In function 'autofs4_fill_super': fs/autofs4/inode.c:312: error: 'pgrp' may be used uninitialized in this function make[2]: *** [fs/autofs4/inode.o] Error 1 make[1]: *** [fs/autofs4] Error 2 make: *** [fs] Error 2 make: *** Waiting for unfinished jobs.... This is due to the use of pgrp_set being used to indicate pgrp has has been set rather than initializing pgrp itself. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03Btrfs: fix crash when starting transactionFilipe Manana
Often when starting a transaction we commit the currently running transaction, which can end up writing block group caches when the current process has its journal_info set to NULL (and not to a transaction). This makes our assertion at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting in a crash/hang. Therefore fix it by setting journal_info. Two different traces of this issue follow below. 1) [51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670 [51502.242213] ------------[ cut here ]------------ [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964! [51502.242669] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC (...) [51502.244010] Call Trace: [51502.244010] [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs] [51502.244010] [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs] [51502.244010] [<ffffffffa0357a6a>] commit_cowonly_roots+0x164/0x226 [btrfs] [51502.244010] [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs] [51502.244010] [<ffffffff8168ec7b>] ? _raw_spin_unlock+0x2b/0x40 [51502.244010] [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs] [51502.244010] [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs] [51502.244010] [<ffffffffa02d73e1>] __unlink_start_trans+0x31/0xe0 [btrfs] [51502.244010] [<ffffffffa02dea67>] btrfs_unlink+0x37/0xc0 [btrfs] [51502.244010] [<ffffffff811bb054>] ? do_unlinkat+0x114/0x2a0 [51502.244010] [<ffffffff811baebc>] vfs_unlink+0xcc/0x150 [51502.244010] [<ffffffff811bb1a0>] do_unlinkat+0x260/0x2a0 [51502.244010] [<ffffffff811a9ef4>] ? filp_close+0x64/0x90 [51502.244010] [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0 [51502.244010] [<ffffffff81349cab>] ? trace_hardirqs_on_thunk+0x3a/0x3f [51502.244010] [<ffffffff811be9eb>] SyS_unlinkat+0x1b/0x40 [51502.244010] [<ffffffff81698452>] system_call_fastpath+0x16/0x1b [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5 [51502.244010] RIP [<ffffffffa03575da>] assfail.constprop.88+0x1e/0x20 [btrfs] 2) [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670 [25405.097488] ------------[ cut here ]------------ [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964! [25405.097940] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC (...) [25405.100008] Call Trace: [25405.100008] [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs] [25405.100008] [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs] [25405.100008] [<ffffffffa035755a>] commit_cowonly_roots+0x164/0x226 [btrfs] [25405.100008] [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs] [25405.100008] [<ffffffff8109c170>] ? bit_waitqueue+0xc0/0xc0 [25405.100008] [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs] [25405.100008] [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs] [25405.100008] [<ffffffffa02e3407>] btrfs_create+0x47/0x210 [btrfs] [25405.100008] [<ffffffffa02d74cc>] ? btrfs_permission+0x3c/0x80 [btrfs] [25405.100008] [<ffffffff811bc63b>] vfs_create+0x9b/0x130 [25405.100008] [<ffffffff811bcf19>] do_last+0x849/0xe20 [25405.100008] [<ffffffff811b9409>] ? link_path_walk+0x79/0x820 [25405.100008] [<ffffffff811bd5b5>] path_openat+0xc5/0x690 [25405.100008] [<ffffffff810ab07d>] ? trace_hardirqs_on+0xd/0x10 [25405.100008] [<ffffffff811cdcd2>] ? __alloc_fd+0x32/0x1d0 [25405.100008] [<ffffffff811be2a3>] do_filp_open+0x43/0xa0 [25405.100008] [<ffffffff811cddf1>] ? __alloc_fd+0x151/0x1d0 [25405.100008] [<ffffffff811abcfc>] do_sys_open+0x13c/0x230 [25405.100008] [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0 [25405.100008] [<ffffffff811abe12>] SyS_open+0x22/0x30 [25405.100008] [<ffffffff81698452>] system_call_fastpath+0x16/0x1b [25405.100008] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5 [25405.100008] RIP [<ffffffffa03570ca>] assfail.constprop.88+0x1e/0x20 [btrfs] Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: fix btrfs_print_leaf for skinny metadataJosef Bacik
We wouldn't actuall print the extent information if we had a skinny metadata item, this fixes that. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: fix race of using total_bytes_pinnedLiu Bo
This percpu counter @total_bytes_pinned is introduced to skip unnecessary operations of 'commit transaction', it accounts for those space we may free but are stuck in delayed refs. And we zero out @space_info->total_bytes_pinned every transaction period so we have a better idea of how much space we'll actually free up by committing this transaction. However, we do the 'zero out' part a little earlier, before we actually unpin space, so we end up returning ENOSPC when we actually have free space that's just unpinned from committing transaction. xfstests/generic/074 complained then. This fixes it by actually accounting the percpu pinned number when 'unpin', and since it's protected by space_info->lock, the race is gone now. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: use E2BIG instead of EIO if compression does not helpDavid Sterba
Return codes got updated in 60e1975acb48fc3d74a3422b21dde74c977ac3d5 (btrfs: return errno instead of -1 from compression) lzo wrapper returns E2BIG in this case, do the same for zlib. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-07-03btrfs: remove stale comment from btrfs_flush_all_pending_stuffsDavid Sterba
Commit fcebe4562dec83b3f8d3088d77584727b09130b2 (Btrfs: rework qgroup accounting) removed the qgroup accounting after delayed refs. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-07-03Btrfs: fix use-after-free when cloning a trailing file holeFilipe Manana
The transaction handle was being used after being freed. Cc: Chris Mason <clm@fb.com> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: fix null pointer dereference in btrfs_show_devname when name is nullAnand Jain
dev->name is null but missing flag is not set. Strictly speaking the missing flag should have been set, but there are more places where code just checks if name is null. For now this patch does the same. stack: BUG: unable to handle kernel NULL pointer dereference at 0000000000000064 IP: [<ffffffffa0228908>] btrfs_show_devname+0x58/0xf0 [btrfs] [<ffffffff81198879>] show_vfsmnt+0x39/0x130 [<ffffffff81178056>] m_show+0x16/0x20 [<ffffffff8117d706>] seq_read+0x296/0x390 [<ffffffff8115aa7d>] vfs_read+0x9d/0x160 [<ffffffff8115b549>] SyS_read+0x49/0x90 [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b reproducer: mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2 btrfstune -S 1 /dev/sdg1 modprobe -r btrfs && modprobe btrfs mount -o degraded /dev/sdg1 /btrfs btrfs dev add /dev/sdg3 /btrfs Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: fix null pointer dereference in clone_fs_devices when name is nullAnand Jain
when one of the device path is missing btrfs_device name is null. So this patch will check for that. stack: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff812e18c0>] strlen+0x0/0x30 [<ffffffffa01cd92a>] ? clone_fs_devices+0xaa/0x160 [btrfs] [<ffffffffa01cdcf7>] btrfs_init_new_device+0x317/0xca0 [btrfs] [<ffffffff81155bca>] ? __kmalloc_track_caller+0x15a/0x1a0 [<ffffffffa01d6473>] btrfs_ioctl+0xaa3/0x2860 [btrfs] [<ffffffff81132a6c>] ? handle_mm_fault+0x48c/0x9c0 [<ffffffff81192a61>] ? __blkdev_put+0x171/0x180 [<ffffffff817a784c>] ? __do_page_fault+0x4ac/0x590 [<ffffffff81193426>] ? blkdev_put+0x106/0x110 [<ffffffff81179175>] ? mntput+0x35/0x40 [<ffffffff8116d4b0>] do_vfs_ioctl+0x460/0x4a0 [<ffffffff8115c72e>] ? ____fput+0xe/0x10 [<ffffffff81068033>] ? task_work_run+0xb3/0xd0 [<ffffffff8116d547>] SyS_ioctl+0x57/0x90 [<ffffffff817a793e>] ? do_page_fault+0xe/0x10 [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b reproducer: mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2 btrfstune -S 1 /dev/sdg1 modprobe -r btrfs && modprobe btrfs mount -o degraded /dev/sdg1 /btrfs btrfs dev add /dev/sdg3 /btrfs Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: fix nossd and ssd_spread mount option regressionEric Sandeen
The commit 0780253 btrfs: Cleanup the btrfs_parse_options for remount. broke ssd options quite badly; it stopped making ssd_spread imply ssd, and it made "nossd" unsettable. Put things back at least as well as they were before (though ssd mount option handling is still pretty odd: # mount -o "nossd,ssd_spread" works?) Reported-by: Roman Mamedov <rm@romanrm.net> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: fix race between balance recovery and root deletionWang Shilong
Balance recovery is called when RW mounting or remounting from RO to RW, it is called to finish roots merging. When doing balance recovery, relocation root's corresponding fs root(whose root refs is 0) might be destroyed by cleaner thread, this will make btrfs fail to mount. Fix this problem by holding @cleaner_mutex when doing balance recovery. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: atomically set inode->i_flags in btrfs_update_iflagsFilipe Manana
This change is based on the corresponding recent change for ext4: ext4: atomically set inode->i_flags in ext4_set_inode_flags() That has the following commit message that applies to btrfs as well: "Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time." Replacing EXT4_IMMUTABLE_FL and EXT4_APPEND_FL with BTRFS_INODE_IMMUTABLE and BTRFS_INODE_APPEND, respectively. Reviewed-by: David Sterba <dsterba@suse.cz> Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-02nfs: fix nfs4d readlink truncated packetAvi Kivity
XDR requires 4-byte alignment; nfs4d READLINK reply writes out the padding, but truncates the packet to the padding-less size. Fix by taking the padding into consideration when truncating the packet. Symptoms: # ll /mnt/ ls: cannot read symbolic link /mnt/test: Input/output error total 4 -rw-r--r--. 1 root root 0 Jun 14 01:21 123456 lrwxrwxrwx. 1 root root 6 Jul 2 03:33 test drwxr-xr-x. 1 root root 0 Jul 2 23:50 tmp drwxr-xr-x. 1 root root 60 Jul 2 23:44 tree Signed-off-by: Avi Kivity <avi@cloudius-systems.com> Fixes: 476a7b1f4b2c (nfsd4: don't treat readlink like a zero-copy operation) Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-02kernfs: kernfs_notify() must be useable from non-sleepable contextsTejun Heo
d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events too") added fsnotify triggering to kernfs_notify() which requires a sleepable context. There are already existing users of kernfs_notify() which invoke it from an atomic context and in general it's silly to require a sleepable context for triggering a notification. The following is an invalid context bug triggerd by md invoking sysfs_notify() from IO completion path. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1 2 locks held by swapper/1/0: #0: (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk] #1: (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240 irq event stamp: 33518 hardirqs last enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230 hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72 softirqs last enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50 softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c 0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000 0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2 Call Trace: <IRQ> [<ffffffff81807b4c>] dump_stack+0x4d/0x66 [<ffffffff810d4f14>] __might_sleep+0x184/0x240 [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440 [<ffffffff812d76a0>] kernfs_notify+0x90/0x150 [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240 [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1] [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1] [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1] [<ffffffff813acb8b>] bio_endio+0x6b/0xa0 [<ffffffff813b46c4>] blk_update_request+0x94/0x420 [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70 [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk] [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120 [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30 [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk] [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring] [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340 [<ffffffff8111647d>] handle_irq_event+0x3d/0x60 [<ffffffff81119436>] handle_edge_irq+0x66/0x130 [<ffffffff8101c3e4>] handle_irq+0x84/0x150 [<ffffffff818146ad>] do_IRQ+0x4d/0xe0 [<ffffffff818122f2>] common_interrupt+0x72/0x72 <EOI> [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10 [<ffffffff81025454>] default_idle+0x24/0x230 [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20 [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0 [<ffffffff8104df1b>] start_secondary+0x25b/0x300 This patch fixes it by punting the notification delivery through a work item. This ends up adding an extra pointer to kernfs_elem_attr enlarging kernfs_node by a pointer, which is not ideal but not a very big deal either. If this turns out to be an actual issue, we can move kernfs_elem_attr->size to kernfs_node->iattr later. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30kernfs: introduce kernfs_pin_sb()Li Zefan
kernfs_pin_sb() tries to get a refcnt of the superblock. This will be used by cgroupfs. v2: - make kernfs_pin_sb() return the superblock. - drop kernfs_drop_sb(). tj: Updated the comment a bit. [ This is a prerequisite for a bugfix. ] Cc: <stable@vger.kernel.org> # 3.15 Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-29Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "Fix a regression when trying to compile ext4 on older versions gcc. Fix a number of miscellaneous bugs for punch hole as well as a long-standing potential double buffer head release when failing a block allocation for an indirect-mapped file" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fix hole punching for files with indirect blocks ext4: Fix block zeroing when punching holes in indirect block files ext4: decrement free clusters/inodes counters when block group declared bad fs/mbcache: replace __builtin_log2() with ilog2() ext4: Fix buffer double free in ext4_alloc_branch()
2014-06-28btrfs: only unlock block in verify_parent_transid if we locked itJosef Bacik
This is a regression from my patch a26e8c9f75b0bfd8cccc9e8f110737b136eb5994, we need to only unlock the block if we were the one who locked it. Otherwise this will trip BUG_ON()'s in locking.c Thanks, cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-28Btrfs: assert send doesn't attempt to start transactionsFilipe Manana
When starting a transaction just assert that current->journal_info doesn't contain a send transaction stub, since send isn't supposed to start transactions and when it finishes (either successfully or not) it's supposed to set current->journal_info to NULL. This is motivated by the change titled: Btrfs: fix crash when starting transaction Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-28btrfs compression: reuse recently used workspaceSergey Senozhatsky
Add compression `workspace' in free_workspace() to `idle_workspace' list head, instead of tail. So we have better chances to reuse most recently used `workspace'. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>