summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2013-11-13Merge tag 'upstream-3.13-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull ubifs changes from Artem Bityutskiy: "Mostly fixes for the power cut emulation UBIFS mode, and only one functional change which fixes a return error code" * tag 'upstream-3.13-rc1' of git://git.infradead.org/linux-ubifs: UBIFS: correct data corruption range UBIFS: fix return code UBIFS: remove unnecessary code in ubifs_garbage_collect
2013-11-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "This adds a ->writepage() implementation to fuse, improving mmaped writeout and paving the way for buffered writeback. And there's a patch to add a fix minor number for /dev/cuse, similarly to /dev/fuse" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: writepages: protect secondary requests from fuse file release fuse: writepages: update bdi writeout when deleting secondary request fuse: writepages: crop secondary requests fuse: writepages: roll back changes if request not found cuse: add fix minor number to /dev/cuse fuse: writepage: skip already in flight fuse: writepages: handle same page rewrites fuse: writepages: fix aggregation fuse: fix race in fuse_writepages() fuse: Implement writepages callback fuse: don't BUG on no write file fuse: lock page in mkwrite fuse: Prepare to handle multiple pages in writeback fuse: Getting file for writeback helper
2013-11-13Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext[23], udf and quota fixes from Jan Kara: "Assorted fixes in quota, ext2, ext3 & udf. Probably the most important is a fix of fs corruption issue in ext2 XIP support (OTOH xip is rarely used)" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Fix fs corruption in ext2_get_xip_mem() quota: info leak in quota_getquota() jbd: Revert "jbd: remove dependency on __GFP_NOFAIL" udf: fix for pathetic mount times in case of invalid file system ext3: Count journal as bsddf overhead in ext3_statfs
2013-11-13Merge tag 'for-f2fs-3.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set includes the following major enhancement patches. - add a sysfs to control reclaiming free segments - enhance the f2fs global lock procedures - enhance the victim selection flow - wait for selected node blocks during fsync - add some tracepoints - add a config to remove abundant BUG_ONs The other bug fixes are as follows. - fix deadlock on acl operations - fix some bugs with respect to orphan inodes And, there are a bunch of cleanups" * tag 'for-f2fs-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits) f2fs: issue more large discard command f2fs: fix memory leak after kobject init failed in fill_super f2fs: cleanup waiting routine for writeback pages in cp f2fs: avoid to use a NULL point in destroy_segment_manager f2fs: remove unnecessary TestClearPageError when wait pages writeback f2fs: update f2fs document f2fs: avoid to wait all the node blocks during fsync f2fs: check all ones or zeros bitmap with bitops for better mount performance f2fs: change the method of calculating the number summary blocks f2fs: fix calculating incorrect free size when update xattr in __f2fs_setxattr f2fs: add an option to avoid unnecessary BUG_ONs f2fs: introduce CONFIG_F2FS_CHECK_FS for BUG_ON control f2fs: fix a deadlock during init_acl procedure f2fs: clean up acl flow for better readability f2fs: remove unnecessary segment bitmap updates f2fs: add tracepoint for vm_page_mkwrite f2fs: add tracepoint for set_page_dirty f2fs: remove redundant set_page_dirty from write_compacted_summaries f2fs: add reclaiming control by sysfs f2fs: introduce f2fs_balance_fs_bg for some background jobs ...
2013-11-13devpts: plug the memory leak in kill_sbIlija Hadzic
When devpts is unmounted, there may be a no-longer-used IDR tree hanging off the superblock we are about to kill. This needs to be cleaned up before destroying the SB. The leak is usually not a big deal because unmounting devpts is typically done when shutting down the whole machine. However, shutting down an LXC container instead of a physical machine exposes the problem (the garbage is detectable with kmemleak). Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13exec/ptrace: fix get_dumpable() incorrect testsKees Cook
The get_dumpable() return value is not boolean. Most users of the function actually want to be testing for non-SUID_DUMP_USER(1) rather than SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a protected state. Almost all places did this correctly, excepting the two places fixed in this patch. Wrong logic: if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ } or if (dumpable == 0) { /* be protective */ } or if (!dumpable) { /* be protective */ } Correct logic: if (dumpable != SUID_DUMP_USER) { /* be protective */ } or if (dumpable != 1) { /* be protective */ } Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a user was able to ptrace attach to processes that had dropped privileges to that user. (This may have been partially mitigated if Yama was enabled.) The macros have been moved into the file that declares get/set_dumpable(), which means things like the ia64 code can see them too. CVE-2013-2929 Reported-by: Vasily Kulikov <segoon@openwall.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13kcore: add Kconfig help textRandy Dunlap
Under Pseudo filesystems, /proc/kcore support has no help. Fixes a portion of kernel bugzilla #52671: https://bugzilla.kernel.org/show_bug.cgi?id=52671 Thanks for David Howells for the help text. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: <lailavrazda1979@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13procfs: clean up proc_reg_get_unmapped_area for 80-column limitHATAYAMA Daisuke
Clean up proc_reg_get_unmapped_area due to its 80-column limit violation. Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13hfsplus: implement attributes file creation functionalityVyacheslav Dubeyko
Implement functionality of creation AttributesFile metadata file on HFS+ volume in the case of absence of it. It makes trying to open AttributesFile's B-tree during mount of HFS+ volume. If HFS+ volume hasn't AttributesFile then a pointer on AttributesFile's B-tree keeps as NULL. Thereby, when it is discovered absence of AttributesFile on HFS+ volume in the begin of xattr creation operation then AttributesFile will be created. The creation of AttributesFile will have success in the case of availability (2 * clump) free blocks on HFS+ volume. Otherwise, creation operation is ended with error (-ENOSPC). Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13hfsplus: implement attributes file's header node initialization codeVyacheslav Dubeyko
Implement functionality of AttributesFile's header node initialization. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13hfsplus: add metadata file's clump size calculation functionalityVyacheslav Dubeyko
There are situation when HFS+ volume had been created without AttributesFile. Such situation can take place because of using old mkfs.hfs utility or creation HFS+ volume without taking in mind necessity to use xattrs. For example, Mac OS X 10.4 (Tiger) doesn't create AttributesFile during mkfs phase. Also it is a very frequent situation for the case of users that created HFS+ volumes under Linux. As a result, xattrs and POSIX ACLs on HFS+ volume are unavailable for such users. This patchset implements functionality of AttributesFile creation on HFS+ volume in the case of this metadata file absence during operation of xattr creation. This patch: Add functionality of metadata file's clump size calculation. Operation of AttributesFile creation needs in clump size setting. This value will be used when AttributesFile will be extended. This code is adopted from code of newfs_hfs utility of diskdev_cmds packet http://opensource.apple.com/tarballs/diskdev_cmds/. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13fs/hfs/btree.h: remove duplicate definesMichael Opdenacker
This patch removes duplicate defines from fs/hfs/btree.h [akpm@linux-foundation.org: retain the comments] Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13epoll: do not take global 'epmutex' for simple topologiesJason Baron
When calling EPOLL_CTL_ADD for an epoll file descriptor that is attached directly to a wakeup source, we do not need to take the global 'epmutex', unless the epoll file descriptor is nested. The purpose of taking the 'epmutex' on add is to prevent complex topologies such as loops and deep wakeup paths from forming in parallel through multiple EPOLL_CTL_ADD operations. However, for the simple case of an epoll file descriptor attached directly to a wakeup source (with no nesting), we do not need to hold the 'epmutex'. This patch along with 'epoll: optimize EPOLL_CTL_DEL using rcu' improves scalability on larger systems. Quoting Nathan Zimmer's mail on SPECjbb performance: "On the 16 socket run the performance went from 35k jOPS to 125k jOPS. In addition the benchmark when from scaling well on 10 sockets to scaling well on just over 40 sockets. ... Currently the benchmark stops scaling at around 40-44 sockets but it seems like I found a second unrelated bottleneck." [akpm@linux-foundation.org: use `bool' for boolean variables, remove unneeded/undesirable cast of void*, add missed ep_scan_ready_list() kerneldoc] Signed-off-by: Jason Baron <jbaron@akamai.com> Tested-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13epoll: optimize EPOLL_CTL_DEL using rcuJason Baron
Nathan Zimmer found that once we get over 10+ cpus, the scalability of SPECjbb falls over due to the contention on the global 'epmutex', which is taken in on EPOLL_CTL_ADD and EPOLL_CTL_DEL operations. Patch #1 removes the 'epmutex' lock completely from the EPOLL_CTL_DEL path by using rcu to guard against any concurrent traversals. Patch #2 remove the 'epmutex' lock from EPOLL_CTL_ADD operations for simple topologies. IE when adding a link from an epoll file descriptor to a wakeup source, where the epoll file descriptor is not nested. This patch (of 2): Optimize EPOLL_CTL_DEL such that it does not require the 'epmutex' by converting the file->f_ep_links list into an rcu one. In this way, we can traverse the epoll network on the add path in parallel with deletes. Since deletes can't create loops or worse wakeup paths, this is safe. This patch in combination with the patch "epoll: Do not take global 'epmutex' for simple topologies", shows a dramatic performance improvement in scalability for SPECjbb. Signed-off-by: Jason Baron <jbaron@akamai.com> Tested-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> CC: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13debugfs: use list_next_entry() in debugfs_remove_recursive()Oleg Nesterov
Change debugfs_remove_recursive() to use list_next_entry(child), no changes in generated code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eilon Greenstein <eilong@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13cramfs: mark as obsoleteMichael Opdenacker
Who needs cramfs when you have squashfs? At least, we should warn people that cramfs is obsolete. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13mm: factor commit limit calculationJerome Marchand
The same calculation is currently done in three differents places. Factor that code so future changes has to be made at only one place. [akpm@linux-foundation.org: uninline vm_commit_limit()] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13writeback: do not sync data dirtied after sync startJan Kara
When there are processes heavily creating small files while sync(2) is running, it can easily happen that quite some new files are created between WB_SYNC_NONE and WB_SYNC_ALL pass of sync(2). That can happen especially if there are several busy filesystems (remember that sync traverses filesystems sequentially and waits in WB_SYNC_ALL phase on one fs before starting it on another fs). Because WB_SYNC_ALL pass is slow (e.g. causes a transaction commit and cache flush for each inode in ext3), resulting sync(2) times are rather large. The following script reproduces the problem: function run_writers { for (( i = 0; i < 10; i++ )); do mkdir $1/dir$i for (( j = 0; j < 40000; j++ )); do dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null done & done } for dir in "$@"; do run_writers $dir done sleep 40 time sync Fix the problem by disregarding inodes dirtied after sync(2) was called in the WB_SYNC_ALL pass. To allow for this, sync_inodes_sb() now takes a time stamp when sync has started which is used for setting up work for flusher threads. To give some numbers, when above script is run on two ext4 filesystems on simple SATA drive, the average sync time from 10 runs is 267.549 seconds with standard deviation 104.799426. With the patched kernel, the average sync time from 10 runs is 2.995 seconds with standard deviation 0.096. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13/proc/pid/smaps: show VM_SOFTDIRTY flag in VmFlags lineNaoya Horiguchi
This flag shows that the VMA is "newly created" and thus represents "dirty" in the task's VM. You can clear it by "echo 4 > /proc/pid/clear_refs." Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Pavel Emelyanov <xemul@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13mm, mempolicy: make mpol_to_str robust and always succeedDavid Rientjes
mpol_to_str() should not fail. Currently, it either fails because the string buffer is too small or because a string hasn't been defined for a mempolicy mode. If a new mempolicy mode is introduced and no string is defined for it, just warn and return "unknown". If the buffer is too small, just truncate the string and return, the same behavior as snprintf(). This also fixes a bug where there was no NULL-byte termination when doing *p++ = '=' and *p++ ':' and maxlen has been reached. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Chen Gang <gang.chen@asianux.com> Cc: Rik van Riel <riel@redhat.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13mm: use pgdat_end_pfn() to simplify the code in othersXishi Qiu
Use "pgdat_end_pfn()" instead of "pgdat->node_start_pfn + pgdat->node_spanned_pages". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: simplify ocfs2_invalidatepage() and ocfs2_releasepage()Jan Kara
Ocfs2 doesn't do data journalling. Thus its ->invalidatepage and ->releasepage functions never get called on buffers that have journal heads attached. So just use standard variants of functions from buffer.c. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: convert use of typedef ctl_table to struct ctl_tableJoe Perches
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: fix possible double free in ocfs2_write_begin_nolockXue jiufei
When ocfs2_write_cluster_by_desc() failed in ocfs2_write_begin_nolock() because of ENOSPC, it goes to out_quota, freeing data_ac(meta_ac). Then it calls ocfs2_try_to_free_truncate_log() to free space. If enough space freed, it will try to write again. Unfortunately, some error happenes before ocfs2_lock_allocators(), it goes to out and free data_ac(meta_ac) again. Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: add missing errno in ocfs2_ioctl_move_extents()Younger Liu
If the file is not regular or writeable, it should return errno(EPERM). This patch is based on 85a258b70d ("ocfs2: fix error handling in ocfs2_ioctl_move_extents()"). Signed-off-by: Younger Liu <younger.liu@huawei.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: do not call brelse() if group_bh is not initialized in ocfs2_group_add()Younger Liu
If group_bh is not initialized, there is no need to release. This problem does not cause anything wrong, but the patch would make the code more logical. Signed-off-by: Younger Liu <younger.liu@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: rollback transaction in ocfs2_group_add()Younger Liu
If ocfs2_journal_access_di() fails, group->bg_next_group should rollback. Otherwise, there would be a inconsistency between group_bh and main_bm_bh. Signed-off-by: Younger Liu <younger.liu@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: break useless while loopJunxiao Bi
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: use find_last_bit()Akinobu Mita
We already have find_last_bit(). So just use it as described in the comment. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: delay migration when the lockres is in migration stateXue jiufei
We trigger a bug in __dlm_lockres_reserve_ast() when we parallel umount 4 nodes. The situation is as follows: 1) Node A migrate all lockres it owned(eg. lockres A) to other nodes say node B when it umounts. 2) Receiving MIG_LOCKRES message from A, Node B masters the lockres A with DLM_LOCK_RES_MIGRATING state set. 3) Then we umount ocfs2 on node B. It also should migrate lockres A to another node, say node C. But now, DLM_LOCK_RES_MIGRATING state of lockers A is not cleared. Node B triggered the BUG on lockres with state DLM_LOCK_RES_MIGRATING. Signed-off-by: Xuejiufei <xuejiufei@huawei.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: skip locks in the blocked listXue jiufei
A parallel umount on 4 nodes triggered a bug in dlm_process_recovery_date(). Here's the situation: Receiving MIG_LOCKRES message, A node processes the locks in migratable lockres. It copys lvb from migratable lockres when processing the first valid lock. If there is a lock in the blocked list with the EX level, it triggers the BUG. Since valid lvbs are set when locks are granted with EX or PR levels, locks in the blocked list cannot have valid lvbs. Therefore I think we should skip the locks in the blocked list. Signed-off-by: Xuejiufei <xuejiufei@huawei.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: use bitmap_weight()Akinobu Mita
Use bitmap_weight() instead of reinventing the wheel. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: don't spam on -EDQUOTJoel Becker
-EDQUOT is a user-visible error, not a logic problem. Teach mlog_errno() to ignore it like it ignores -ENOSPC, etc. Signed-off-by: Joel Becker <jlbec@evilplan.org> Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: Marek Królikowski <admin@wset.edu.pl> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: add necessary check in case sb_getblk() failsRui Xiang
sb_getblk() may return an err, so add a check for bh. [joseph.qi@huawei.com: also add a check after calling sb_getblk() in ocfs2_create_xattr_block()] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: return ENOMEM when sb_getblk() failsRui Xiang
The only reason for sb_getblk() failing is if it can't allocate the buffer_head. So return ENOMEM instead when it fails. [joseph.qi@huawei.com: ocfs2_symlink_get_block() and ocfs2_read_blocks_sync() and ocfs2_read_blocks() need the same change] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13fs/ocfs2/file.c: fix wrong commentJunxiao Bi
Unwritten extent only exists for file systems which support holes. But the comment said was opposite meaning and also the comment is not very clear, so rephase it. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13fs/ocfs2: remove unnecessary variable bits_wanted from ocfs2_calc_extend_creditsGoldwyn Rodrigues
Code cleanup to remove unnecessary variable passed but never used to ocfs2_calc_extend_credits. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-12Merge tag 'devicetree-for-3.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "DeviceTree updates for 3.13. This is a bit larger pull request than usual for this cycle with lots of clean-up. - Cross arch clean-up and consolidation of early DT scanning code. - Clean-up and removal of arch prom.h headers. Makes arch specific prom.h optional on all but Sparc. - Addition of interrupts-extended property for devices connected to multiple interrupt controllers. - Refactoring of DT interrupt parsing code in preparation for deferred probe of interrupts. - ARM cpu and cpu topology bindings documentation. - Various DT vendor binding documentation updates" * tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits) powerpc: add missing explicit OF includes for ppc dt/irq: add empty of_irq_count for !OF_IRQ dt: disable self-tests for !OF_IRQ of: irq: Fix interrupt-map entry matching MIPS: Netlogic: replace early_init_devtree() call of: Add Panasonic Corporation vendor prefix of: Add Chunghwa Picture Tubes Ltd. vendor prefix of: Add AU Optronics Corporation vendor prefix of/irq: Fix potential buffer overflow of/irq: Fix bug in interrupt parsing refactor. of: set dma_mask to point to coherent_dma_mask of: add vendor prefix for PHYTEC Messtechnik GmbH DT: sort vendor-prefixes.txt of: Add vendor prefix for Cadence of: Add empty for_each_available_child_of_node() macro definition arm/versatile: Fix versatile irq specifications. of/irq: create interrupts-extended property microblaze/pci: Drop PowerPC-ism from irq parsing of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code. of/irq: Use irq_of_parse_and_map() ...
2013-11-12Merge tag 'h8300-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull h8300 platform removal from Guenter Roeck: "The patch series has been in -next for more than one relase cycle. I did get a number of Acks, and no objections. H8/300 has been dead for several years, the kernel for it has not compiled for ages, and recent versions of gcc for it are broken. Remove support for it" * tag 'h8300-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: CREDITS: Add Yoshinori Sato for h8300 fs/minix: Drop dependency on H8300 Drop remaining references to H8/300 architecture Drop MAINTAINERS entry for H8/300 watchdog: Drop references to H8300 architecture net/ethernet: Drop H8/300 Ethernet driver net/ethernet: smsc9194: Drop conditional code for H8/300 ide: Drop H8/300 driver Drop support for Renesas H8/300 (h8300) architecture
2013-11-11ext4: add prototypes for macro-generated functionsAndreas Dilger
It isn't very easy to find the declarations for the functions created by EXT4_INODE_BIT_FNS() because the names are generated by macros: ext4_test_inode_flag, ext4_set_inode_flag, ext4_clear_inode_flag ext4_test_inode_state, ext4_set_inode_state, ext4_clear_inode_state Add explicit declarations for these functions so that grep and tags can find them. Signed-off-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-11-11ext4: return non-zero st_blocks for inline dataAndreas Dilger
Return a non-zero st_blocks to userspace for statfs() and friends. Some versions of tar will assume that files with st_blocks == 0 do not contain any data and will skip reading them entirely. Signed-off-by: Andreas Dilger <andreas.dilger@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-11-11Btrfs: rename btrfs_start_all_delalloc_inodesMiao Xie
rename the function -- btrfs_start_all_delalloc_inodes(), and make its name be compatible to btrfs_wait_ordered_roots(), since they are always used at the same place. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11Btrfs: don't wait for the completion of all the ordered extentsMiao Xie
It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11Btrfs: don't wait for all the async delalloc when shrinking delallocMiao Xie
It was very likely that there were lots of async delalloc pages in the filesystem, if we waited until all the pages were flushed, we would be blocked for a long time, and the performance would also drop down. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11Btrfs: fix the confusion between delalloc bytes and metadata bytesMiao Xie
In shrink_delalloc(), what we need reclaim is the metadata space, so flushing pages by to_reclaim is not reasonable, it is very likely that the pages we flush are not enough. And then we had to invoke the flush function for several times, at the worst, we need call flush_space for several times. It wasted time. We improve this problem by converting the metadata space size we need reserve to the delalloc bytes, By this way, we can flush the pages by a reasonable number. (Now we use a fixed number to do conversion, it is not flexible, maybe we can find a good way to improve it in the future.) Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11Btrfs: pick up the code for the item number calculation in flush_space()Miao Xie
This patch picked up the code that was used to calculate the number of the items for which we need reserve space, and we will use it in the next patch. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11Btrfs: wait for the ordered extent only when we wantMiao Xie
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc()Miao Xie
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11Btrfs: avoid unnecessary scrub workers allocationWang Shilong
We only allocate scrub workers if we pass all the necessary checks, for example, there are no operation in progress. Besides, move mutex lock protection outside of scrub_workers_get() /scrub_workers_put(). Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11Btrfs: check file extent type before anything elseJosef Bacik
I hit this problem with my no holes patch and it made me realize what the problem was for bz 60834. If the first item in the leaf is an inline extent and we try to read anything starting from disk_bytenr onward we will read off the end of the leaf. So we need to check to see what it's type is, and if it's not REG we can just break out. This should fix this problem. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>