summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2014-07-19UBIFS: remove useless statementshujianyang
This patch removes useless and duplicate statements. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: Add missing break statements in dbg_chk_pnode()hujianyang
This is a minor fix. These two branches in 'dbg_chk_pnode()' are dealing with different conditions. Although there is no fault in current state, I think adding "break"s in each end of branch is better. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: fix error handling in dump_lpt_leb()hujianyang
This patch checks the return value of 'ubifs_unpack_nnode()'. If this function returns an error, 'nnode' may not be initialized, so just print an error message and break. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-18seccomp: implement SECCOMP_FILTER_FLAG_TSYNCKees Cook
Applying restrictive seccomp filter programs to large or diverse codebases often requires handling threads which may be started early in the process lifetime (e.g., by code that is linked in). While it is possible to apply permissive programs prior to process start up, it is difficult to further restrict the kernel ABI to those threads after that point. This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for synchronizing thread group seccomp filters at filter installation time. When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, filter) an attempt will be made to synchronize all threads in current's threadgroup to its new seccomp filter program. This is possible iff all threads are using a filter that is an ancestor to the filter current is attempting to synchronize to. NULL filters (where the task is running as SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS, ...) has been set on the calling thread, no_new_privs will be set for all synchronized threads too. On success, 0 is returned. On failure, the pid of one of the failing threads will be returned and no filters will have been applied. The race conditions against another thread are: - requesting TSYNC (already handled by sighand lock) - performing a clone (already handled by sighand lock) - changing its filter (already handled by sighand lock) - calling exec (handled by cred_guard_mutex) The clone case is assisted by the fact that new threads will have their seccomp state duplicated from their parent before appearing on the tasklist. Holding cred_guard_mutex means that seccomp filters cannot be assigned while in the middle of another thread's exec (potentially bypassing no_new_privs or similar). The call to de_thread() may kill threads waiting for the mutex. Changes across threads to the filter pointer includes a barrier. Based on patches by Will Drewry. Suggested-by: Julien Tinnes <jln@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-07-18sched: move no_new_privs into new atomic flagsKees Cook
Since seccomp transitions between threads requires updates to the no_new_privs flag to be atomic, the flag must be part of an atomic flag set. This moves the nnp flag into a separate task field, and introduces accessors. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-07-18Merge tag 'gfs2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes Pull gfs2 fixes from Steven Whitehouse: "This patch set contains two minor docs/spelling fixes, some fixes for flock, a change to use GFP_NOFS to avoid recursion on a rarely used code path and a fix for a race relating to the glock lru" * tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes GFS2: memcontrol: Spelling s/invlidate/invalidate/ GFS2: Allow caching of glocks for flock GFS2: Allow flocks to use normal glock dq rather than dq_wait GFS2: replace count*size kzalloc by kcalloc GFS2: Use GFP_NOFS when allocating glocks GFS2: Fix race in glock lru glock disposal GFS2: Only wait for demote when last holder is dequeued
2014-07-18Merge tag 'xfs-for-linus-3.16-rc5' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs fixes from Dave Chinner: "Fixes for low memory perforamnce regressions and a quota inode handling regression. These are regression fixes for issues recently introduced - the change in the stack switch location is fairly important, so I've held off sending this update until I was sure that it still addresses the stack usage problem the original solved. So while the commits in the xfs tree are recent, it has been under tested for several weeks now" * tag 'xfs-for-linus-3.16-rc5' of git://oss.sgi.com/xfs/xfs: xfs: null unused quota inodes when quota is on xfs: refine the allocation stack switch Revert "xfs: block allocation work needs to be kswapd aware"
2014-07-18svcrdma: Select NFSv4.1 backchannel transport based on forward channelChuck Lever
The current code always selects XPRT_TRANSPORT_BC_TCP for the back channel, even when the forward channel was not TCP (eg, RDMA). When a 4.1 mount is attempted with RDMA, the server panics in the TCP BC code when trying to send CB_NULL. Instead, construct the transport protocol number from the forward channel transport or'd with XPRT_TRANSPORT_BC. Transports that do not support bi-directional RPC will not have registered a "BC" transport, causing create_backchannel_client() to fail immediately. Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-18GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixesFabian Frederick
Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18GFS2: memcontrol: Spelling s/invlidate/invalidate/Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: cluster-devel@redhat.com Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18GFS2: Allow caching of glocks for flockBob Peterson
This patch removes the GLF_NOCACHE flag from the glocks associated with flocks. There should be no good reason not to cache glocks for flocks: they only force the glock to be demoted before they can be reacquired, which can slow down performance and even cause glock hangs, especially in cases where the flocks are held in Shared (SH) mode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18GFS2: Allow flocks to use normal glock dq rather than dq_waitBob Peterson
This patch allows flock glocks to use a non-blocking dequeue rather than dq_wait. It also reverts the previous patch I had posted regarding dq_wait. The reverted patch isn't necessarily a bad idea, but I decided this might avoid unforeseen side effects, and was therefore safer. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18GFS2: replace count*size kzalloc by kcallocFabian Frederick
kcalloc manages count*sizeof overflow. Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18GFS2: Use GFP_NOFS when allocating glocksSteven Whitehouse
Normally GFP_KERNEL is ok here, but there is now a rarely used code path relating to deallocation of unlinked inodes (in certain corner cases) which if hit at times of memory shortage can cause recursion while trying to free memory. One solution would be to try and move the gfs2_glock_get() call so that it is no longer called while another glock is held, but that doesn't look at all easy, so GFP_NOFS is the best solution for the time being. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18GFS2: Fix race in glock lru glock disposalSteven Whitehouse
We must not leave items on the LRU list with GLF_LOCK set, since they can be removed if the glock is brought back into use, which may then potentially result in a hang, waiting for GLF_LOCK to clear. It doesn't happen very often, since it requires a glock that has not been used for a long time to be brought back into use at the same moment that the shrinker is part way through disposing of glocks. The fix is to set GLF_LOCK at a later time, when we already know that the other locks can be obtained. Also, we now only release the lru_lock in case a resched is needed, rather than on every iteration. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18GFS2: Only wait for demote when last holder is dequeuedBob Peterson
Function gfs2_glock_dq_wait is supposed to dequeue a glock and then wait for the lock to be demoted. The problem is, if this is a shared lock, its demote will depend on the other holders, which means you might end up waiting forever because the other process is blocked. This problem is especially apparent when dealing with nested flocks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18timerfd: Implement timerfd_ioctl method to restore timerfd_ctx::ticks, v3Cyrill Gorcunov
The read() of timerfd files allows to fetch the number of timer ticks while there is no way to set it back from userspace. To restore the timer's state as it was at checkpoint moment we need a path to bring @ticks back. Initially I thought about writing ticks back via write() interface but it seems such API is somehow obscure. Instead implement timerfd_ioctl() method with TFD_IOC_SET_TICKS command which allows to adjust @ticks into non-zero value waking up the waiters. I wrapped code with CONFIG_CHECKPOINT_RESTORE which can be dropped off if there users except c/r camp appear. v2 (by akpm@): - Use define timerfd_ioctl NULL for non c/r config v3: - Use copy_from_user for @ticks fetching since not all arch support get_user for 8 byte argument Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christopher Covington <cov@codeaurora.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Link: http://lkml.kernel.org/r/20140715215703.285617923@openvz.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-18timerfd: Implement show_fdinfo methodCyrill Gorcunov
For checkpoint/restore of timerfd files we need to know how exactly the timer were armed, to be able to recreate it on restore stage. Thus implement show_fdinfo method which provides enough information for that. One of significant changes I think is the addition of @settime_flags member. Currently there are two flags TFD_TIMER_ABSTIME and TFD_TIMER_CANCEL_ON_SET, and the second can be found from @might_cancel variable but in case if the flags will be extended in future we most probably will have to somehow remember them explicitly anyway so I guss doing that right now won't hurt. To not bloat the timerfd_ctx structure I've converted @expired to short integer and defined @settime_flags as short too. v2 (by avagin@, vdavydov@ and tglx@): - Add it_value/it_interval fields - Save flags being used in timerfd_setup in context v3 (by tglx@): - don't forget to use CONFIG_PROC_FS v4 (by akpm@): -Use define timerfd_show NULL for non c/r config Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Link: http://lkml.kernel.org/r/20140715215703.114365649@openvz.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-17nfsd4: zero op arguments beyond the 8th compound opJ. Bruce Fields
The first 8 ops of the compound are zeroed since they're a part of the argument that's zeroed by the memset(rqstp->rq_argp, 0, procp->pc_argsize); in svc_process_common(). But we handle larger compounds by allocating the memory on the fly in nfsd4_decode_compound(). Other than code recently fixed by 01529e3f8179 "NFSD: Fix memory leak in encoding denied lock", I don't know of any examples of code depending on this initialization. But it definitely seems possible, and I'd rather be safe. Compounds this long are unusual so I'm much more worried about failure in this poorly tested cases than about an insignificant performance hit. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-17nfsd: silence sparse warning about accessing credentialsJeff Layton
sparse says: fs/nfsd/auth.c:31:38: warning: incorrect type in argument 1 (different address spaces) fs/nfsd/auth.c:31:38: expected struct cred const *cred fs/nfsd/auth.c:31:38: got struct cred const [noderef] <asn:4>*real_cred Add a new accessor for the ->real_cred and use that to fetch the pointer. Accessing current->real_cred directly is actually quite safe since we know that they can't go away so this is mostly a cosmetic fixup to silence sparse. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-17KEYS: Allow special keys (eg. DNS results) to be invalidated by CAP_SYS_ADMINDavid Howells
Special kernel keys, such as those used to hold DNS results for AFS, CIFS and NFS and those used to hold idmapper results for NFS, used to be 'invalidateable' with key_revoke(). However, since the default permissions for keys were reduced: Commit: 96b5c8fea6c0861621051290d705ec2e971963f1 KEYS: Reduce initial permissions on keys it has become impossible to do this. Add a key flag (KEY_FLAG_ROOT_CAN_INVAL) that will permit a key to be invalidated by root. This should not be used for system keyrings as the garbage collector will try and remove any invalidate key. For system keyrings, KEY_FLAG_ROOT_CAN_CLEAR can be used instead. After this, from userspace, keyctl_invalidate() and "keyctl invalidate" can be used by any possessor of CAP_SYS_ADMIN (typically root) to invalidate DNS and idmapper keys. Invalidated keys are immediately garbage collected and will be immediately rerequested if needed again. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Steve Dickson <steved@redhat.com>
2014-07-16nfsd: Ensure stateids remain unique until they are freedTrond Myklebust
Add an extra delegation state to allow the stateid to remain in the idr tree until the last reference has been released. This will be necessary to ensure uniqueness once the client_mutex is removed. [jlayton: reset the sc_type under the state_lock in unhash_delegation] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16nfsd: nfs4_alloc_init_lease should take a nfs4_file argJeff Layton
No need to pass the delegation pointer in here as it's only used to get the nfs4_file pointer. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16nfsd: Avoid taking state_lock while holding inode lock in nfsd_break_one_delegJeff Layton
state_lock is a heavily contended global lock. We don't want to grab that while simultaneously holding the inode->i_lock. Add a new per-nfs4_file lock that we can use to protect the per-nfs4_file delegation list. Hold that while walking the list in the break_deleg callback and queue the workqueue job for each one. The workqueue job can then take the state_lock and do the list manipulations without the i_lock being held prior to starting the rpc call. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16nfsd: eliminate nfsd4_init_callbackJeff Layton
It's just an obfuscated INIT_WORK call. Just make the work_func_t a non-static symbol and use a normal INIT_WORK call. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16sched: Allow wait_on_bit_action() functions to support a timeoutNeilBrown
It is currently not possible for various wait_on_bit functions to implement a timeout. While the "action" function that is called to do the waiting could certainly use schedule_timeout(), there is no way to carry forward the remaining timeout after a false wake-up. As false-wakeups a clearly possible at least due to possible hash collisions in bit_waitqueue(), this is a real problem. The 'action' function is currently passed a pointer to the word containing the bit being waited on. No current action functions use this pointer. So changing it to something else will be a little noisy but will have no immediate effect. This patch changes the 'action' function to take a pointer to the "struct wait_bit_key", which contains a pointer to the word containing the bit so nothing is really lost. It also adds a 'private' field to "struct wait_bit_key", which is initialized to zero. An action function can now implement a timeout with something like static int timed_out_waiter(struct wait_bit_key *key) { unsigned long waited; if (key->private == 0) { key->private = jiffies; if (key->private == 0) key->private -= 1; } waited = jiffies - key->private; if (waited > 10 * HZ) return -EAGAIN; schedule_timeout(waited - 10 * HZ); return 0; } If any other need for context in a waiter were found it would be easy to use ->private for some other purpose, or even extend "struct wait_bit_key". My particular need is to support timeouts in nfs_release_page() to avoid deadlocks with loopback mounted NFS. While wait_on_bit_timeout() would be a cleaner interface, it will not meet my need. I need the timeout to be sensitive to the state of the connection with the server, which could change. So I need to use an 'action' interface. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: David Howells <dhowells@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16sched: Remove proliferation of wait_on_bit() action functionsNeilBrown
The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-15Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fix from Jan Kara: "Fix locking of dquot shrinker" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: missing lock in dqcache_shrink_scan()
2014-07-15f2fs: reduce searching region of segmap when free sectionChao Yu
In __set_test_and_free we will check whether all segment are free in one section When free one segment, in order to set section to free status. But the searching region of segmap is from start segno to last segno of f2fs, it's not necessary. So let's just only check all segment bitmap of target section. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-15udf: avoid redundant memcpy when writing data in ICBChao Yu
Valid data within i_size in page cache will be copied to ICB cache when we writeback the page by invoking udf_adinicb_writepage, so the copy in udf_adinicb_write_end is redundant. After we remove the copy, it's better to use simple_write_end directly in udf_adinicb_aops instead of udf_adinicb_write_end. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15fs/udf: re-use hex_asc_upper_{hi,lo} macrosAndy Shevchenko
This patch cleans up udf_translate_to_linux() a bit by using globally defined macros instead of custom code. We can use sprintf(buf, "%04X", ...) there as well, but this one faster. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15fs/quota: kernel-doc warning fixesFabian Frederick
type and id were removed and qid added to quota_send_warning in commit 431f19744d15 ("userns: Convert quota netlink aka quota_send_warning") Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15udf: use linux/uaccess.hFabian Frederick
Fix checkpatch warning WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15fs/ext2/super.c: Drop memory allocation castHimangi Saraogi
Drop cast on the result of kmem_cache_alloc. The semantic patch that makes this change is as follows: // <smpl> @@ type T; @@ - (T *) (\(kmalloc\|kzalloc\|kcalloc\|kmem_cache_alloc\|kmem_cache_zalloc\| kmem_cache_alloc_node\|kmalloc_node\|kzalloc_node\)(...)) // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15quota: remove dqptr_semNiu Yawei
Remove dqptr_sem to make quota code scalable: Remove the dqptr_sem, accessing inode->i_dquot now protected by dquot_srcu, and changing inode->i_dquot is now serialized by dq_data_lock. Signed-off-by: Lai Siyao <lai.siyao@intel.com> Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15quota: simplify remove_inode_dquot_ref()Niu Yawei
Simplify the remove_inode_dquot_ref() to make it more obvious that now we keep one reference for each dquot from inodes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15quota: avoid unnecessary dqget()/dqput() callsNiu Yawei
Avoid unnecessary dqget()/dqput() calls in __dquot_initialize(), that will introduce global lock contention otherwise. Signed-off-by: Lai Siyao <lai.siyao@intel.com> Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15quota: protect Q_GETFMT by dqonoff_mutexNiu Yawei
dqptr_sem will go away. Protect Q_GETFMT quotactl by dqonoff_mutex instead. This is also enough to make sure quota info will not go away while we are looking at it. Signed-off-by: Lai Siyao <lai.siyao@intel.com> Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15quota: missing lock in dqcache_shrink_scan()Niu Yawei
Commit 1ab6c4997e04 (fs: convert fs shrinkers to new scan/count API) accidentally removed locking from quota shrinker. Fix it - dqcache_shrink_scan() should use dq_list_lock to protect the scan on free_dquots list. CC: stable@vger.kernel.org Fixes: 1ab6c4997e04a00c50c6d786c2f046adc0d1f5de Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "This contains miscellaneous fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: replace count*size kzalloc by kcalloc fuse: release temporary page if fuse_writepage_locked() failed fuse: restructure ->rename2() fuse: avoid scheduling while atomic fuse: handle large user and group ID fuse: inode: drop cast fuse: ignore entry-timeout on LOOKUP_REVAL fuse: timeout comparison fix
2014-07-15ext4: make ext4_has_inline_data() as a inline functionZheng Liu
Now ext4_has_inline_data() is used in wide spread codepaths. So we need to make it as a inline function to avoid burning some CPU cycles. Change in text size: text data bss dec hex filename before: 326110 19258 5528 350896 55ab0 fs/ext4/ext4.o after: 326227 19258 5528 351013 55b25 fs/ext4/ext4.o I use the following script to measure the CPU usage. #!/bin/bash shm_base='/dev/shm' img=${shm_base}/ext4-img mnt=/mnt/loop e2fsprgs_base=$HOME/e2fsprogs mkfs=${e2fsprgs_base}/misc/mke2fs fsck=${e2fsprgs_base}/e2fsck/e2fsck sudo umount $mnt dd if=/dev/zero of=$img bs=4k count=3145728 ${mkfs} -t ext4 -O inline_data -F $img sudo mount -t ext4 -o loop $img $mnt # start testing... testdir="${mnt}/testdir" mkdir $testdir cd $testdir echo "start testing..." for ((cnt=0;cnt<100;cnt++)); do for ((i=0;i<5;i++)); do for ((j=0;j<5;j++)); do for ((k=0;k<5;k++)); do for ((l=0;l<5;l++)); do mkdir -p $i/$j/$k/$l echo "$i-$j-$k-$l" > $i/$j/$k/$l/testfile done done done done ls -R $testdir > /dev/null rm -rf $testdir/* done The result of `perf top -G -U` is as below. vanilla: 13.92% [ext4] [k] ext4_do_update_inode 9.36% [ext4] [k] __ext4_get_inode_loc 4.07% [ext4] [k] ftrace_define_fields_ext4_writepages 3.83% [ext4] [k] __ext4_handle_dirty_metadata 3.42% [ext4] [k] ext4_get_inode_flags 2.71% [ext4] [k] ext4_mark_iloc_dirty 2.46% [ext4] [k] ftrace_define_fields_ext4_direct_IO_enter 2.26% [ext4] [k] ext4_get_inode_loc 2.22% [ext4] [k] ext4_has_inline_data [...] After applied the patch, we don't see ext4_has_inline_data() because it has been inlined and perf couldn't sample it. Although it doesn't mean that the CPU cycles can be saved but at least the overhead of function calls can be eliminated. So IMHO we'd better inline this function. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-07-15ext4: remove readpage() check in ext4_mmap_file()Zhang Zhen
There is no kind of file which does not supply a page reading function. Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-07-15ext4: fix punch hole on files with indirect mappingLukas Czerner
Currently punch hole code on files with direct/indirect mapping has some problems which may lead to a data loss. For example (from Jan Kara): fallocate -n -p 10240000 4096 will punch the range 10240000 - 12632064 instead of the range 1024000 - 10244096. Also the code is a bit weird and it's not using infrastructure provided by indirect.c, but rather creating it's own way. This patch fixes the issues as well as making the operation to run 4 times faster from my testing (punching out 60GB file). It uses similar approach used in ext4_ind_truncate() which takes advantage of ext4_free_branches() function. Also rename the ext4_free_hole_blocks() to something more sensible, like the equivalent we have for extent mapped files. Call it ext4_ind_remove_space(). This has been tested mostly with fsx and some xfstests which are testing punch hole but does not require unwritten extents which are not supported with direct/indirect mapping. Not problems showed up even with 1024k block size. CC: stable@vger.kernel.org Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-07-15ext4: remove metadata reservation checksTheodore Ts'o
Commit 27dd43854227b ("ext4: introduce reserved space") reserves 2% of the file system space to make sure metadata allocations will always succeed. Given that, tracking the reservation of metadata blocks is no longer necessary. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-07-15ext4: rearrange initialization to fix EXT4FS_DEBUGTheodore Ts'o
The EXT4FS_DEBUG is a *very* developer specific #ifdef designed for ext4 developers only. (You have to modify fs/ext4/ext4.h to enable it.) Rearrange how we initialize data structures to avoid calling ext4_count_free_clusters() until the multiblock allocator has been initialized. This also allows us to only call ext4_count_free_clusters() once, and simplifies the code somewhat. (Thanks to Chen Gang <gang.chen.5i5j@gmail.com> for pointing out a !CONFIG_SMP compile breakage in the original patch.) Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
2014-07-15xfs: add log attributes for log lsn and grant head dataBrian Foster
Create log attributes to export the current runtime state of the log to sysfs. Note that the filesystem should be frozen for consistency across attributes. The following per-mount attributes are created: log_head_lsn, log_tail_lsn, reserve_grant_head and write_grant_head. These represent the physical log head, tail and reserve and write grant heads respectively. Attribute values are exported in the following format: "cycle:[block,byte]" ... where cycle represents the log cycle and [block,bytes] represents either the basic block or byte offset of the log, depending on the attribute. Log sequence number (LSN) values are encoded in basic blocks and grant heads are encoded in bytes. All values are in decimal format. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-07-15xfs: add xlog sysfs kobject and attribute handlersBrian Foster
Embed a kobject into the xfs log data structure (xlog). This creates a 'log' subdirectory for every XFS mount instance in sysfs. The lifecycle of the log kobject is tied to the lifecycle of the log. Also define a set of generic attribute handlers associated with the log kobject in preparation for the addition of attributes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-07-15xfs: add xfs_mount sysfs kobjectBrian Foster
Embed a base kobject into xfs_mount. This creates a kobject associated with each XFS mount and a subdirectory in sysfs with the name of the filesystem. The subdirectory lifecycle matches that of the mount. Also add the new xfs_sysfs.[c,h] source files with some XFS sysfs infrastructure to facilitate attribute creation. Note that there are currently no attributes exported as part of the xfs_mount kobject. It exists solely to serve as a per-mount container for child objects. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-07-15xfs: add a sysfs ksetBrian Foster
Create a sysfs kset to contain all sub-objects associated with the XFS module. The kset is created and removed on module initialization and removal respectively. The kset uses fs_obj as a parent. This leads to the creation of a /sys/fs/xfs directory when the kset exists. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-07-15xfs: fix a couple error sequence jumps in xfs_mountfs()Brian Foster
xfs_mountfs() has a couple failure conditions that do not jump to the correct labels. Specifically: - xfs_initialize_perag_data() failure does not deallocate the log even though it occurs after log initialization - xfs_mount_reset_sbqflags() failure returns the error directly rather than jump to the error sequence Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>