summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2014-11-18GFS2: Deletion of unnecessary checks before two function callsMarkus Elfring
The functions iput() and put_pid() test whether their argument is NULL and then return immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-11-18jbd: Deletion of an unnecessary check before the function call "iput"Markus Elfring
The iput() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-17VFS: refactor vfs_read()Dmitry Kasatkin
integrity_kernel_read() duplicates the file read operations code in vfs_read(). This patch refactors vfs_read() code creating a helper function __vfs_read(). It is used by both vfs_read() and integrity_kernel_read(). Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-11-18fs: Do not include mpx.h in exec.cDave Hansen
We no longer need mpx.h in exec.c. This will obviously also break the build for non-x86 builds. We get the MPX includes that we need from mmu_context.h now. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141118003608.837015B3@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18x86, mpx: On-demand kernel allocation of bounds tablesDave Hansen
This is really the meat of the MPX patch set. If there is one patch to review in the entire series, this is the one. There is a new ABI here and this kernel code also interacts with userspace memory in a relatively unusual manner. (small FAQ below). Long Description: This patch adds two prctl() commands to provide enable or disable the management of bounds tables in kernel, including on-demand kernel allocation (See the patch "on-demand kernel allocation of bounds tables") and cleanup (See the patch "cleanup unused bound tables"). Applications do not strictly need the kernel to manage bounds tables and we expect some applications to use MPX without taking advantage of this kernel support. This means the kernel can not simply infer whether an application needs bounds table management from the MPX registers. The prctl() is an explicit signal from userspace. PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to require kernel's help in managing bounds tables. PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel won't allocate and free bounds tables even if the CPU supports MPX. PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds directory out of a userspace register (bndcfgu) and then cache it into a new field (->bd_addr) in the 'mm_struct'. PR_MPX_DISABLE_MANAGEMENT will set "bd_addr" to an invalid address. Using this scheme, we can use "bd_addr" to determine whether the management of bounds tables in kernel is enabled. Also, the only way to access that bndcfgu register is via an xsaves, which can be expensive. Caching "bd_addr" like this also helps reduce the cost of those xsaves when doing table cleanup at munmap() time. Unfortunately, we can not apply this optimization to #BR fault time because we need an xsave to get the value of BNDSTATUS. ==== Why does the hardware even have these Bounds Tables? ==== MPX only has 4 hardware registers for storing bounds information. If MPX-enabled code needs more than these 4 registers, it needs to spill them somewhere. It has two special instructions for this which allow the bounds to be moved between the bounds registers and some new "bounds tables". They are similar conceptually to a page fault and will be raised by the MPX hardware during both bounds violations or when the tables are not present. This patch handles those #BR exceptions for not-present tables by carving the space out of the normal processes address space (essentially calling the new mmap() interface indroduced earlier in this patch set.) and then pointing the bounds-directory over to it. The tables *need* to be accessed and controlled by userspace because the instructions for moving bounds in and out of them are extremely frequent. They potentially happen every time a register pointing to memory is dereferenced. Any direct kernel involvement (like a syscall) to access the tables would obviously destroy performance. ==== Why not do this in userspace? ==== This patch is obviously doing this allocation in the kernel. However, MPX does not strictly *require* anything in the kernel. It can theoretically be done completely from userspace. Here are a few ways this *could* be done. I don't think any of them are practical in the real-world, but here they are. Q: Can virtual space simply be reserved for the bounds tables so that we never have to allocate them? A: As noted earlier, these tables are *HUGE*. An X-GB virtual area needs 4*X GB of virtual space, plus 2GB for the bounds directory. If we were to preallocate them for the 128TB of user virtual address space, we would need to reserve 512TB+2GB, which is larger than the entire virtual address space today. This means they can not be reserved ahead of time. Also, a single process's pre-popualated bounds directory consumes 2GB of virtual *AND* physical memory. IOW, it's completely infeasible to prepopulate bounds directories. Q: Can we preallocate bounds table space at the same time memory is allocated which might contain pointers that might eventually need bounds tables? A: This would work if we could hook the site of each and every memory allocation syscall. This can be done for small, constrained applications. But, it isn't practical at a larger scale since a given app has no way of controlling how all the parts of the app might allocate memory (think libraries). The kernel is really the only place to intercept these calls. Q: Could a bounds fault be handed to userspace and the tables allocated there in a signal handler instead of in the kernel? A: (thanks to tglx) mmap() is not on the list of safe async handler functions and even if mmap() would work it still requires locking or nasty tricks to keep track of the allocation state there. Having ruled out all of the userspace-only approaches for managing bounds tables that we could think of, we create them on demand in the kernel. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specificQiaowei Ren
MPX-enabled applications using large swaths of memory can potentially have large numbers of bounds tables in process address space to save bounds information. These tables can take up huge swaths of memory (as much as 80% of the memory on the system) even if we clean them up aggressively. In the worst-case scenario, the tables can be 4x the size of the data structure being tracked. IOW, a 1-page structure can require 4 bounds-table pages. Being this huge, our expectation is that folks using MPX are going to be keen on figuring out how much memory is being dedicated to it. So we need a way to track memory use for MPX. If we want to specifically track MPX VMAs we need to be able to distinguish them from normal VMAs, and keep them from getting merged with normal VMAs. A new VM_ flag set only on MPX VMAs does both of those things. With this flag, MPX bounds-table VMAs can be distinguished from other VMAs, and userspace can also walk /proc/$pid/smaps to get memory usage for MPX. In addition to this flag, we also introduce a special ->vm_ops specific to MPX VMAs (see the patch "add MPX specific mmap interface"), but currently different ->vm_ops do not by themselves prevent VMA merging, so we still need this flag. We understand that VM_ flags are scarce and are open to other options. Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151825.565625B3@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-17GFS2: update freeze code to use freeze/thaw_super on all nodesBenjamin Marzinski
The current gfs2 freezing code is considerably more complicated than it should be because it doesn't use the vfs freezing code on any node except the one that begins the freeze. This is because it needs to acquire a cluster glock before calling the vfs code to prevent a deadlock, and without the new freeze_super and thaw_super hooks, that was impossible. To deal with the issue, gfs2 had to do some hacky locking tricks to make sure that a frozen node couldn't be holding on a lock it needed to do the unfreeze ioctl. This patch makes use of the new hooks to simply the gfs2 locking code. Now, all the nodes in the cluster freeze and thaw in exactly the same way. Every node in the cluster caches the freeze glock in the shared state. The new freeze_super hook allows the freezing node to grab this freeze glock in the exclusive state without first calling the vfs freeze_super function. All the nodes in the cluster see this lock change, and call the vfs freeze_super function. The vfs locking code guarantees that the nodes can't get stuck holding the glocks necessary to unfreeze the system. To unfreeze, the freezing node uses the new thaw_super hook to drop the freeze glock. Again, all the nodes notice this, reacquire the glock in shared mode and call the vfs thaw_super function. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-11-17fs: add freeze_super/thaw_super fs hooksBenjamin Marzinski
Currently, freezing a filesystem involves calling freeze_super, which locks sb->s_umount and then calls the fs-specific freeze_fs hook. This makes it hard for gfs2 (and potentially other cluster filesystems) to use the vfs freezing code to do freezes on all the cluster nodes. In order to communicate that a freeze has been requested, and to make sure that only one node is trying to freeze at a time, gfs2 uses a glock (sd_freeze_gl). The problem is that there is no hook for gfs2 to acquire this lock before calling freeze_super. This means that two nodes can attempt to freeze the filesystem by both calling freeze_super, acquiring the sb->s_umount lock, and then attempting to grab the cluster glock sd_freeze_gl. Only one will succeed, and the other will be stuck in freeze_super, making it impossible to finish freezing the node. To solve this problem, this patch adds the freeze_super and thaw_super hooks. If a filesystem implements these hooks, they are called instead of the vfs freeze_super and thaw_super functions. This means that every filesystem that implements these hooks must call the vfs freeze_super and thaw_super functions itself within the hook function to make use of the vfs freezing code. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-11-16Merge branch 'sched/urgent' into sched/core, to pick up fixes before ↵Ingo Molnar
applying more changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16Merge tag 'efi-next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi Pull EFI updates for v3.19 from Matt Fleming: - Support module unload for efivarfs - Mathias Krause - Another attempt at moving x86 to libstub taking advantage of the __pure attribute - Ard Biesheuvel - Add EFI runtime services section to ptdump - Mathias Krause Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-15Merge tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - stable patches to fix NFSv4.x delegation reclaim error paths - fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2 features - fix a use-after-free problem with pNFS block layouts - fix a memory leak in the pNFS files O_DIRECT code - replace an intrusive and Oops-prone performance fix in the NFSv4 atomic open code with a safer one-line version and revert the two original patches" * tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor NFS: Don't try to reclaim delegation open state if recovery failed NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired NFS: SEEK is an NFS v4.2 feature nfs: Fix use of uninitialized variable in nfs_getattr() nfs: Remove bogus assignment nfs: remove spurious WARN_ON_ONCE in write path pnfs/blocklayout: serialize GETDEVICEINFO calls nfs: fix pnfs direct write memory leak Revert "NFS: nfs4_do_open should add negative results to the dcache." Revert "NFS: remove BUG possibility in nfs4_open_and_get_state" NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT
2014-11-14GFS2: Update timestamps on fallocateAndrew Price
gfs2_fallocate() wasn't updating ctime and mtime when modifying the inode. Add a call to file_update_time() to do that. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-11-14GFS2: Update i_size properly on fallocateAndrew Price
This addresses an issue caught by fsx where the inode size was not being updated to the expected value after fallocate(2) with mode 0. The problem was caused by the offset and len parameters being converted to multiples of the file system's block size, so i_size would be rounded up to the nearest block size multiple instead of the requested size. This replaces the per-chunk i_size updates with a single i_size_write on successful completion of the operation. With this patch gfs2 gets through a complete run of fsx. For clarity, the check for (error == 0) following the loop is removed as all failures before that point jump to out_* labels or return. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-11-14GFS2: Use inode_newsize_ok and get_write_access in fallocateAndrew Price
gfs2_fallocate wasn't checking inode_newsize_ok nor get_write_access. Split out the context setup and inode locking pieces into a separate function to make it more clear and add these missing calls. inode_newsize_ok is called conditional on FALLOC_FL_KEEP_SIZE as there is no need to enforce a file size limit if it isn't going to change. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-11-13Merge branch 'akpm' (fixes from Andrew Morton)Linus Torvalds
Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS: add IIO include files kernel/panic.c: update comments for print_tainted mem-hotplug: reset node present pages when hot-adding a new pgdat mem-hotplug: reset node managed pages when hot-adding a new pgdat mm/debug-pagealloc: correct freepage accounting and order resetting fanotify: fix notification of groups with inode & mount marks mm, compaction: prevent infinite loop in compact_zone mm: alloc_contig_range: demote pages busy message from warn to info mm/slab: fix unalignment problem on Malta with EVA due to slab merge mm/page_alloc: restrict max order of merging on isolated pageblock mm/page_alloc: move freepage counting logic to __free_one_page() mm/page_alloc: add freepage on isolate pageblock to correct buddy list mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype mm/compaction: skip the range until proper target pageblock is met zram: avoid kunmap_atomic() of a NULL pointer
2014-11-13fanotify: fix notification of groups with inode & mount marksJan Kara
fsnotify() needs to merge inode and mount marks lists when notifying groups about events so that ignore masks from inode marks are reflected in mount mark notifications and groups are notified in proper order (according to priorities). Currently the sorting of the lists done by fsnotify_add_inode_mark() / fsnotify_add_vfsmount_mark() and fsnotify() differed which resulted ignore masks not being used in some cases. Fix the problem by always using the same comparison function when sorting / merging the mark lists. Thanks to Heinrich Schuchardt for improvements of my patch. Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721 Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-13ceph: fix flush tid comparisionYan, Zheng
TID of cap flush ack is 64 bits, but ceph_inode_info::flushing_cap_tid is only 16 bits. 16 bits should be plenty to let the cap flush updates pipeline appropriately, but we need to cast in the proper direction when comparing these differently-sized versions. So downcast the 64-bits one to 16 bits. Reflects ceph.git commit a5184cf46a6e867287e24aeb731634828467cd98. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-11-12NFS: Don't try to reclaim delegation open state if recovery failedTrond Myklebust
If state recovery failed, then we should not attempt to reclaim delegated state. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revokedTrond Myklebust
NFSv4.x (x>0) requires us to call TEST_STATEID+FREE_STATEID if a stateid is revoked. We will currently fail to do this if the stateid is a delegation. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12NFSv4: Fix races between nfs_remove_bad_delegation() and delegation returnTrond Myklebust
Any attempt to call nfs_remove_bad_delegation() while a delegation is being returned is currently a no-op. This means that we can end up looping forever in nfs_end_delegation_return() if something causes the delegation to be revoked. This patch adds a mechanism whereby the state recovery code can communicate to the delegation return code that the delegation is no longer valid and that it should not be used when reclaiming state. It also changes the return value for nfs4_handle_delegation_recall_error() to ensure that nfs_end_delegation_return() does not reattempt the lock reclaim before state recovery is done. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATETrond Myklebust
This patch removes the assumption made previously, that we only need to check the delegation stateid when it matches the stateid on a cached open. If we believe that we hold a delegation for this file, then we must assume that its stateid may have been revoked or expired too. If we don't test it then our state recovery process may end up caching open/lock state in a situation where it should not. We therefore rename the function nfs41_clear_delegation_stateid as nfs41_check_delegation_stateid, and change it to always run through the delegation stateid test and recovery process as outlined in RFC5661. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12NFSv4: Ensure that we remove NFSv4.0 delegations when state has expiredTrond Myklebust
NFSv4.0 does not have TEST_STATEID/FREE_STATEID functionality, so unlike NFSv4.1, the recovery procedure when stateids have expired or have been revoked requires us to just forget the delegation. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12NFS: SEEK is an NFS v4.2 featureAnna Schumaker
Somehow the nfs_v4_1_minor_ops had the NFS_CAP_SEEK flag set, enabling SEEK over v4.1. This is wrong, and can make servers crash. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12nfs: Fix use of uninitialized variable in nfs_getattr()Jan Kara
Variable 'err' needn't be initialized when nfs_getattr() uses it to check whether it should call generic_fillattr() or not. That can result in spurious error returns. Initialize 'err' properly. Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12nfs: Remove bogus assignmentJan Kara
Commit 3a6fd1f004fc (pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelist) introduced a bogus assignment pg_index = pg_index in variable initialization. AFAICS it's just a typo so remove it. Spotted by Coverity (id 1248711). CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12nfs: remove spurious WARN_ON_ONCE in write pathWeston Andros Adamson
This WARN_ON_ONCE was supposed to catch reference counting bugs, but can trigger in inappropriate situations. This was reproducible using NFSv2 on an architecture with 64K pages -- we verified that it was not a reference counting bug and the warning was safe to ignore. Reported-by: Will Deacon <will.deacon@arm.com> Tested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12pnfs/blocklayout: serialize GETDEVICEINFO callsChristoph Hellwig
The rpc_pipefs code isn't thread safe, leading to occasional use after frees when running xfstests generic/241 (dbench). Signed-off-by: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/1411740170-18611-2-git-send-email-hch@lst.de Cc: stable@vger.kernel.org # 3.17.x Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12nfs: fix pnfs direct write memory leakPeng Tao
For pNFS direct writes, layout driver may dynamically allocate ds_cinfo.buckets. So we need to take care to free them when freeing dreq. Ideally this needs to be done inside layout driver where ds_cinfo.buckets are allocated. But buckets are attached to dreq and reused across LD IO iterations. So I feel it's OK to free them in the generic layer. Cc: stable@vger.kernel.org [v3.4+] Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12btrfs: move commit out of sysfs when changing labelDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-11-12btrfs: move commit out of sysfs when changing featuresDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-11-12btrfs: introduce pending action: commitDavid Sterba
In some contexts, like in sysfs handlers, we don't want to trigger a transaction commit. It's a heavy operation, we don't know what external locks may be taken. Instead, make it possible to finish the operation through sync syscall or SYNC_FS ioctl. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-11-12btrfs: switch inode_cache option handling to pending changesDavid Sterba
The pending mount option(s) now share namespace and bits with the normal options, and the existing one for (inode_cache) is unset unconditionally at each transaction commit. Introduce a separate namespace for pending changes and enhance the descriptions of the intended change to use separate bits for each action. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-11-12btrfs: do commit in sync_fs if there are pending changesDavid Sterba
If a pending change is requested, it's not processed unless there is a transaction commit about to happen, not even after sync or SYNC_FS ioctl. For example a remount that toggles the inode_cache option will not take effect after sync on a quiescent filesystem. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-11-12btrfs: add support for processing pending changesDavid Sterba
There are some actions that modify global filesystem state but cannot be performed at the time of request, but later at the transaction commit time when the filesystem is in a known state. For example enabling new incompat features on-the-fly or issuing transaction commit from unsafe contexts (sysfs handlers). Signed-off-by: David Sterba <dsterba@suse.cz>
2014-11-11efivarfs: Allow unloading when build as moduleMathias Krause
There is no need to keep the module loaded when it serves no function in case the EFI runtime services are disabled. Return an error in this case so loading the module will fail. Also supply a module_exit function to allow unloading the module. Last, but not least, set the owner of the file_system_type struct. Cc: Jeremy Kerr <jk@ozlabs.org> Cc: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-11-11f2fs: convert inline_data when i_size becomes largeJaegeuk Kim
If i_size becomes large outside of MAX_INLINE_DATA, we shoud convert the inode. Otherwise, we can make some dirty pages during the truncation, and those pages will be written through f2fs_write_data_page. At that moment, the inode has still inline_data, so that it tries to write non- zero pages into inline_data area. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-11f2fs: fix deadlock to grab 0'th data pageJaegeuk Kim
The scenario is like this. One trhead triggers: f2fs_write_data_pages lock_page f2fs_write_data_page f2fs_lock_op <- wait The other thread triggers: f2fs_truncate truncate_blocks f2fs_lock_op truncate_partial_data_page lock_page <- wait for locking the page This patch resolves this bug by relocating truncate_partial_data_page. This function is just to truncate user data page and not related to FS consistency as well. And, we don't need to call truncate_inline_data. Rather than that, f2fs_write_data_page will finally update inline_data later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10f2fs: reduce the number of inline_data inode before clearing itJaegeuk Kim
The # of inline_data inode is decreased only when it has inline_data. After clearing the flag, we can't decreased the number. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10f2fs: implement -o dirsyncJaegeuk Kim
If a mount option has dirsync, we should call checkpoint for all the directory operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10f2fs: do not skip any writes under memory pressureJaegeuk Kim
Under memory pressure, let's avoid skipping data writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10f2fs: write node pages if checkpoint is not doingJaegeuk Kim
It needs to write node pages if checkpoint is not doing in order to avoid memory pressure. Reviewed-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10vfs: Remove i_dquot field from inodeJan Kara
All filesystems using VFS quotas are now converted to use their private i_dquot fields. Remove the i_dquot field from generic inode structure. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-10jfs: Convert to private i_dquot fieldJan Kara
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> CC: jfs-discussion@lists.sourceforge.net Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-10reiserfs: Convert to private i_dquot fieldJan Kara
CC: reiserfs-devel@vger.kernel.org CC: Jeff Mahoney <jeffm@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-10ocfs2: Convert to private i_dquot fieldJan Kara
CC: Mark Fasheh <mfasheh@suse.com> CC: Joel Becker <jlbec@evilplan.org> CC: ocfs2-devel@oss.oracle.com Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-10ext4: Convert to private i_dquot fieldJan Kara
CC: linux-ext4@vger.kernel.org Acked-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-10ext3: Convert to private i_dquot fieldJan Kara
CC: linux-ext4@vger.kernel.org Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-10ext2: Convert to private i_dquot fieldJan Kara
CC: linux-ext4@vger.kernel.org Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-10quota: Use function to provide i_dquot pointersJan Kara
i_dquot array is used by relatively few filesystems (ext?, ocfs2, jfs, reiserfs) so it is beneficial to move this array to fs-private part of the inode. We cannot just pass quota pointers from filesystems to quota functions because during quotaon and quotaoff we have to traverse list of all inodes and manipulate i_dquot pointers for each inode. So we provide a function which generic quota code can use to get pointer to the i_dquot array from the filesystem. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-10xfs: Set allowed quota typesJan Kara
We support user, group, and project quotas. Tell VFS about it. CC: xfs@oss.sgi.com CC: Dave Chinner <david@fromorbit.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>