summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2013-07-30aio: percpu ioctx refcountKent Overstreet
This just converts the ioctx refcount to the new generic dynamic percpu refcount code. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2013-07-30aio: percpu reqs_availableKent Overstreet
See the previous patch ("aio: reqs_active -> reqs_available") for why we want to do this - this basically implements a per cpu allocator for reqs_available that doesn't actually allocate anything. Note that we need to increase the size of the ringbuffer we allocate, since a single thread won't necessarily be able to use all the reqs_available slots - some (up to about half) might be on other per cpu lists, unavailable for the current thread. We size the ringbuffer based on the nr_events userspace passed to io_setup(), so this is a slight behaviour change - but nr_events wasn't being used as a hard limit before, it was being rounded up to the next page before so this doesn't change the actual semantics. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2013-07-30aio: reqs_active -> reqs_availableKent Overstreet
The number of outstanding kiocbs is one of the few shared things left that has to be touched for every kiocb - it'd be nice to make it percpu. We can make it per cpu by treating it like an allocation problem: we have a maximum number of kiocbs that can be outstanding (i.e. slots) - then we just allocate and free slots, and we know how to write per cpu allocators. So as prep work for that, we convert reqs_active to reqs_available. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2013-07-30fs: 9p: use strlcpy instead of strncpyChen Gang
For 'NULL' terminated string, recommend always to be ended by zero. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-30dlm: WQ_NON_REENTRANT is meaningless and going awayTejun Heo
dbf2576e37 ("workqueue: make all workqueues non-reentrant") made WQ_NON_REENTRANT no-op and the flag is going away. Remove its usages. This patch doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David Teigland <teigland@redhat.com>
2013-07-30f2fs: fix handling orphan inodesJaegeuk Kim
This patch fixes mishandling of the sbi->n_orphans variable. If users request lots of f2fs_unlink(), check_orphan_space() could be contended. In such the case, sbi->n_orphans can be read incorrectly so that f2fs_unlink() would fall into the wrong state which results in the failure of add_orphan_inode(). So, let's increment sbi->n_orphans virtually prior to the actual orphan inode stuffs. After that, let's release sbi->n_orphans by calling release_orphan_inode or remove_orphan_inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30f2fs: move bio_private allocation out of f2fs_bio_alloc()Gu Zheng
bio->bi_private is not always needed. As in the reading data path, end_read_io does not need bio_private for further using, so moving bio_private allocation out of f2fs_bio_alloc(). Alloc it in the submit_write_page(), and ignore it in the f2fs_readpage(). Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30f2fs: use list_for_each rather than list_for_each_safe, in remove_orphan_inode()Gu Zheng
As we remove the target single node, so list_for_each is enought, in order to clean up, we use list_for_each_entry instead. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30f2fs: use seq_puts()/seq_putc() rather than seq_printf() where possibleGu Zheng
For string without format specifiers, using seq_puts()/seq_putc() instead of seq_printf(). Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30f2fs: fix i_name during f2fs_sync_fileJaegeuk Kim
As similar as the i_pino fix, i_name also should be fixed when i_nlink is 1. The errorneous scenario is like this. 1. touch test1 2. link test1 test2 3. unlink test2 4. fsync test1 After this, i_name should be test1. CC: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30f2fs: update file name in the inode block during f2fs_renameJaegeuk Kim
The error is reproducible by: 0. mkfs.f2fs /dev/sdb1 & mount 1. touch test1 2. touch test2 3. mv test1 test2 4. umount 5. dumpt.f2fs -i 4 /dev/sdb1 After this, when we retrieve the inode->i_name of test2 by dump.f2fs, we get test1 instead of test2. This is because f2fs didn't update the file name during the f2fs_rename. So, this patch fixes that. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30f2fs: introduce help function F2FS_NODE()Gu Zheng
Introduce help function F2FS_NODE() to simplify the conversion of node_page to f2fs_node. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30f2fs: add a help func F2FS_STAT() to get the f2fs_stat_infoGu Zheng
Add a help func F2FS_STAT() to get the f2fs_stat_info. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30f2fs: add proc entry to monitor current usage of segmentsJaegeuk Kim
You can monitor valid block counts of whole segments in: /proc/fs/f2fs/sdb1/segment_info. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30f2fs: recover date requested by fdatasyncJaegeuk Kim
In order to support SQLite that uses fdatasync instead of fsync, we should guarantee the data requested by fdatasync can be recovered after sudden-power- off. So, let's remove the fdatasync condition in f2fs_sync_file. Otherwise, we can restore the data after sudden-power-off due to nonexistence of any fsync mark'ed node blocks. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-29Merge 3.11-rc3 into driver-core-nextGreg Kroah-Hartman
We want these fixes in this branch. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-29ext4: add WARN_ON to check the length of allocated blocksZheng Liu
In commit 921f266b: ext4: add self-testing infrastructure to do a sanity check, some sanity checks were added in map_blocks to make sure 'retval == map->m_len'. Enable these checks by default and report any assertion failures using ext4_warning() and WARN_ON() since they can help us to figure out some bugs that are otherwise hard to hit. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-29ext4: fix retry handling in ext4_ext_truncate()Theodore Ts'o
We tested for ENOMEM instead of -ENOMEM. Oops. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-07-26cuse: convert class code to use dev_groupsGreg Kroah-Hartman
The dev_attrs field of struct class is going away soon, dev_groups should be used instead. This converts the cuse class code to use the correct field. Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-26nfsd4: fix setlease error returnJ. Bruce Fields
This actually makes a difference in the 4.1 case, since we use the status to decide what reason to give the client for the delegation refusal (see nfsd4_open_deleg_none_ext), and in theory a client might choose suboptimal behavior if we give the wrong answer. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-26ext4: destroy ext4_es_cachep on module unloadEric Sandeen
Without this, module can't be reloaded. [ 500.521980] kmem_cache_sanity_check (ext4_extent_status): Cache name already exists. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org # v3.8+
2013-07-26ext4: make sure group number is bumped after a inode allocation raceTheodore Ts'o
When we try to allocate an inode, and there is a race between two CPU's trying to grab the same inode, _and_ this inode is the last free inode in the block group, make sure the group number is bumped before we continue searching the rest of the block groups. Otherwise, we end up searching the current block group twice, and we end up skipping searching the last block group. So in the unlikely situation where almost all of the inodes are allocated, it's possible that we will return ENOSPC even though there might be free inodes in that last block group. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-07-26Merge tag 'for-linus-v3.11-rc3' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs fix from Ben Myers: "Fix for regression in commit cca9f93a52d2 ("xfs: don't do IO when creating an new inode"), recovery causing filesystem corruption after a crash" * tag 'for-linus-v3.11-rc3' of git://oss.sgi.com/xfs/xfs: xfs: di_flushiter considered harmful
2013-07-26Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fix from Bruce Fields: "One more nfsd bugfix for 3.11" * 'for-3.11' of git://linux-nfs.org/~bfields/linux: nfsd: nfsd_open: when dentry_open returns an error do not propagate as struct file
2013-07-25xfs: di_flushiter considered harmfulDave Chinner
When we made all inode updates transactional, we no longer needed the log recovery detection for inodes being newer on disk than the transaction being replayed - it was redundant as replay of the log would always result in the latest version of the inode would be on disk. It was redundant, but left in place because it wasn't considered to be a problem. However, with the new "don't read inodes on create" optimisation, flushiter has come back to bite us. Essentially, the optimisation made always initialises flushiter to zero in the create transaction, and so if we then crash and run recovery and the inode already on disk has a non-zero flushiter it will skip recovery of that inode. As a result, log recovery does the wrong thing and we end up with a corrupt filesystem. Because we have to support old kernel to new kernel upgrades, we can't just get rid of the flushiter support in log recovery as we might be upgrading from a kernel that doesn't have fully transactional inode updates. Unfortunately, for v4 superblocks there is no way to guarantee that log recovery knows about this fact. We cannot add a new inode format flag to say it's a "special inode create" because it won't be understood by older kernels and so recovery could do the wrong thing on downgrade. We cannot specially detect the combination of zero mode/non-zero flushiter on disk to non-zero mode, zero flushiter in the log item during recovery because wrapping of the flushiter can result in false detection. Hence that makes this "don't use flushiter" optimisation limited to a disk format that guarantees that we don't need it. And that means the only fix here is to limit the "no read IO on create" optimisation to version 5 superblocks.... Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit e60896d8f2b81412421953e14d3feb14177edb56)
2013-07-25xattr: Constify ->name member of "struct xattr".Tetsuo Handa
Since everybody sets kstrdup()ed constant string to "struct xattr"->name but nobody modifies "struct xattr"->name , we can omit kstrdup() and its failure checking by constifying ->name member of "struct xattr". Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Joel Becker <jlbec@evilplan.org> [ocfs2] Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Tested-by: Paul Moore <paul@paul-moore.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-07-24NFSv4: Fix nfs4_init_uniform_client_string for net namespacesTrond Myklebust
Commit 6f2ea7f2a (NFS: Add nfs4_unique_id boot parameter) introduces a boot parameter that allows client administrators to set a string identifier for use by the EXCHANGE_ID and SETCLIENTID arguments in order to make them more globally unique. Unfortunately, that uniquifier is no longer globally unique in the presence of net namespaces, since each container expects to be able to set up their own lease when mounting a new NFSv4/4.1 partition. The fix is to add back in the container-specific hostname in addition to the unique id. Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-24xfs: di_flushiter considered harmfulDave Chinner
When we made all inode updates transactional, we no longer needed the log recovery detection for inodes being newer on disk than the transaction being replayed - it was redundant as replay of the log would always result in the latest version of the inode would be on disk. It was redundant, but left in place because it wasn't considered to be a problem. However, with the new "don't read inodes on create" optimisation, flushiter has come back to bite us. Essentially, the optimisation made always initialises flushiter to zero in the create transaction, and so if we then crash and run recovery and the inode already on disk has a non-zero flushiter it will skip recovery of that inode. As a result, log recovery does the wrong thing and we end up with a corrupt filesystem. Because we have to support old kernel to new kernel upgrades, we can't just get rid of the flushiter support in log recovery as we might be upgrading from a kernel that doesn't have fully transactional inode updates. Unfortunately, for v4 superblocks there is no way to guarantee that log recovery knows about this fact. We cannot add a new inode format flag to say it's a "special inode create" because it won't be understood by older kernels and so recovery could do the wrong thing on downgrade. We cannot specially detect the combination of zero mode/non-zero flushiter on disk to non-zero mode, zero flushiter in the log item during recovery because wrapping of the flushiter can result in false detection. Hence that makes this "don't use flushiter" optimisation limited to a disk format that guarantees that we don't need it. And that means the only fix here is to limit the "no read IO on create" optimisation to version 5 superblocks.... Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-24vfs: Lock in place mounts from more privileged usersEric W. Biederman
When creating a less privileged mount namespace or propogating mounts from a more privileged to a less privileged mount namespace lock the submounts so they may not be unmounted individually in the child mount namespace revealing what is under them. This enforces the reasonable expectation that it is not possible to see under a mount point. Most of the time mounts are on empty directories and revealing that does not matter, however I have seen an occassionaly sloppy configuration where there were interesting things concealed under a mount point that probably should not be revealed. Expirable submounts are not locked because they will eventually unmount automatically so whatever is under them already needs to be safe for unprivileged users to access. From a practical standpoint these restrictions do not appear to be significant for unprivileged users of the mount namespace. Recursive bind mounts and pivot_root continues to work, and mounts that are created in a mount namespace may be unmounted there. All of which means that the common idiom of keeping a directory of interesting files and using pivot_root to throw everything else away continues to work just fine. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-07-23NFSv4.1 Use the mount point rpc_clnt for layoutreturnAndy Adamson
Should not use the clientid maintenance rpc_clnt. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-23NFS: Fix return type of nfs4_end_drain_session() stubChuck Lever
Clean up: when NFSv4.1 support is compiled out, nfs4_end_drain_session() becomes a stub. Make the synopsis of the stub match the synopsis of the real version of the function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-23nfs: fix open(O_RDONLY|O_TRUNC) in NFS4.0Nadav Shemer
nfs4_proc_setattr removes ATTR_OPEN from sattr->ia_valid, but later nfs4_do_setattr checks for it Signed-off-by: Nadav Shemer <nadav@tonian.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-23NFSv4: encode_attrs should not backfill the bitmap and attribute lengthTrond Myklebust
The attribute length is already calculated in advance. There is no reason why we cannot calculate the bitmap in advance too so that we don't have to play pointer games. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse bugfixes from Miklos Szeredi: "These are bugfixes and a cleanup to the "readdirplus" feature" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: readdirplus: cleanup fuse: readdirplus: change attributes once fuse: readdirplus: fix instantiate fuse: readdirplus: sanity checks fuse: readdirplus: fix dentry leak
2013-07-23NFSv4: Fix brainfart in attribute length calculationTrond Myklebust
The calculation of the attribute length was 4 bytes off. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Andre Heider <a.heider@gmail.com> Reported-and-tested-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-23nfsd: nfs4_file_get_access: need to be more careful with O_RDWRHarshula Jayasuriya
If fi_fds = {non-NULL, NULL, non-NULL} and oflag = O_WRONLY the WARN_ON_ONCE(!(fp->fi_fds[oflag] || fp->fi_fds[O_RDWR])) doesn't trigger when it should. Signed-off-by: Harshula Jayasuriya <harshula@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-23nfsd: nfsd_open: when dentry_open returns an error do not propagate as ↵Harshula Jayasuriya
struct file The following call chain: ------------------------------------------------------------ nfs4_get_vfs_file - nfsd_open - dentry_open - do_dentry_open - __get_file_write_access - get_write_access - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY; ------------------------------------------------------------ can result in the following state: ------------------------------------------------------------ struct nfs4_file { ... fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0}, fi_access = {{ counter = 0x1 }, { counter = 0x0 }}, ... ------------------------------------------------------------ 1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is NULL, hence nfsd_open() is called where we get status set to an error and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented. 2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented. Thus we leave a landmine in the form of the nfs4_file data structure in an incorrect state. 3) Eventually, when __nfs4_file_put_access() is called it finds fi_access[O_WRONLY] being non-zero, it decrements it and calls nfs4_file_put_fd() which tries to fput -ETXTBSY. ------------------------------------------------------------ ... [exception RIP: fput+0x9] RIP: ffffffff81177fa9 RSP: ffff88062e365c90 RFLAGS: 00010282 RAX: ffff880c2b3d99cc RBX: ffff880c2b3d9978 RCX: 0000000000000002 RDX: dead000000100101 RSI: 0000000000000001 RDI: ffffffffffffffe6 RBP: ffff88062e365c90 R8: ffff88041fe797d8 R9: ffff88062e365d58 R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000007 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd] #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd] #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd] #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd] #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd] #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd] #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd] #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc] #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc] #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd] #19 [ffff88062e365ee8] kthread at ffffffff81090886 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a ------------------------------------------------------------ Cc: stable@vger.kernel.org Signed-off-by: Harshula Jayasuriya <harshula@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-22xfs: Start using pquotaino from the superblock.Chandra Seetharaman
Start using pquotino and define a macro to check if the superblock has pquotino. Keep backward compatibilty by alowing mount of older superblock with no separate pquota inode. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-22xfs: Initialize all quota inodes to be NULLFSINOChandra Seetharaman
mkfs doesn't initialize the quota inodes to NULLFSINO as it does for the other internal inodes. This leads to two in-core values (0 and NULLFSINO) to be checked against, to make sure if a quota inode is valid. Solve that problem by initializing the in-core values of all quotaino values to NULLFSINO if they are 0 in the disk. Note that these values are not written back to on-disk superblock unless some quota is enabled on the filesystem. Even in that case sb_pquotino is written to disk only if the on-disk superblock supports pquotino Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-22xfs: Fix a deadlock in xfs_log_commit_cil() code pathChandra Seetharaman
While testing and rearranging pquota/gquota code, I stumbled on a xfs_shutdown() during a mount. But the mount just hung. Debugged and found that there is a deadlock involving &log->l_cilp->xc_ctx_lock. It is in a code path where &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels down the same semaphore is being acquired in write mode causing a deadlock. This is the stack: xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode xlog_print_tic_res xfs_force_shutdown xfs_log_force_umount xlog_cil_force xlog_cil_force_lsn xlog_cil_push_foreground xlog_cil_push - tries to acquire same semaphore in write mode This patch fixes the deadlock by changing the reason code for xfs_force_shutdown in xlog_print_tic_res() to SHUTDOWN_LOG_IO_ERROR. SHUTDOWN_LOG_IO_ERROR is the right reason code to be set since we are in the log path. Thanks to Dave for suggesting this solution. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-22xfs: fix assertion failure in xfs_vm_write_failed()Jie Liu
In xfs_vm_write_failed(), we evaluate the block_offset of pos with PAGE_MASK which is an unsigned long. That is fine on 64-bit platforms regardless of whether the request pos is 32-bit or 64-bit. However, on 32-bit platforms the value is 0xfffff000 and so the high 32 bits in it will be masked off with (pos & PAGE_MASK) for a 64-bit pos. As a result, the evaluated block_offset is incorrect which will cause this failure ASSERT(block_offset + from == pos); and potentially pass the wrong block to xfs_vm_kill_delalloc_range(). In this case, we can get a kernel panic if CONFIG_XFS_DEBUG is enabled: XFS: Assertion failed: block_offset + from == pos, file: fs/xfs/xfs_aops.c, line: 1504 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:100! invalid opcode: 0000 [#1] SMP ........ Pid: 4057, comm: mkfs.xfs Tainted: G O 3.9.0-rc2 #1 EIP: 0060:[<f94a7e8b>] EFLAGS: 00010282 CPU: 0 EIP is at assfail+0x2b/0x30 [xfs] EAX: 00000056 EBX: f6ef28a0 ECX: 00000007 EDX: f57d22a4 ESI: 1c2fb000 EDI: 00000000 EBP: ea6b5d30 ESP: ea6b5d1c DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 094f3ff4 CR3: 2bcb4000 CR4: 000006f0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process mkfs.xfs (pid: 4057, ti=ea6b4000 task=ea5799e0 task.ti=ea6b4000) Stack: 00000000 f9525c48 f951fa80 f951f96b 000005e4 ea6b5d7c f9494b34 c19b0ea2 00000066 f3d6c620 c19b0ea2 00000000 e9a91458 00001000 00000000 00000000 00000000 c15c7e89 00000000 1c2fb000 00000000 00000000 1c2fb000 00000080 Call Trace: [<f9494b34>] xfs_vm_write_failed+0x74/0x1b0 [xfs] [<c15c7e89>] ? printk+0x4d/0x4f [<f9494d7d>] xfs_vm_write_begin+0x10d/0x170 [xfs] [<c110a34c>] generic_file_buffered_write+0xdc/0x210 [<f949b669>] xfs_file_buffered_aio_write+0xf9/0x190 [xfs] [<f949b7f3>] xfs_file_aio_write+0xf3/0x160 [xfs] [<c115e504>] do_sync_write+0x94/0xd0 [<c115ed1f>] vfs_write+0x8f/0x160 [<c115e470>] ? wait_on_retry_sync_kiocb+0x50/0x50 [<c115f017>] sys_write+0x47/0x80 [<c15d860d>] sysenter_do_call+0x12/0x28 ............. EIP: [<f94a7e8b>] assfail+0x2b/0x30 [xfs] SS:ESP 0068:ea6b5d1c ---[ end trace cdd9af4f4ecab42f ]--- Kernel panic - not syncing: Fatal exception In order to avoid this, we can evaluate the block_offset of the start of the page by using shifts rather than masks the mismatch problem. Thanks Dave Chinner for help finding and fixing this bug. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Reviewed-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-22GFS2: Replace PTR_RET with PTR_ERR_OR_ZEROSteven Whitehouse
PTR_RET should be PTR_ERR Reported-by: Sachin Kamat <sachin.kamat@linaro.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-07-20ext3: fix a BUG when opening a file with O_TMPFILE flagZheng Liu
When we try to open a file with O_TMPFILE flag, we will trigger a bug. The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and this check always fails because we set ->i_nlink = 1 in inode_init_always(). We can use the following program to trigger it: int main(int argc, char *argv[]) { int fd; fd = open(argv[1], O_TMPFILE, 0666); if (fd < 0) { perror("open "); return -1; } close(fd); return 0; } The oops message looks like this: kernel: kernel BUG at fs/ext3/namei.c:1992! kernel: invalid opcode: 0000 [#1] SMP kernel: Modules linked in: ext4 jbd2 crc16 cpufreq_ondemand ipv6 dm_mirror dm_region_hash dm_log dm_mod parport_pc parport serio_raw sg dcdbas pcspkr i2c_i801 ehci_pci ehci_hcd button acpi_cpufreq mperf e1000e ptp pps_core ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core ext3 jbd sd_mod ahci libahci libata scsi_mod uhci_hcd kernel: CPU: 0 PID: 2882 Comm: tst_tmpfile Not tainted 3.11.0-rc1+ #4 kernel: Hardware name: Dell Inc. OptiPlex 780 /0V4W66, BIOS A05 08/11/2010 kernel: task: ffff880112d30050 ti: ffff8801124d4000 task.ti: ffff8801124d4000 kernel: RIP: 0010:[<ffffffffa00db5ae>] [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3] kernel: RSP: 0018:ffff8801124d5cc8 EFLAGS: 00010202 kernel: RAX: 0000000000000000 RBX: ffff880111510128 RCX: ffff8801114683a0 kernel: RDX: 0000000000000000 RSI: ffff880111510128 RDI: ffff88010fcf65a8 kernel: RBP: ffff8801124d5d18 R08: 0080000000000000 R09: ffffffffa00d3b7f kernel: R10: ffff8801114683a0 R11: ffff8801032a2558 R12: 0000000000000000 kernel: R13: ffff88010fcf6800 R14: ffff8801032a2558 R15: ffff8801115100d8 kernel: FS: 00007f5d172b5700(0000) GS:ffff880117c00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b kernel: CR2: 00007f5d16df15d0 CR3: 0000000110b1d000 CR4: 00000000000407f0 kernel: Stack: kernel: 000000000000000c ffff8801048a7dc8 ffff8801114685a8 ffffffffa00b80d7 kernel: ffff8801124d5e38 ffff8801032a2558 ffff88010ce24d68 0000000000000000 kernel: ffff88011146b300 ffff8801124d5d44 ffff8801124d5d78 ffffffffa00db7e1 kernel: Call Trace: kernel: [<ffffffffa00b80d7>] ? journal_start+0x8c/0xbd [jbd] kernel: [<ffffffffa00db7e1>] ext3_tmpfile+0xb2/0x13b [ext3] kernel: [<ffffffff821076f8>] path_openat+0x11f/0x5e7 kernel: [<ffffffff821c86b4>] ? list_del+0x11/0x30 kernel: [<ffffffff82065fa2>] ? __dequeue_entity+0x33/0x38 kernel: [<ffffffff82107cd5>] do_filp_open+0x3f/0x8d kernel: [<ffffffff82112532>] ? __alloc_fd+0x50/0x102 kernel: [<ffffffff820f9296>] do_sys_open+0x13b/0x1cd kernel: [<ffffffff820f935c>] SyS_open+0x1e/0x20 kernel: [<ffffffff82398c02>] system_call_fastpath+0x16/0x1b kernel: Code: 39 c7 0f 85 67 01 00 00 0f b7 03 25 00 f0 00 00 3d 00 40 00 00 74 18 3d 00 80 00 00 74 11 3d 00 a0 00 00 74 0a 83 7b 48 00 74 04 <0f> 0b eb fe 49 8b 85 50 03 00 00 4c 89 f6 48 c7 c7 c0 99 0e a0 kernel: RIP [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3] kernel: RSP <ffff8801124d5cc8> Here we couldn't call clear_nlink() directly because in d_tmpfile() we will call inode_dec_link_count() to decrease ->i_nlink. So this commit tries to call d_tmpfile() before ext4_orphan_add() to fix this problem. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk>
2013-07-20ext4: fix a BUG when opening a file with O_TMPFILE flagZheng Liu
When we try to open a file with O_TMPFILE flag, we will trigger a bug. The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and this check always fails because we set ->i_nlink = 1 in inode_init_always(). We can use the following program to trigger it: int main(int argc, char *argv[]) { int fd; fd = open(argv[1], O_TMPFILE, 0666); if (fd < 0) { perror("open "); return -1; } close(fd); return 0; } The oops message looks like this: kernel BUG at fs/ext4/namei.c:2572! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: dlci bridge stp hidp cmtp kernelcapi l2tp_ppp l2tp_netlink l2tp_core sctp libcrc32c rfcomm tun fuse nfnetli nk can_raw ipt_ULOG can_bcm x25 scsi_transport_iscsi ipx p8023 p8022 appletalk phonet psnap vmw_vsock_vmci_transport af_key vmw_vmci rose vsock atm can netrom ax25 af_rxrpc ir da pppoe pppox ppp_generic slhc bluetooth nfc rfkill rds caif_socket caif crc_ccitt af_802154 llc2 llc snd_hda_codec_realtek snd_hda_intel snd_hda_codec serio_raw snd_pcm pcsp kr edac_core snd_page_alloc snd_timer snd soundcore r8169 mii sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm CPU: 1 PID: 1812571 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #12 Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010 task: ffff88007dfe69a0 ti: ffff88010f7b6000 task.ti: ffff88010f7b6000 RIP: 0010:[<ffffffff8125ce69>] [<ffffffff8125ce69>] ext4_orphan_add+0x299/0x2b0 RSP: 0018:ffff88010f7b7cf8 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff8800966d3020 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88007dfe70b8 RDI: 0000000000000001 RBP: ffff88010f7b7d40 R08: ffff880126a3c4e0 R09: ffff88010f7b7ca0 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801271fd668 R13: ffff8800966d2f78 R14: ffff88011d7089f0 R15: ffff88007dfe69a0 FS: 00007f70441a3740(0000) GS:ffff88012a800000(0000) knlGS:00000000f77c96c0 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000002834000 CR3: 0000000107964000 CR4: 00000000000007e0 DR0: 0000000000780000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Stack: 0000000000002000 00000020810b6dde 0000000000000000 ffff88011d46db00 ffff8800966d3020 ffff88011d7089f0 ffff88009c7f4c10 ffff88010f7b7f2c ffff88007dfe69a0 ffff88010f7b7da8 ffffffff8125cfac ffff880100000004 Call Trace: [<ffffffff8125cfac>] ext4_tmpfile+0x12c/0x180 [<ffffffff811cba78>] path_openat+0x238/0x700 [<ffffffff8100afc4>] ? native_sched_clock+0x24/0x80 [<ffffffff811cc647>] do_filp_open+0x47/0xa0 [<ffffffff811db73f>] ? __alloc_fd+0xaf/0x200 [<ffffffff811ba2e4>] do_sys_open+0x124/0x210 [<ffffffff81010725>] ? syscall_trace_enter+0x25/0x290 [<ffffffff811ba3ee>] SyS_open+0x1e/0x20 [<ffffffff816ca8d4>] tracesys+0xdd/0xe2 [<ffffffff81001001>] ? start_thread_common.constprop.6+0x1/0xa0 Code: 04 00 00 00 89 04 24 31 c0 e8 c4 77 04 00 e9 43 fe ff ff 66 25 00 d0 66 3d 00 80 0f 84 0e fe ff ff 83 7b 48 00 0f 84 04 fe ff ff <0f> 0b 49 8b 8c 24 50 07 00 00 e9 88 fe ff ff 0f 1f 84 00 00 00 Here we couldn't call clear_nlink() directly because in d_tmpfile() we will call inode_dec_link_count() to decrease ->i_nlink. So this commit tries to call d_tmpfile() before ext4_orphan_add() to fix this problem. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Tested-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "The sget() one is a long-standing bug and will need to go into -stable (in fact, it had been originally caught in RHEL6), the other two are 3.11-only" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: constify dentry parameter in d_count() livelock avoidance in sget() allow O_TMPFILE to work with O_WRONLY
2013-07-20Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "Fixes for 3.11-rc2, sent at 5pm, in the professoinal style. :-)" I'm not sure I like this new level of "professionalism". 9-5, people, 9-5. * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: call ext4_es_lru_add() after handling cache miss ext4: yield during large unlinks ext4: make the extent_status code more robust against ENOMEM failures ext4: simplify calculation of blocks to free on error ext4: fix error handling in ext4_ext_truncate()
2013-07-20Merge tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - Fix a regression against NFSv4 FreeBSD servers when creating a new file - Fix another regression in rpc_client_register() * tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Fix a regression against the FreeBSD server SUNRPC: Fix another issue with rpc_client_register()
2013-07-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next Pull btrfs fixes from Josef Bacik: "I'm playing the role of Chris Mason this week while he's on vacation. There are a few critical fixes for btrfs here, all regressions and have been tested well" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next: Btrfs: fix wrong write offset when replacing a device Btrfs: re-add root to dead root list if we stop dropping it Btrfs: fix lock leak when resuming snapshot deletion Btrfs: update drop progress before stopping snapshot dropping
2013-07-20livelock avoidance in sget()Al Viro
Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about to fail. The superblock is on ->fs_supers, ->s_umount is held exclusive, ->s_active is 1. Along comes two more processes, trying to mount the same thing; sget() in each is picking that superblock, bumping ->s_count and trying to grab ->s_umount. ->s_active is 3 now. Original mount(2) finally gets to deactivate_locked_super() on failure; ->s_active is 2, superblock is still ->fs_supers because shutdown will *not* happen until ->s_active hits 0. ->s_umount is dropped and now we have two processes chasing each other: s_active = 2, A acquired ->s_umount, B blocked A sees that the damn thing is stillborn, does deactivate_locked_super() s_active = 1, A drops ->s_umount, B gets it A restarts the search and finds the same superblock. And bumps it ->s_active. s_active = 2, B holds ->s_umount, A blocked on trying to get it ... and we are in the earlier situation with A and B switched places. The root cause, of course, is that ->s_active should not grow until we'd got MS_BORN. Then failing ->mount() will have deactivate_locked_super() shut the damn thing down. Fortunately, it's easy to do - the key point is that grab_super() is called only for superblocks currently on ->fs_supers, so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and bump ->s_active; we must never increment ->s_count for superblocks past ->kill_sb(), but grab_super() is never called for those. The bug is pretty old; we would've caught it by now, if not for accidental exclusion between sget() for block filesystems; the things like cgroup or e.g. mtd-based filesystems don't have anything of that sort, so they get bitten. The right way to deal with that is obviously to fix sget()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-20allow O_TMPFILE to work with O_WRONLYAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>