Age | Commit message (Collapse) | Author |
|
When gfs2_create_inode() fails due to quota violation, the VFS
inode is not completely uninitialized. This can cause a list
corruption error.
This patch correctly uninitializes the VFS inode when a quota
violation occurs in the gfs2_create_inode codepath.
Resolves: rhbz#1059808
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Allow locks_mandatory_area() to handle file-private locks correctly.
If there is a file-private lock set on an open file and we're doing I/O
via the same, then that should not cause anything to block.
Handle this by first doing a non-blocking FL_ACCESS check for a
file-private lock, and then fall back to checking for a classic POSIX
lock (and possibly blocking).
Note that this approach is subject to the same races that have always
plagued mandatory locking on Linux.
Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
As Trond pointed out, you can currently deadlock yourself by setting a
file-private lock on a file that requires mandatory locking and then
trying to do I/O on it.
Avoid this problem by plumbing some knowledge of file-private locks into
the mandatory locking code. In order to do this, we must pass down
information about the struct file that's being used to
locks_verify_locked.
Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
|
|
Neil Brown suggested potentially overloading the l_pid value as a "lock
context" field for file-private locks. While I don't think we will
probably want to do that here, it's probably a good idea to ensure that
in the future we could extend this API without breaking existing
callers.
Typically the l_pid value is ignored for incoming struct flock
arguments, serving mainly as a place to return the pid of the owner if
there is a conflicting lock. For file-private locks, require that it
currently be set to 0 and return EINVAL if it isn't. If we eventually
want to make a non-zero l_pid mean something, then this will help ensure
that we don't break legacy programs that are using file-private locks.
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
Due to some unfortunate history, POSIX locks have very strange and
unhelpful semantics. The thing that usually catches people by surprise
is that they are dropped whenever the process closes any file descriptor
associated with the inode.
This is extremely problematic for people developing file servers that
need to implement byte-range locks. Developers often need a "lock
management" facility to ensure that file descriptors are not closed
until all of the locks associated with the inode are finished.
Additionally, "classic" POSIX locks are owned by the process. Locks
taken between threads within the same process won't conflict with one
another, which renders them useless for synchronization between threads.
This patchset adds a new type of lock that attempts to address these
issues. These locks conflict with classic POSIX read/write locks, but
have semantics that are more like BSD locks with respect to inheritance
and behavior on close.
This is implemented primarily by changing how fl_owner field is set for
these locks. Instead of having them owned by the files_struct of the
process, they are instead owned by the filp on which they were acquired.
Thus, they are inherited across fork() and are only released when the
last reference to a filp is put.
These new semantics prevent them from being merged with classic POSIX
locks, even if they are acquired by the same process. These locks will
also conflict with classic POSIX locks even if they are acquired by
the same process or on the same file descriptor.
The new locks are managed using a new set of cmd values to the fcntl()
syscall. The initial implementation of this converts these values to
"classic" cmd values at a fairly high level, and the details are not
exposed to the underlying filesystem. We may eventually want to push
this handing out to the lower filesystem code but for now I don't
see any need for it.
Also, note that with this implementation the new cmd values are only
available via fcntl64() on 32-bit arches. There's little need to
add support for legacy apps on a new interface like this.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
It's not really feasible to do deadlock detection with FL_FILE_PVT
locks since they aren't owned by a single task, per-se. Deadlock
detection also tends to be rather expensive so just skip it for
these sorts of locks.
Also, add a FIXME comment about adding more limited deadlock detection
that just applies to ro -> rw upgrades, per Andy's request.
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
Once we introduce file private locks, we'll need to know what cmd value
was used, as that affects the ownership and whether a conflict would
arise.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
FL_FILE_PVT locks are no longer tied to a particular pid, and are
instead inheritable by child processes. Report a l_pid of '-1' for
these sorts of locks since the pid is somewhat meaningless for them.
This precedent comes from FreeBSD. There, POSIX and flock() locks can
conflict with one another. If fcntl(F_GETLK, ...) returns a lock set
with flock() then the l_pid member cannot be a process ID because the
lock is not held by a process as such.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
In a later patch, we'll be adding a new type of lock that's owned by
the struct file instead of the files_struct. Those sorts of locks
will be flagged with a new FL_FILE_PVT flag.
Report these types of locks as "FLPVT" in /proc/locks to distinguish
them from "classic" POSIX locks.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
This function currently removes leases in addition to flock locks and in
a later patch we'll have it deal with file-private locks too. Rename it
to locks_remove_file to indicate that it removes locks that are
associated with a particular struct file, and not just flock locks.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
Move this check into flock64_to_posix_lock instead of duplicating it in
two places. This also fixes a minor wart in the code where we continue
referring to the struct flock after converting it to struct file_lock.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
In the 32-bit case fcntl assigns the 64-bit f_pos and i_size to a 32-bit
off_t.
The existing range checks also seem to depend on signed arithmetic
wrapping when it overflows. In practice maybe that works, but we can be
more careful. That also allows us to make a more reliable distinction
between -EINVAL and -EOVERFLOW.
Note that in the 32-bit case SEEK_CUR or SEEK_END might allow the caller
to set a lock with starting point no longer representable as a 32-bit
value. We could return -EOVERFLOW in such cases, but the locks code is
capable of handling such ranges, so we choose to be lenient here. The
only problem is that subsequent GETLK calls on such a lock will fail
with EOVERFLOW.
While we're here, do some cleanup including consolidating code for the
flock and flock64 cases.
Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
A leftover lock on the list is surely a sign of a problem of some sort,
but it's not necessarily a reason to panic the box. Instead, just log a
warning with some info about the lock, and then delete it like we would
any other lock.
In the event that the filesystem declares a ->lock f_op, we may end up
leaking something, but that's generally preferable to an immediate
panic.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
...to make sparse happy.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
It's best to let the compiler decide that.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
As Al Viro points out, there is an unlikely, but possible race between
opening a file and setting a lease on it. generic_add_lease is done with
the i_lock held, but the inode->i_flock check in break_lease is
lockless. It's possible for another task doing an open to do the entire
pathwalk and call break_lease between the point where generic_add_lease
checks for a conflicting open and adds the lease to the list. If this
occurs, we can end up with a lease set on the file with a conflicting
open.
To guard against that, check again for a conflicting open after adding
the lease to the i_flock list. If the above race occurs, then we can
simply unwind the lease setting and return -EAGAIN.
Because we take dentry references and acquire write access on the file
before calling break_lease, we know that if the i_flock list is empty
when the open caller goes to check it then the necessary refcounts have
already been incremented. Thus the additional check for a conflicting
open will see that there is one and the setlease call will fail.
Cc: Bruce Fields <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
|
|
ENOSPC was being returned in slot_get inspite of successful
execution of the function. This patch fixes this return
code.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Linux 3.14
The vt-d w/a merged late in 3.14-rc needs a bit of fine-tuning, hence
backmerge.
Conflicts:
drivers/gpu/drm/i915/i915_gem_gtt.c
drivers/gpu/drm/i915/intel_ddi.c
drivers/gpu/drm/i915/intel_dp.c
All trivial adjacent lines changed type conflicts, so trivial git
doesn't even show them in the merg commit.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Switch mnt_hash to hlist, turning the races between __lookup_mnt() and
hash modifications into false negatives from __lookup_mnt() (instead
of hangs)"
On the false negatives from __lookup_mnt():
"The *only* thing we care about is not getting stuck in __lookup_mnt().
If it misses an entry because something in front of it just got moved
around, etc, we are fine. We'll notice that mount_lock mismatch and
that'll be it"
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch mnt_hash to hlist
don't bother with propagate_mnt() unless the target is shared
keep shadowed vfsmounts together
resizable namespace.c hashes
|
|
Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.
Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
fixes RCU bug - walking through hlist is safe in face of element moves,
since it's self-terminating. Cyclic lists are not - if we end up jumping
to another hash chain, we'll loop infinitely without ever hitting the
original list head.
[fix for dumb braino folded]
Spotted by: Max Kellermann <mk@cm4all.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If the dest_mnt is not shared, propagate_mnt() does nothing -
there's no mounts to propagate to and thus no copies to create.
Might as well don't bother calling it in that case.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
preparation to switching mnt_hash to hlist
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
* switch allocation to alloc_large_system_hash()
* make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
* switch mountpoint_hashtable from list_head to hlist_head
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Besides checking rpc_xprt out of xs_setup_bc_tcp,
increase it's reference (it's important).
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Testing NFS4.0 by pynfs, I got some messeages as,
"nfsd: inode locked twice during operation."
When one compound RPC contains two or more ops that locks
the filehandle,the second op will cause the message.
As two SETATTR ops, after the first SETATTR, nfsd will not call
fh_put() to release current filehandle, it means filehandle have
unlocked with fh_post_saved = 1.
The second SETATTR find fh_post_saved = 1, and printk the message.
v2: introduce helper fh_clear_wcc().
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
RFC5661 obsoletes NFS4ERR_STALE_STATEID in favour of NFS4ERR_BAD_STATEID.
Note that because nfsd encodes the clientid boot time in the stateid, we
can hit this error case in certain scenarios where the Linux client
state management thread exits early, before it has finished recovering
all state.
Reported-by: Idan Kedar <idank@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
encode_getattr, for example, can return nfserr_resource to indicate it
ran out of buffer space. That's not a legal error in the 4.1 case. And
in the 4.1 case, if we ran out of buffer space, we should have exceeded
a session limit too.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
cstate->slot and ->session are each set together in nfsd4_sequence. If
one is non-NULL, so is the other.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Maybe this is comment true, who cares? Handle this like any other
error.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
This isn't actually used anywhere else.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Connection from alloc_conn must be freed through free_conn,
otherwise, the reference of svc_xprt will never be put.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
If we interrupt the nfs4_wait_for_completion_rpc_task() call in
nfs4_run_open_task(), then we don't prevent the RPC call from
completing. So freeing up the opendata->f_attr.mdsthreshold
in the error path in _nfs4_do_open() leads to a use-after-free
when the XDR decoder tries to decode the mdsthreshold information
from the server.
Fixes: 82be417aa37c0 (NFSv4.1 cache mdsthreshold values on OPEN)
Tested-by: Steve Dickson <SteveD@redhat.com>
Cc: stable@vger.kernel.org # 3.5+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Commit c74a3bdd9b52 ("ocfs2: add clustername to cluster connection") is
trying to strlcpy a string which was explicitly passed as NULL in the
very same patch, triggering a NULL ptr deref.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: strlcpy (lib/string.c:388 lib/string.c:151)
CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G W 3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274
RIP: strlcpy (lib/string.c:388 lib/string.c:151)
Call Trace:
ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350)
ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396)
user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679)
dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503)
vfs_mkdir (fs/namei.c:3467)
SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472)
tracesys (arch/x86/kernel/entry_64.S:749)
akpm: this patch probably disables the feature. A temporary thing to
avoid triviel oopses.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
during lockd_up
We had a Fedora ABRT report with a stack trace like this:
kernel BUG at net/sunrpc/svc.c:550!
invalid opcode: 0000 [#1] SMP
[...]
CPU: 2 PID: 913 Comm: rpc.nfsd Not tainted 3.13.6-200.fc20.x86_64 #1
Hardware name: Hewlett-Packard HP ProBook 4740s/1846, BIOS 68IRR Ver. F.40 01/29/2013
task: ffff880146b00000 ti: ffff88003f9b8000 task.ti: ffff88003f9b8000
RIP: 0010:[<ffffffffa0305fa8>] [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
RSP: 0018:ffff88003f9b9de0 EFLAGS: 00010206
RAX: ffff88003f829628 RBX: ffff88003f829600 RCX: 00000000000041ee
RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
RBP: ffff88003f9b9de8 R08: 0000000000017360 R09: ffff88014fa97360
R10: ffffffff8114ce57 R11: ffffea00051c9c00 R12: ffff88003f829600
R13: 00000000ffffff9e R14: ffffffff81cc7cc0 R15: 0000000000000000
FS: 00007f4fde284840(0000) GS:ffff88014fa80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4fdf5192f8 CR3: 00000000a569a000 CR4: 00000000001407e0
Stack:
ffff88003f792300 ffff88003f9b9e18 ffffffffa02de02a 0000000000000000
ffffffff81cc7cc0 ffff88003f9cb000 0000000000000008 ffff88003f9b9e60
ffffffffa033bb35 ffffffff8131c86c ffff88003f9cb000 ffff8800a5715008
Call Trace:
[<ffffffffa02de02a>] lockd_up+0xaa/0x330 [lockd]
[<ffffffffa033bb35>] nfsd_svc+0x1b5/0x2f0 [nfsd]
[<ffffffff8131c86c>] ? simple_strtoull+0x2c/0x50
[<ffffffffa033c630>] ? write_pool_threads+0x280/0x280 [nfsd]
[<ffffffffa033c6bb>] write_threads+0x8b/0xf0 [nfsd]
[<ffffffff8114efa4>] ? __get_free_pages+0x14/0x50
[<ffffffff8114eff6>] ? get_zeroed_page+0x16/0x20
[<ffffffff811dec51>] ? simple_transaction_get+0xb1/0xd0
[<ffffffffa033c098>] nfsctl_transaction_write+0x48/0x80 [nfsd]
[<ffffffff811b8b34>] vfs_write+0xb4/0x1f0
[<ffffffff811c3f99>] ? putname+0x29/0x40
[<ffffffff811b9569>] SyS_write+0x49/0xa0
[<ffffffff810fc2a6>] ? __audit_syscall_exit+0x1f6/0x2a0
[<ffffffff816962e9>] system_call_fastpath+0x16/0x1b
Code: 31 c0 e8 82 db 37 e1 e9 2a ff ff ff 48 8b 07 8b 57 14 48 c7 c7 d5 c6 31 a0 48 8b 70 20 31 c0 e8 65 db 37 e1 e9 f4 fe ff ff 0f 0b <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55
RIP [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
RSP <ffff88003f9b9de0>
Evidently, we created some lockd sockets and then failed to create
others. make_socks then returned an error and we tried to tear down the
svc, but svc->sv_permsocks was not empty so we ended up tripping over
the BUG() in svc_destroy().
Fix this by ensuring that we tear down any live sockets we created when
socket creation is going to return an error.
Fixes: 786185b5f8abefa (SUNRPC: move per-net operations from...)
Reported-by: Raphos <raphoszap@laposte.net>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
When stopping nfsd, I got BUG messages, and soft lockup messages,
The problem is cuased by double rb_erase() in nfs4_state_destroy_net()
and destroy_client().
This patch just let nfsd traversing unconfirmed client through
hash-table instead of rbtree.
[ 2325.021995] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 2325.022809] IP: [<ffffffff8133c18c>] rb_erase+0x14c/0x390
[ 2325.022982] PGD 7a91b067 PUD 7a33d067 PMD 0
[ 2325.022982] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 2325.022982] Modules linked in: nfsd(OF) cfg80211 rfkill bridge stp
llc snd_intel8x0 snd_ac97_codec ac97_bus auth_rpcgss nfs_acl serio_raw
e1000 i2c_piix4 ppdev snd_pcm snd_timer lockd pcspkr joydev parport_pc
snd parport i2c_core soundcore microcode sunrpc ata_generic pata_acpi
[last unloaded: nfsd]
[ 2325.022982] CPU: 1 PID: 2123 Comm: nfsd Tainted: GF O
3.14.0-rc8+ #2
[ 2325.022982] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 2325.022982] task: ffff88007b384800 ti: ffff8800797f6000 task.ti:
ffff8800797f6000
[ 2325.022982] RIP: 0010:[<ffffffff8133c18c>] [<ffffffff8133c18c>]
rb_erase+0x14c/0x390
[ 2325.022982] RSP: 0018:ffff8800797f7d98 EFLAGS: 00010246
[ 2325.022982] RAX: ffff880079c1f010 RBX: ffff880079f4c828 RCX:
0000000000000000
[ 2325.022982] RDX: 0000000000000000 RSI: ffff880079bcb070 RDI:
ffff880079f4c810
[ 2325.022982] RBP: ffff8800797f7d98 R08: 0000000000000000 R09:
ffff88007964fc70
[ 2325.022982] R10: 0000000000000000 R11: 0000000000000400 R12:
ffff880079f4c800
[ 2325.022982] R13: ffff880079bcb000 R14: ffff8800797f7da8 R15:
ffff880079f4c860
[ 2325.022982] FS: 0000000000000000(0000) GS:ffff88007f900000(0000)
knlGS:0000000000000000
[ 2325.022982] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2325.022982] CR2: 0000000000000000 CR3: 000000007a3ef000 CR4:
00000000000006e0
[ 2325.022982] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 2325.022982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 2325.022982] Stack:
[ 2325.022982] ffff8800797f7de0 ffffffffa0191c6e ffff8800797f7da8
ffff8800797f7da8
[ 2325.022982] ffff880079f4c810 ffff880079bcb000 ffffffff81cc26c0
ffff880079c1f010
[ 2325.022982] ffff880079bcb070 ffff8800797f7e28 ffffffffa01977f2
ffff8800797f7df0
[ 2325.022982] Call Trace:
[ 2325.022982] [<ffffffffa0191c6e>] destroy_client+0x32e/0x3b0 [nfsd]
[ 2325.022982] [<ffffffffa01977f2>] nfs4_state_shutdown_net+0x1a2/0x220
[nfsd]
[ 2325.022982] [<ffffffffa01700b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
[ 2325.022982] [<ffffffffa017013e>] nfsd_last_thread+0x4e/0x80 [nfsd]
[ 2325.022982] [<ffffffffa001f1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
[ 2325.022982] [<ffffffffa017064b>] nfsd_destroy+0x5b/0x80 [nfsd]
[ 2325.022982] [<ffffffffa0170773>] nfsd+0x103/0x130 [nfsd]
[ 2325.022982] [<ffffffffa0170670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 2325.022982] [<ffffffff810a8232>] kthread+0xd2/0xf0
[ 2325.022982] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 2325.022982] [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
[ 2325.022982] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 2325.022982] Code: 48 83 e1 fc 48 89 10 0f 84 02 01 00 00 48 3b 41 10
0f 84 08 01 00 00 48 89 51 08 48 89 fa e9 74 ff ff ff 0f 1f 40 00 48 8b
50 10 <f6> 02 01 0f 84 93 00 00 00 48 8b 7a 10 48 85 ff 74 05 f6 07 01
[ 2325.022982] RIP [<ffffffff8133c18c>] rb_erase+0x14c/0x390
[ 2325.022982] RSP <ffff8800797f7d98>
[ 2325.022982] CR2: 0000000000000000
[ 2325.022982] ---[ end trace 28c27ed011655e57 ]---
[ 228.064071] BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:558]
[ 228.064428] Modules linked in: ip6t_rpfilter ip6t_REJECT cfg80211
xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw
ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security
iptable_raw nfsd(OF) auth_rpcgss nfs_acl lockd snd_intel8x0
snd_ac97_codec ac97_bus joydev snd_pcm snd_timer e1000 sunrpc snd ppdev
parport_pc serio_raw pcspkr i2c_piix4 microcode parport soundcore
i2c_core ata_generic pata_acpi
[ 228.064539] CPU: 0 PID: 558 Comm: nfsd Tainted: GF O
3.14.0-rc8+ #2
[ 228.064539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 228.064539] task: ffff880076adec00 ti: ffff880074616000 task.ti:
ffff880074616000
[ 228.064539] RIP: 0010:[<ffffffff8133ba17>] [<ffffffff8133ba17>]
rb_next+0x27/0x50
[ 228.064539] RSP: 0018:ffff880074617de0 EFLAGS: 00000282
[ 228.064539] RAX: ffff880074478010 RBX: ffff88007446f860 RCX:
0000000000000014
[ 228.064539] RDX: ffff880074478010 RSI: 0000000000000000 RDI:
ffff880074478010
[ 228.064539] RBP: ffff880074617de0 R08: 0000000000000000 R09:
0000000000000012
[ 228.064539] R10: 0000000000000001 R11: ffffffffffffffec R12:
ffffea0001d11a00
[ 228.064539] R13: ffff88007f401400 R14: ffff88007446f800 R15:
ffff880074617d50
[ 228.064539] FS: 0000000000000000(0000) GS:ffff88007f800000(0000)
knlGS:0000000000000000
[ 228.064539] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 228.064539] CR2: 00007fe9ac6ec000 CR3: 000000007a5d6000 CR4:
00000000000006f0
[ 228.064539] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 228.064539] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 228.064539] Stack:
[ 228.064539] ffff880074617e28 ffffffffa01ab7db ffff880074617df0
ffff880074617df0
[ 228.064539] ffff880079273000 ffffffff81cc26c0 ffffffff81cc26c0
0000000000000000
[ 228.064539] 0000000000000000 ffff880074617e48 ffffffffa01840b8
ffffffff81cc26c0
[ 228.064539] Call Trace:
[ 228.064539] [<ffffffffa01ab7db>] nfs4_state_shutdown_net+0x18b/0x220
[nfsd]
[ 228.064539] [<ffffffffa01840b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
[ 228.064539] [<ffffffffa018413e>] nfsd_last_thread+0x4e/0x80 [nfsd]
[ 228.064539] [<ffffffffa00aa1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
[ 228.064539] [<ffffffffa018464b>] nfsd_destroy+0x5b/0x80 [nfsd]
[ 228.064539] [<ffffffffa0184773>] nfsd+0x103/0x130 [nfsd]
[ 228.064539] [<ffffffffa0184670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 228.064539] [<ffffffff810a8232>] kthread+0xd2/0xf0
[ 228.064539] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 228.064539] [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
[ 228.064539] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 228.064539] Code: 1f 44 00 00 55 48 8b 17 48 89 e5 48 39 d7 74 3b 48
8b 47 08 48 85 c0 75 0e eb 25 66 0f 1f 84 00 00 00 00 00 48 89 d0 48 8b
50 10 <48> 85 d2 75 f4 5d c3 66 90 48 3b 78 08 75 f6 48 8b 10 48 89 c7
Fixes: ac55fdc408039 (nfsd: move the confirmed and unconfirmed hlists...)
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
As reported by Tang Chen, Gu Zheng and Yasuaki Isimatsu, the following issues
exist in the aio ring page migration support.
As a result, for example, we have the following problem:
thread 1 | thread 2
|
aio_migratepage() |
|-> take ctx->completion_lock |
|-> migrate_page_copy(new, old) |
| *NOW*, ctx->ring_pages[idx] == old |
|
| *NOW*, ctx->ring_pages[idx] == old
| aio_read_events_ring()
| |-> ring = kmap_atomic(ctx->ring_pages[0])
| |-> ring->head = head; *HERE, write to the old ring page*
| |-> kunmap_atomic(ring);
|
|-> ctx->ring_pages[idx] = new |
| *BUT NOW*, the content of |
| ring_pages[idx] is old. |
|-> release ctx->completion_lock |
As above, the new ring page will not be updated.
Fix this issue, as well as prevent races in aio_ring_setup() by holding
the ring_lock mutex during kioctx setup and page migration. This avoids
the overhead of taking another spinlock in aio_read_events_ring() as Tang's
and Gu's original fix did, pushing the overhead into the migration code.
Note that to handle the nesting of ring_lock inside of mmap_sem, the
migratepage operation uses mutex_trylock(). Page migration is not a 100%
critical operation in this case, so the ocassional failure can be
tolerated. This issue was reported by Sasha Levin.
Based on feedback from Linus, avoid the extra taking of ctx->completion_lock.
Instead, make page migration fully serialised by mapping->private_lock, and
have aio_free_ring() simply disconnect the kioctx from the mapping by calling
put_aio_ring_file() before touching ctx->ring_pages[]. This simplifies the
error handling logic in aio_migratepage(), and should improve robustness.
v4: always do mutex_unlock() in cases when kioctx setup fails.
Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: stable@vger.kernel.org
|
|
We have a WARN_ON in the nfsd4_decode_write() that tells us when the
client has sent a request that is not padded out properly according to
RFC4506. A WARN_ON really isn't appropriate in this case though since
this indicates a client bug, not a server one.
Move this check out to the top-level compound decoder and have it just
explicitly return an error. Also add a dprintk() that shows the client
address and xid to help track down clients and frames that trigger it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Looks like this bug has been here since these write counts were
introduced, not sure why it was just noticed now.
Thanks also to Jan Kara for pointing out the problem.
Cc: stable@vger.kernel.org
Reported-by: Matthew Rahtz <mrahtz@rapitasystems.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
If the entire operation fails then there's nothing to encode.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
This fixes an ommission from 18032ca062e621e15683cb61c066ef3dc5414a7b
"NFSD: Server implementation of MAC Labeling", which increased the size
of the setattr error reply without increasing COMPOUND_ERR_SLACK_SPACE.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
If a client attempts to set an excessively large ACL, return
NFS4ERR_FBIG instead of NFS4ERR_RESOURCE. I'm not sure FBIG is correct,
but I'm positive RESOURCE is wrong (it isn't even a well-defined error
any more for NFS versions since 4.1).
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|