Age | Commit message (Collapse) | Author |
|
Usage coutner now increased only is the service was started sccessfully.
Even if service is running already, then goto is not required anymore, because
service creation and start will be skipped.
With this patch code looks clearer.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This is just a code move, which from my POW makes code looks better.
I.e. now on start we have 3 different stages:
1) Service creation.
2) Service per-net data allocation.
3) Service start.
Patch also renames goto label "out_err:" into "err_start:" to reflect new
changes.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
No need to assign transports backchannel server explicitly in
nfs41_callback_up() - there is nfs_callback_bc_serv() function for this.
By using it, nfs4_callback_up() and nfs41_callback_up() can be called without
transport argument.
Note: service have to be passed to nfs_callback_bc_serv() instead of callback,
since callback link can be uninitialized.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
v4:
1) Callback transport creation routine selection by version simlified.
This new function in now called before nfs_minorversion_callback_svc_setup()).
Also few small changes:
1) current network namespace in nfs_callback_up() was replaced by transport net.
2) svc_shutdown_net() was moved prior to callback usage counter decrement
(because in case of per-net data allocation faulure svc_shutdown_net() have to
be skipped).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This function creates service if it's not exist, or increase usage counter of
the existent, and returns pointer to it.
Usage counter will be droppepd by svc_destroy() later in nfs_callback_up().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The OPEN operation has no way to differentiate an open for read and an
open for execution - both look like read to the server. This allowed
users to read files that didn't have READ access but did have EXEC access,
which is obviously wrong.
This patch adds an ACCESS call to the OPEN compound to handle the
difference between OPENs for reading and execution. Since we're going
through the trouble of calling ACCESS, we check all possible access bits
and cache the results hopefully avoiding an ACCESS call in the future.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
If we are creating an osd request and get an invalid layout, return
an EINVAL to the caller. We switch up the return to have an error
code instead of NULL implying -ENOMEM.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
|
|
This will allocate memory that has already been zeroed, allowing us to
remove the memset later on.
Signed-off-by: Bryan Schumaker <bjchuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
I put the client into an open recovery loop by:
Client: Open file
read half
Server: Expire client (echo 0 > /sys/kernel/debug/nfsd/forget_clients)
Client: Drop vm cache (echo 3 > /proc/sys/vm/drop_caches)
finish reading file
This causes a loop because the client never updates the nfs4_state after
discovering that the delegation is invalid. This means it will keep
trying to read using the bad delegation rather than attempting to re-open
the file.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: stable@vger.kernel.org [3.4+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
If we are reading through a delegation, and the delegation is OK then
state->stateid will still point to a delegation stateid and not an open
stateid.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
When a confirmed client expires, we normally also need to expire any
stable storage record which would allow that client to reclaim state on
the next boot. We forgot to do this in some cases. (For example, in
destroy_clientid, and in the cases in exchange_id and create_session
that destroy and existing confirmed client.)
But in most other cases, there's really no harm to calling
nfsd4_client_record_remove(), because it is a no-op in the case the
client doesn't have an existing
The single exception is destroying a client on shutdown, when we want to
keep the stable storage records so we can recognize which clients will
be allowed to reclaim when we come back up.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Both nfsd4_init_conn and alloc_init_session are probing the callback
channel, harmless but pointless.
Also, nfsd4_init_conn should probably be probing in the "unknown" case
as well. In fact I don't see any harm to just doing it unconditionally
when we get a new backchannel connection.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Before we had to delay expiring a client till we'd found out whether the
session and connection allocations would succeed. That's no longer
necessary.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
This will allow some further simplification.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Do the initialization in the caller, and clarify that the only failure
ever possible here was due to allocation.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
It'll be useful to have connection allocation and initialization as
separate functions.
Also, note we'd been ignoring the alloc_conn error return in
bind_conn_to_session.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
This could simplify the logic a little later.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Something like creating a client with setclientid and then trying to
confirm it with create_session may not crash the server, but I'm not
completely positive of that, and in any case it's obviously bad client
behavior.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
And remove some mostly obsolete comments.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
I added cr_flavor to the data compared in same_creds without any
justification, in d5497fc693a446ce9100fcf4117c3f795ddfd0d2 "nfsd4: move
rq_flavor into svc_cred".
Recent client changes then started making
mount -osec=krb5 server:/export /mnt/
echo "hello" >/mnt/TMP
umount /mnt/
mount -osec=krb5i server:/export /mnt/
echo "hello" >/mnt/TMP
to fail due to a clid_inuse on the second open.
Mounting sequentially like this with different flavors probably isn't
that common outside artificial tests. Also, the real bug here may be
that the server isn't just destroying the former clientid in this case
(because it isn't good enough at recognizing when the old state is
gone). But it prompted some discussion and a look back at the spec, and
I think the check was probably wrong. Fix and document.
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Prepare first set of updates for 3.7 merge window.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"There are two main patches in this set, both related to the userland
dlm_controld daemon.
The first fixes a deadlock between dlm_controld and the dlm_send
workqueue when both access configfs data simultaneously.
The second reworks some code to get around a long standing, but
intentional, unlock balance warning. The userland daemon no longer
takes a lock that is later released from the kernel.
The other commits are minor fixes and changes."
* tag 'dlm-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: check the maximum size of a request from user
dlm: cleanup send_to_sock routine
dlm: convert add_sock routine return value type to void
dlm: remove redundant variable assignments
dlm: fix unlock balance warnings
dlm: fix uninitialized spinlock
dlm: fix deadlock between dlm_send and dlm_controld
|
|
Convert cpu_to_le32(le32_to_cpu(E1) + E2) to use le32_add_cpu().
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
When i >= newmap->m_max_mds, ceph_mdsmap_get_addr(newmap, i) return
NULL. Passing NULL to memcmp() triggers oops.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
A recent change to /sbin/mountall causes any trailing '/' character
in the "device" (or fs_spec) field in /etc/fstab to be stripped. As
a result, an entry for a ceph mount that intends to mount the root
of the name space ends up with now path portion, and the ceph mount
option processing code rejects this.
That is, an entry in /etc/fstab like:
cephserver:port:/ /mnt ceph defaults 0 0
provides to the ceph code just "cephserver:port:" as the "device,"
and that gets rejected.
Although this is a bug in /sbin/mountall, we can have the ceph mount
code support an empty/nonexistent path, interpreting it to mean the
root of the name space.
RFC 5952 offers recommendations for how to express IPv6 addresses,
and recommends the usage found in RFC 3986 (which specifies the
format for URI's) for representing both IPv4 and IPv6 addresses that
include port numbers. (See in particular the definition of
"authority" found in the Appendix of RFC 3986.)
According to those standards, no host specification will ever
contain a '/' character. As a result, it is sufficient to scan a
provided "device" from an /etc/fstab entry for the first '/'
character, and if it's found, treat that as the beginning of the
path. If no '/' character is present, we can treat the entire
string as the monitor host specification(s), and assume the path
to be the root of the name space. We'll still require a ':' to
separate the host portion from the (possibly empty) path portion.
This means that we can more formally define how ceph will interpret
the "device" it's provided when processing a mount request:
"device" will look like:
<server_spec>[,<server_spec>...]:[<path>]
where
<server_spec> is <ip>[:<port>]
<path> is optional, but if present must begin with '/'
This addresses http://tracker.newdream.net/issues/2919
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
|
|
Pull TTY changes from Greg Kroah-Hartman:
"As we skipped the merge window for 3.6-rc1 for the tty tree,
everything is now settled down and working properly, so we are ready
for 3.7-rc1. Here's the patchset, it's big, but the large changes are
removing a firmware file and adding a staging tty driver (it depended
on the tty core changes, so it's going through this tree instead of
the staging tree.)
All of these patches have been in the linux-next tree for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
Fix up more-or-less trivial conflicts in
- drivers/char/pcmcia/synclink_cs.c:
tty NULL dereference fix vs tty_port_cts_enabled() helper function
- drivers/staging/{Kconfig,Makefile}:
add-add conflict (dgrp driver added close to other staging drivers)
- drivers/staging/ipack/devices/ipoctal.c:
"split ipoctal_channel from iopctal" vs "TTY: use tty_port_register_device"
* tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (235 commits)
tty/serial: Add kgdb_nmi driver
tty/serial/amba-pl011: Quiesce interrupts in poll_get_char
tty/serial/amba-pl011: Implement poll_init callback
tty/serial/core: Introduce poll_init callback
kdb: Turn KGDB_KDB=n stubs into static inlines
kdb: Implement disable_nmi command
kernel/debug: Mask KGDB NMI upon entry
serial: pl011: handle corruption at high clock speeds
serial: sccnxp: Make 'default' choice in switch last
serial: sccnxp: Remove mask termios caps for SW flow control
serial: sccnxp: Report actual baudrate back to core
serial: samsung: Add poll_get_char & poll_put_char
Powerpc 8xx CPM_UART setting MAXIDL register proportionaly to baud rate
Powerpc 8xx CPM_UART maxidl should not depend on fifo size
Powerpc 8xx CPM_UART too many interrupts
Powerpc 8xx CPM_UART desynchronisation
serial: set correct baud_base for EXSYS EX-41092 Dual 16950
serial: omap: fix the reciever line error case
8250: blacklist Winbond CIR port
8250_pnp: do pnp probe before legacy probe
...
|
|
This reverts commit 0885ef5b5601e9b007c383e77c172769b1f214fd
After applying the above patch, the performance slowed down because the dirty
page flush can only be done by one task, so revert it.
The following is the test result of sysbench:
Before After
24MB/s 39MB/s
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|
|
Everybody is just making stuff up, and it's just used to see if we really do
need to alloc a chunk, and since we do this when we already know we really
do it's just a waste of space. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
So we have lots of places where we try to preallocate chunks in order to
make sure we have enough space as we make our allocations. This has
historically meant that we're constantly tweaking when we should allocate a
new chunk, and historically we have gotten this horribly wrong so we way
over allocate either metadata or data. To try and keep this from happening
we are going to make it so that the block group item insertion is done out
of band at the end of a transaction. This will allow us to create chunks
even if we are trying to make an allocation for the extent tree. With this
patch my enospc tests run faster (didn't expect this) and more efficiently
use the disk space (this is what I wanted). Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
For immutable bio vecs, I've been auditing and removing bi_idx
references. These were harmless, but removing them will make auditing
easier.
scrub_bio_end_io_worker() was open coding a bio_reset() - but this
doesn't appear to have been needed for anything as right after it does a
bio_put(), and perusing the code it doesn't appear anything else was
holding a reference to the bio.
The other use end_bio_extent_readpage() was just for a pr_debug() -
changed it to something that might be a bit more useful.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Stefan Behrens <sbehrens@giantdisaster.de>
|
|
When we wrote some data by compress mode into a btrfs filesystem which was full
of the fragments, the kernel will report:
BTRFS warning (device xxx): Aborting unused transaction.
The reason is:
We can not find a long enough free space to store the compressed data because
of the fragmentary free space, and the compressed data can not be splited,
so the kernel outputed the above message.
In fact, btrfs can deal with this problem very well: it fall back to
uncompressed IO, split the uncompressed data into small ones, and then
store them into to the fragmentary free space. So we shouldn't output the
above warning message.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|
|
Wade Cline reported a problem where he was getting garbage and warnings when
writing to a preallocated range via O_DIRECT. This is because we weren't
creating our normal pinned extent_map for the range we were writing to,
which was causing all sorts of issues. This patch fixes the problem and
makes his testcase much happier. Thanks,
Reported-by: Wade Cline <clinew@linux.vnet.ibm.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
Sage reported the following lockdep backtrace
=====================================
[ BUG: bad unlock balance detected! ]
3.6.0-rc2-ceph-00171-gc7ed62d #1 Not tainted
-------------------------------------
btrfs-cleaner/7607 is trying to release lock (sb_internal) at:
[<ffffffffa00422ae>] btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
but there are no more locks to release!
other info that might help us debug this:
1 lock held by btrfs-cleaner/7607:
#0: (&fs_info->cleaner_mutex){+.+...}, at: [<ffffffffa003b405>] cleaner_kthread+0x95/0x120 [btrfs]
stack backtrace:
Pid: 7607, comm: btrfs-cleaner Not tainted 3.6.0-rc2-ceph-00171-gc7ed62d #1
Call Trace:
[<ffffffffa00422ae>] ? btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
[<ffffffff810afa9e>] print_unlock_inbalance_bug+0xfe/0x110
[<ffffffff810b289e>] lock_release_non_nested+0x1ee/0x310
[<ffffffff81172f9b>] ? kmem_cache_free+0x7b/0x160
[<ffffffffa004106c>] ? put_transaction+0x8c/0x130 [btrfs]
[<ffffffffa00422ae>] ? btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
[<ffffffff810b2a95>] lock_release+0xd5/0x220
[<ffffffff81173071>] ? kmem_cache_free+0x151/0x160
[<ffffffff8117d9ed>] __sb_end_write+0x7d/0x90
[<ffffffffa00422ae>] btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
[<ffffffff81079850>] ? __init_waitqueue_head+0x60/0x60
[<ffffffff81634c6b>] ? _raw_spin_unlock+0x2b/0x40
[<ffffffffa0042758>] __btrfs_end_transaction+0x368/0x3c0 [btrfs]
[<ffffffffa0042808>] btrfs_end_transaction_throttle+0x18/0x20 [btrfs]
[<ffffffffa00318f0>] btrfs_drop_snapshot+0x410/0x600 [btrfs]
[<ffffffff8132babd>] ? do_raw_spin_unlock+0x5d/0xb0
[<ffffffffa00430ef>] btrfs_clean_old_snapshots+0xaf/0x150 [btrfs]
[<ffffffffa003b405>] ? cleaner_kthread+0x95/0x120 [btrfs]
[<ffffffffa003b419>] cleaner_kthread+0xa9/0x120 [btrfs]
[<ffffffffa003b370>] ? btrfs_destroy_delayed_refs.isra.102+0x220/0x220 [btrfs]
[<ffffffff810791ee>] kthread+0xae/0xc0
[<ffffffff810b379d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff8163e744>] kernel_thread_helper+0x4/0x10
[<ffffffff81635430>] ? retint_restore_args+0x13/0x13
[<ffffffff81079140>] ? flush_kthread_work+0x1a0/0x1a0
[<ffffffff8163e740>] ? gs_change+0x13/0x13
This is because the throttle stuff can commit the transaction, which expects to
be the one stopping the intwrite stuff, but we've already done it in the
__btrfs_end_transaction. Moving the sb_end_intewrite after this logic makes the
lockdep go away. Thanks,
Tested-by: Sage Weil <sage@inktank.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
This is the change of the kernel side.
Translation of logical to inode used to have an upper limit 4k on
inode container's size, but the limit is not large enough for a data
with a great many of refs, so when resolving logical address,
we can end up with
"ioctl ret=0, bytes_left=0, bytes_missing=19944, cnt=510, missed=2493"
This changes to regard 64k as the upper limit and use vmalloc instead of
kmalloc to get memory more easily.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
|
|
We already have a helper, iterate_inodes_from_logical(), for logical resolve,
so just use it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
|
|
In logical resolve, we parse extent_from_logical()'s 'ret' as a kind of flag.
It is possible to lose our errors because
(-EXXXX & BTRFS_EXTENT_FLAG_TREE_BLOCK) is true.
I'm not sure if it is on purpose, it just looks too hacky if it is.
I'd rather use a real flag and a 'ret' to catch errors.
Acked-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Liu Bo <liub.liubo@gmail.com>
|
|
As ref cache has been removed from btrfs, there is no user on
its lock and its check.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
|
|
When we delete a inode, we will remove all the delayed items including delayed
inode update, and then truncate all the relative metadata. If there is lots of
metadata, we will end the current transaction, and start a new transaction to
truncate the left metadata. In this way, we will leave a inode item that its
link counter is > 0, and also may leave some directory index items in fs/file tree
after the current transaction ends. In other words, the metadata in this fs/file tree
is inconsistent. If we create a snapshot for this tree now, we will find a inode with
corrupted metadata in the new snapshot, and we won't continue to drop the left metadata,
because its link counter is not 0.
We fix this problem by updating the inode item before the current transaction ends.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|
|
Usecase:
watch 'grep btrfs < /proc/slabinfo'
easy to watch all caches in one go.
Signed-off-by: David Sterba <dsterba@suse.cz>
|
|
I noticed I was seeing large lags when running my torrent test in a vm on my
laptop. While trying to make it lag less I noticed that our overcommit math
was taking into account the number of bytes we wanted to reclaim, not the
number of bytes we actually wanted to allocate, which means we wouldn't
overcommit as often. This patch fixes the overcommit math and makes
shrink_delalloc() use that logic so that it will stop looping faster. We
still have pretty high spikes of latency, but the test now takes 3 minutes
less time (about 5% faster). Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
Mitch reported a problem where you could get an ENOSPC error when untarring
a kernel git tree onto a 16gb file system with compress-force=zlib. This is
because compression is a huge pain, it will return from ->writepages()
without having actually created any ordered extents. To get around this we
check to see if the async submit counter is up, and if it is wait until it
drops to 0 before doing our normal ordered wait dance. With this patch I
can now untar a kernel git tree onto a 16gb file system without getting
ENOSPC errors. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
We're going to use this flag EXTENT_DEFRAG to indicate which range
belongs to defragment so that we can implement snapshow-aware defrag:
We set the EXTENT_DEFRAG flag when dirtying the extents that need
defragmented, so later on writeback thread can differentiate between
normal writeback and writeback started by defragmentation.
Original-Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
|
|
ulist_alloc() has the possibility of returning NULL.
So, it is necessary to check the return value.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
|
|
btrfs_iget() never return NULL.
So, NULL check is unnecessary.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
|
|
When we ran fsstress(a program in xfstests), the filesystem hung up when it
is full. It was because the space reserved in btrfs_fallocate() was wrong,
btrfs_fallocate() just used the size of the pre-allocation to reserve the
space, didn't took the block size aligning into account, so the size of
the reserved space was less than the allocated space, it caused the over
reserve problem and made the filesystem hung up when invoking cow_file_range().
Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|
|
Though we dump the stack information when aborting a unused transaction
handle, we don't know the correct place where we decide to abort the
transaction handle if one function has several place where the transaction
abort function is invoked and jumps to the same place after this call.
And beside that we also don't know the reason why we jump to abort
the current handle. So I modify the transaction abort function and make
it output the function name, line and error information.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|
|
We forget to protect ->log_batch when syncing a file, this patch fix
this problem by atomic operation. And ->log_batch is used to check
if there are parallel sync operations or not, so it is unnecessary to
reset it to 0 after the sync operation of the current log tree complete.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|