|
This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph. This
is mostly a matter of moving files around. However, a few key pieces
of the interface change as well:
- ceph_client becomes ceph_fs_client and ceph_client, where the latter
captures the mon and osd clients, and the fs_client gets the mds client
and file system specific pieces.
- Mount option parsing and debugfs setup are correspondingly broken into
two pieces.
- The mon client gets a generic handler callback for otherwise unknown
messages (mds map, in this case).
- The basic supported/required feature bits can be expanded (and are by
ceph_fs_client).
No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.
Signed-off-by: Sage Weil <sage@newdream.net>
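A minimal userspace sketch of the split described above, assuming simplified structures. The feature bits, message type numbers and the helpers `create_client`, `create_fs_client`, `mon_dispatch` and `fs_mon_dispatch` are hypothetical and are not the kernel's definitions. The generic client owns only the mon and osd clients. The fs client wraps it, adds the mds client, expands the supported feature bits, and registers a hook for monitor messages the library does not understand (an mds map here).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative values only; NOT the kernel's feature bits or message types. */
#define LIB_FEATURES 0x1u   /* bits the generic library supports itself */
#define FS_FEATURES  0x2u   /* extra bits the file system layer adds */
#define MSG_MONMAP   1      /* handled by the generic monitor client */
#define MSG_MDSMAP   2      /* unknown to the library, known to the fs */

struct mon_client { int placeholder; };
struct osd_client { int placeholder; };
struct mds_client { int mdsmaps_seen; };

/* Generic protocol/storage client: monitor and OSD clients only. */
struct ceph_client {
	struct mon_client monc;
	struct osd_client osdc;
	uint32_t supported_features, required_features;
	void *private;                                  /* owner of this client */
	int (*extra_mon_dispatch)(struct ceph_client *c, int type);
};

/* File system client: wraps the generic client and adds the MDS client. */
struct ceph_fs_client {
	struct ceph_client *client;
	struct mds_client mdsc;
};

/* Monitor dispatch: unknown message types are handed to the owner's hook. */
static int mon_dispatch(struct ceph_client *c, int type)
{
	if (type == MSG_MONMAP)
		return 0;
	if (c->extra_mon_dispatch)
		return c->extra_mon_dispatch(c, type);
	return -1;
}

/* The fs layer's hook: it understands mds maps and nothing else. */
static int fs_mon_dispatch(struct ceph_client *c, int type)
{
	struct ceph_fs_client *fsc = c->private;

	if (type != MSG_MDSMAP)
		return -1;
	fsc->mdsc.mdsmaps_seen++;
	return 0;
}

static struct ceph_client *create_client(void *private, uint32_t supported,
					 uint32_t required)
{
	struct ceph_client *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->private = private;
	/* Callers can expand the library's feature set, never shrink it. */
	c->supported_features = LIB_FEATURES | supported;
	c->required_features = required;
	return c;
}

static struct ceph_fs_client *create_fs_client(void)
{
	struct ceph_fs_client *fsc = calloc(1, sizeof(*fsc));

	if (!fsc)
		return NULL;
	fsc->client = create_client(fsc, FS_FEATURES, 0);
	if (!fsc->client) {
		free(fsc);
		return NULL;
	}
	fsc->client->extra_mon_dispatch = fs_mon_dispatch;
	return fsc;
}
```

The back-pointer in `private` is what lets the generic monitor code stay unaware of the file system while still routing fs-specific messages to it.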
|
|
This will be used for rbd snapshot administration.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
|
|
Allow the messenger to send/receive data in a bio. This is added
so that we don't need to copy the data into pages or some other buffer
when doing IO for an rbd block device.
We can now have trailing variable-sized data for osd
ops. Also, osd op encoding is more modular.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
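The "trailing variable-sized data" idea can be sketched as follows, under the assumption of a made-up layout: one fixed-size header per op (a 2-byte opcode and a 4-byte little-endian payload length), followed by all op payloads back to back. `struct op_desc`, `encode_ops` and this layout are hypothetical and do not match the real `ceph_osd_op` wire format; the bio data path is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_HDR_LEN 6   /* hypothetical: 2-byte opcode + 4-byte payload length */

/* Hypothetical op descriptor; not the on-wire ceph_osd_op layout. */
struct op_desc {
	uint16_t opcode;
	const void *data;    /* optional trailing payload */
	uint32_t data_len;
};

static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Encode all op headers first, then each op's trailing payload in order.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t encode_ops(uint8_t *buf, size_t cap,
			 const struct op_desc *ops, size_t n)
{
	size_t need = n * OP_HDR_LEN;
	uint8_t *p = buf;

	for (size_t i = 0; i < n; i++)
		need += ops[i].data_len;
	if (need > cap)
		return 0;

	for (size_t i = 0; i < n; i++) {
		put_le16(p, ops[i].opcode);
		put_le32(p + 2, ops[i].data_len);
		p += OP_HDR_LEN;
	}
	for (size_t i = 0; i < n; i++) {
		if (ops[i].data_len)
			memcpy(p, ops[i].data, ops[i].data_len);
		p += ops[i].data_len;
	}
	return (size_t)(p - buf);
}
```

Because each op only contributes a header and an optional payload, new op types can be added without changing the encoder.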
|
|
OSD request creation is decoupled from the
vino parameter, allowing users of the osd client to use
arbitrary object names that are not necessarily
vino-based. Also, calc_raw_layout now takes a snap id.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
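A sketch of what the decoupling buys, assuming a simplified request that carries its object name as a string plus a snap id. `struct osd_req`, `osd_req_init`, `fs_object_name` and the example rbd-style name are hypothetical; the `<ino>.<block>` hex pattern is assumed here to mirror how the file system names its data objects. The generic constructor accepts any name, and only the file system helper knows about inode numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define OID_MAX 100   /* illustrative limit */

/* Hypothetical, simplified request: the object is named by a string. */
struct osd_req {
	char oid[OID_MAX];
	uint64_t snapid;
	uint64_t off, len;
};

/* Generic constructor: any caller (fs, rbd, ...) picks the object name. */
static int osd_req_init(struct osd_req *req, const char *oid,
			uint64_t snapid, uint64_t off, uint64_t len)
{
	size_t n = strlen(oid);

	if (n >= sizeof(req->oid))
		return -1;
	memcpy(req->oid, oid, n + 1);
	req->snapid = snapid;
	req->off = off;
	req->len = len;
	return 0;
}

/* File system helper: derive a name from the inode number and block index. */
static int fs_object_name(char *out, size_t cap, uint64_t ino, uint64_t bno)
{
	int n = snprintf(out, cap, "%llx.%08llx",
			 (unsigned long long)ino, (unsigned long long)bno);

	return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```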
|
|
OSD requests need to be resubmitted on any pg mapping change, not just when
the pg primary changes. Resending only when the primary changes results in
occasional 'hung' requests during osd cluster recovery or rebalancing.
Signed-off-by: Sage Weil <sage@newdream.net>
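The difference between the two resend rules can be sketched by comparing a pg's ordered OSD list before and after a map update. `struct pg_mapping`, `primary_changed` and `mapping_changed` are hypothetical names for illustration. A replica swap leaves the primary unchanged, so the old rule would not resend even though the mapping moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ACTING 8

/* Hypothetical mapping of a placement group to its ordered OSD list. */
struct pg_mapping {
	int num;
	int osds[MAX_ACTING];   /* osds[0] is the primary */
};

/* Old rule: resend only if the primary moved. */
static bool primary_changed(const struct pg_mapping *a,
			    const struct pg_mapping *b)
{
	int pa = a->num ? a->osds[0] : -1;
	int pb = b->num ? b->osds[0] : -1;

	return pa != pb;
}

/* New rule: resend if anything about the mapping changed. */
static bool mapping_changed(const struct pg_mapping *a,
			    const struct pg_mapping *b)
{
	if (a->num != b->num)
		return true;
	return memcmp(a->osds, b->osds,
		      (size_t)a->num * sizeof(a->osds[0])) != 0;
}
```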
|
|
The ->writepages writeback_control is no longer valid in the writepages
completion. We were touching it solely to adjust pages_skipped when there
was a writeback error (EIO, ENOSPC, EPERM due to bad osd credentials),
causing an oops in the writeback code shortly thereafter. Updating
pages_skipped on error isn't correct anyway, so let's just rip out this
(clearly broken) code to pass the wbc to the completion.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Make the variable name slightly more generic, since it will (soon)
reflect either the time the request was sent OR the time it was
last determined to be still retrying.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
This simplifies the process of timing out messages. We
keep an lru of the messages that are currently in flight. If a
timeout has passed, we reset the osd connection, so that
messages will be retransmitted. This is a failsafe in case
we hit some sort of problem sending a message to the OSD.
Normally, we'll get notification via an updated osdmap if
there are problems.
If a request is older than the keepalive timeout, send a
keepalive to ensure we detect any breaks in the TCP connection.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
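A sketch of the periodic scan described above, assuming an in-flight list ordered oldest first and a keepalive interval shorter than the timeout. `struct req`, `scan_requests`, and the `reset_osd`/`send_keepalive` stubs are hypothetical; here they only count calls so the behavior can be checked. Because the list is sorted by age, each pass stops at the first request that is too young.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-flight request, on a list ordered oldest first. */
struct req {
	int osd;            /* OSD the request was last sent to */
	long stamp;         /* time sent, or last known to be retrying */
	struct req *next;
};

/* Hypothetical hooks that only count calls in this sketch. */
static int resets[4], keepalives[4];
static void reset_osd(int osd) { resets[osd]++; }
static void send_keepalive(int osd) { keepalives[osd]++; }

/* Assumes keepalive < timeout. */
static void scan_requests(struct req *lru, long now, long timeout,
			  long keepalive)
{
	struct req *r = lru;

	/* Timed out: reset the connection so the message is retransmitted. */
	for (; r && now - r->stamp >= timeout; r = r->next)
		reset_osd(r->osd);
	/* Merely old: probe the TCP connection with a keepalive. */
	for (; r && now - r->stamp >= keepalive; r = r->next)
		send_keepalive(r->osd);
}
```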
|
|
Use a single ceph_msg for the osd reply, even when we are getting multiple
replies.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Instead of removing an osd connection as soon as its
request list is empty, put the osd connection on an lru.
Only if that osd has not been used for more than a specified
time will it be removed.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
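A minimal sketch of the idle-session lru, assuming a doubly linked list with the least recently idled session at the head. `struct osd_session`, `session_idle`, `session_busy` and `reap_idle` are hypothetical names. A session that gets a new request before its ttl expires is simply taken off the list, so the reconnect cost is avoided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-OSD session; lru links are valid only while idle. */
struct osd_session {
	int id;
	bool open, idle;
	long idle_since;
	struct osd_session *prev, *next;
};

/* Idle sessions, least recently used at the head. */
static struct osd_session *lru_head, *lru_tail;

static void lru_remove(struct osd_session *s)
{
	if (s->prev) s->prev->next = s->next; else lru_head = s->next;
	if (s->next) s->next->prev = s->prev; else lru_tail = s->prev;
	s->prev = s->next = NULL;
	s->idle = false;
}

/* Last request on this OSD finished: park the session instead of closing. */
static void session_idle(struct osd_session *s, long now)
{
	s->idle = true;
	s->idle_since = now;
	s->prev = lru_tail;
	s->next = NULL;
	if (lru_tail) lru_tail->next = s; else lru_head = s;
	lru_tail = s;
}

/* A new request reuses the session: take it off the idle list. */
static void session_busy(struct osd_session *s)
{
	if (s->idle)
		lru_remove(s);
}

/* Close sessions idle for longer than ttl, oldest first. */
static int reap_idle(long now, long ttl)
{
	int closed = 0;

	while (lru_head && now - lru_head->idle_since > ttl) {
		struct osd_session *s = lru_head;

		lru_remove(s);
		s->open = false;
		closed++;
	}
	return closed;
}
```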
|
|
This includes handling all data preallocation and revocation
in the same place, so the reserved pages no longer need
a special case.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
|
|
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
When we issue an OSD read, we specify a vector of pages that the data is to
be read into. The request may be sent multiple times, to multiple OSDs, if
the osdmap changes, which means we can get more than one reply.
Only read data into the page vector if the reply is coming from the
OSD we last sent the request to. Keep track of which connection is using
the vector by taking a reference. If another connection was already
using the vector and a new reply comes in on the right connection,
revoke the pages from the other connection.
Signed-off-by: Sage Weil <sage@newdream.net>
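The ownership rule above can be sketched as a single decision function, assuming a request that remembers which connection it was last sent on and which connection currently reads into its pages. `struct conn`, `struct read_req` and `claim_pages` are hypothetical, and the reference counting is reduced to a comment; the `revoked` counter stands in for revoking the pages from the old connection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection handle. */
struct conn {
	int id;
	int revoked;   /* times pages were revoked from this connection */
};

/* Hypothetical read request: a page vector plus who may fill it. */
struct read_req {
	struct conn *sent_on;       /* connection the request was last sent on */
	struct conn *pages_owner;   /* connection currently reading into pages */
};

/* Returns 1 if a reply arriving on @c may read into the page vector,
 * 0 if its data must be discarded. */
static int claim_pages(struct read_req *req, struct conn *c)
{
	if (c != req->sent_on)
		return 0;                      /* stale reply from an old target */
	if (req->pages_owner && req->pages_owner != c)
		req->pages_owner->revoked++;   /* revoke from the previous reader */
	req->pages_owner = c;              /* a real client would take a ref */
	return 1;
}
```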
|
|
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
|
|
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
When we open a monitor session, we send an initial AUTH message listing
the auth protocols we support, our entity name, and (possibly) a previously
assigned global_id. The monitor chooses a protocol and responds with an
initial message.
Initially implement AUTH_NONE, a dummy protocol that provides no security,
but works within the new framework. It generates 'authorizers' that are
used when connecting to (mds, osd) services that simply state our entity
name and global_id.
This is a wire protocol change.
Signed-off-by: Sage Weil <sage@newdream.net>
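A sketch of the negotiation, assuming the monitor picks the first protocol in the client's list that it also supports (the selection policy is an assumption, not taken from the text). The protocol ids, `struct auth_request`, `mon_choose_protocol` and `build_none_authorizer` are hypothetical, and the text authorizer only illustrates that AUTH_NONE carries an entity name and global_id with no secret; the real encoding is binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative protocol ids for this sketch. */
enum { AUTH_UNKNOWN = 0, AUTH_NONE = 1 };

/* Hypothetical contents of the client's initial AUTH message. */
struct auth_request {
	int protocols[4];         /* protocols we support, in preference order */
	int nprotocols;
	const char *entity_name;
	uint64_t global_id;       /* 0 if none was assigned previously */
};

/* Monitor side (assumed policy): first client protocol it also supports. */
static int mon_choose_protocol(const struct auth_request *req,
			       const int *mon_supported, int nmon)
{
	for (int i = 0; i < req->nprotocols; i++)
		for (int j = 0; j < nmon; j++)
			if (req->protocols[i] == mon_supported[j])
				return req->protocols[i];
	return AUTH_UNKNOWN;
}

/* AUTH_NONE "authorizer": states who we are and proves nothing. */
static int build_none_authorizer(char *buf, size_t cap,
				 const char *name, uint64_t global_id)
{
	int n = snprintf(buf, cap, "%s/%llu", name,
			 (unsigned long long)global_id);

	return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```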
|
|
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
The OSD client is responsible for reading and writing data from/to the
object storage pool. This includes determining where objects are
stored in the cluster, and ensuring that requests are retried or
redirected in the event of a node failure or data migration.
If an OSD does not respond before a timeout expires, keepalive
messages are sent across the lossless, ordered communications channel
to ensure that any break in the TCP connection is discovered. If the session
does reset, a reconnection is attempted and affected requests are
resent (by the message transport layer).
Signed-off-by: Sage Weil <sage@newdream.net>
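A sketch of how a lossless transport can resend affected requests after a reset, assuming the connection keeps every sent message until the peer acks it and, on reset, puts the unacked messages back at the front of the outgoing queue in their original order. `struct connection`, `handle_ack`, `requeue_on_reset` and the fixed-size queues are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QCAP 16   /* illustrative queue capacity */

struct msg { unsigned long long seq; };

/* Hypothetical lossless connection: messages are kept until acked. */
struct connection {
	struct msg sent[QCAP];   /* sent, awaiting ack, oldest first */
	int nsent;
	struct msg out[QCAP];    /* queued to be written */
	int nout;
};

/* Peer acknowledged everything up to and including @seq. */
static void handle_ack(struct connection *c, unsigned long long seq)
{
	int keep = 0;

	for (int i = 0; i < c->nsent; i++)
		if (c->sent[i].seq > seq)
			c->sent[keep++] = c->sent[i];
	c->nsent = keep;
}

/* Session reset: unacked messages go back in front of the out queue. */
static int requeue_on_reset(struct connection *c)
{
	if (c->nout + c->nsent > QCAP)
		return -1;   /* no room; a real messenger would grow the queue */
	memmove(c->out + c->nsent, c->out, (size_t)c->nout * sizeof(c->out[0]));
	memcpy(c->out, c->sent, (size_t)c->nsent * sizeof(c->sent[0]));
	c->nout += c->nsent;
	c->nsent = 0;
	return 0;
}
```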
|