path: root/drivers/block/rbd.c
Age  Commit message  Author
2012-02-02  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client  Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: fix safety of rbd_put_client()
  rbd: fix a memory leak in rbd_get_client()
  ceph: create a new session lock to avoid lock inversion
  ceph: fix length validation in parse_reply_info()
  ceph: initialize client debugfs outside of monc->mutex
  ceph: change "ceph.layout" xattr to be "ceph.file.layout"
2012-02-02  rbd: fix safety of rbd_put_client()  Alex Elder
The rbd_client structure uses a kref to arrange for cleaning up and freeing an instance when its last reference is dropped. The cleanup routine is rbd_client_release(), and one of the things it does is delete the rbd_client from rbd_client_list. It acquires node_lock to do so, but the way it is done is still not safe.

The problem is that when attempting to reuse an existing rbd_client, the structure found might already be in the process of getting destroyed and cleaned up.

Here's the scenario, with "CLIENT" representing an existing rbd_client that's involved in the race:

    Thread on CPU A                     | Thread on CPU B
    ---------------                     | ---------------
    rbd_put_client(CLIENT)              | rbd_get_client()
      kref_put()                        |   (acquires node_lock)
        kref->refcount becomes 0        |   __rbd_client_find() returns CLIENT
        calls rbd_client_release()      |   kref_get(&CLIENT->kref);
                                        |   (releases node_lock)
          (acquires node_lock)          |
          deletes CLIENT from list      | ...and starts using CLIENT...
          (releases node_lock)          |
          and frees CLIENT              | <-- but CLIENT gets freed here

Fix this by having rbd_put_client() acquire node_lock. The result could still be improved, but at least it avoids this problem.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
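The locking rule described in this commit can be sketched in user space. This is only an illustration under stated assumptions: struct client, client_get(), client_put() and client_count() are hypothetical names, a plain integer counter stands in for the kernel's kref, and a pthread mutex stands in for node_lock. The point it shows is that the final reference drop and the list removal happen under the same lock that lookups take, so a lookup can never hand out an entry whose count has already reached zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rbd_client; not the kernel definition. */
struct client {
    int refcount;          /* protected by node_lock in this sketch */
    int id;
    struct client *next;
};

static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;
static struct client *client_list;

/* Find an existing client by id and take a reference, or create a new one. */
struct client *client_get(int id)
{
    struct client *c;

    pthread_mutex_lock(&node_lock);
    for (c = client_list; c; c = c->next) {
        if (c->id == id) {
            c->refcount++;          /* safe: a put cannot run concurrently */
            break;
        }
    }
    if (!c) {
        c = calloc(1, sizeof(*c));
        if (c) {
            c->id = id;
            c->refcount = 1;
            c->next = client_list;
            client_list = c;
        }
    }
    pthread_mutex_unlock(&node_lock);
    return c;
}

/* Drop a reference; the decrement, unlink and free all happen under the lock. */
void client_put(struct client *c)
{
    struct client **pp;

    pthread_mutex_lock(&node_lock);
    if (--c->refcount == 0) {
        for (pp = &client_list; *pp; pp = &(*pp)->next) {
            if (*pp == c) {
                *pp = c->next;
                break;
            }
        }
        free(c);
    }
    pthread_mutex_unlock(&node_lock);
}

/* Number of clients currently on the list. */
int client_count(void)
{
    struct client *c;
    int n = 0;

    pthread_mutex_lock(&node_lock);
    for (c = client_list; c; c = c->next)
        n++;
    pthread_mutex_unlock(&node_lock);
    return n;
}
```

Two client_get() calls with the same id return the same pointer, and the entry leaves the list only after the second client_put().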
2012-02-02  rbd: fix a memory leak in rbd_get_client()  Alex Elder
If an existing rbd client is found to be suitable for use in rbd_get_client(), the rbd_options structure is not being freed as it should. Fix that. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2012-01-12  rbd: initialize snap_rwsem in rbd_add()  Alex Elder
New rbd device structures get initialized in rbd_add(). Many of the fields rely on being initially zero-filled. However, lockdep was noticing that the rw_semaphore embedded in the header field was not getting properly initialized. Fix that. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-12-07  rbd: remove buggy rollback functionality  Josh Durgin
This doesn't interact with resizing well, since it doesn't set the size of the device to the size at the snapshot. It's also an expensive operation to be synchronous. Rollback can still be done with the userspace rbd tool. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-12-07  rbd: return an error when an invalid header is read  Josh Durgin
This protects against opening future rbd images that have incompatible format changes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-10-28  Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client  Linus Torvalds
* 'for-linus' of git://ceph.newdream.net/git/ceph-client:
  libceph: fix double-free of page vector
  ceph: fix 32-bit ino numbers
  libceph: force resend of osd requests if we skip an osdmap
  ceph: use kernel DNS resolver
  ceph: fix ceph_monc_init memory leak
  ceph: let the set_layout ioctl set single traits
  Revert "ceph: don't truncate dirty pages in invalidate work thread"
  ceph: replace leading spaces with tabs
  libceph: warn on msg allocation failures
  libceph: don't complain on msgpool alloc failures
  libceph: always preallocate mon connection
  libceph: create messenger with client
  ceph: document ioctls
  ceph: implement (optional) max read size
  ceph: rename rsize -> rasize
  ceph: make readpages fully async
2011-10-25  libceph: create messenger with client  Sage Weil
This simplifies the init/shutdown paths, and makes client->msgr available during the rest of the setup process. Signed-off-by: Sage Weil <sage@newdream.net>
2011-09-15  Merge branch 'master' into for-next  Jiri Kosina
Fast-forward merge with Linus to be able to merge patches based on more recent version of the tree.
2011-09-15  treewide: remove extra semicolons from various parts of the kernel  Justin P. Mattock
This is a resend of the original, changing the title from PATCH to RFC (since this is a review for commit, and I should have put that the first go-around), and also removing some of the commits with ia64 and bash since it is significant. Let me know if I might have missed anything, etc. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-07-26  rbd: set blk_queue request sizes to object size  Josh Durgin
This improves performance since more requests can be merged. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-07-26  rbd: cancel watch request when releasing the device  Yehuda Sadeh
We were missing this cleanup: when a device was released, the osd didn't clean up its watchers list, so subsequent notifications could be slow because the osd had to time out on the client. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2011-05-24  rbd: handle online resize of underlying rbd image  Sage Weil
If we get a notification that the image header has changed, check for a change in the image size. Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24  rbd: use snprintf for disk->disk_name  Sage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24  rbd: cleanup: make kfree match kmalloc  Sage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19  rbd: warn on update_snaps failure on notify  Sage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-13  rbd: fix split bio handling  Yehuda Sadeh
The rbd driver currently splits bios when they span an object boundary. However, the blk_end_request expects the completions to roll up the results in block device order, and the split rbd/ceph ops can complete in any order. This patch adds a struct rbd_req_coll to track completion of split requests and ensures that the results are passed back up to the block layer in order. This fixes errors where the file system gets completion of a read operation that spans an object boundary before the data has actually arrived. The bug is easily reproduced with iozone with a working set larger than available RAM. Reported-by: Fyodor Ustinov <ufm@ufm.su> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
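The in-order roll-up described above can be sketched as follows. This is a simplified illustration, not the driver's struct rbd_req_coll: the names req_coll, coll_init() and coll_end_req(), the fixed MAX_SEGS bound, and the return-value convention are all assumptions made for the example. Each segment may complete in any order; a completion is recorded, and only the contiguous run of finished segments starting at the first unreported one is passed upward.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SEGS 8   /* arbitrary bound for this sketch */

/* Per-segment completion record (hypothetical layout). */
struct seg_status {
    bool done;
    int rc;                 /* result code of the segment */
    unsigned long bytes;    /* bytes transferred by the segment */
};

/* Tracks the segments that one split request was divided into. */
struct req_coll {
    int total;      /* number of segments in the request */
    int num_done;   /* segments already reported upward, in order */
    struct seg_status status[MAX_SEGS];
};

void coll_init(struct req_coll *c, int total)
{
    assert(total > 0 && total <= MAX_SEGS);
    memset(c, 0, sizeof(*c));
    c->total = total;
}

/*
 * Record that segment `index` finished, then advance num_done over every
 * contiguous finished segment. Returns how many segments became reportable
 * in this call; a real driver would complete them to the block layer here.
 */
int coll_end_req(struct req_coll *c, int index, int rc, unsigned long bytes)
{
    int reported = 0;

    assert(index >= 0 && index < c->total);
    c->status[index].done = true;
    c->status[index].rc = rc;
    c->status[index].bytes = bytes;

    while (c->num_done < c->total && c->status[c->num_done].done) {
        c->num_done++;
        reported++;
    }
    return reported;
}
```

If segment 2 of 3 finishes first, nothing is reported; once segment 0 finishes, one segment is reported, and when segment 1 finishes the remaining two follow together.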
2011-05-12  rbd: fix leak of ops struct  Sage Weil
The ops vector must be freed by the rbd_do_request caller. Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-03  libceph: fix ceph_osdc_alloc_request error checks  Sage Weil
ceph_osdc_alloc_request returns NULL on failure. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-22  rbd: use watch/notify for changes in rbd header  Yehuda Sadeh
Send notifications when we change the rbd header (e.g. create a snapshot) and wait for such notifications. This allows synchronizing the snapshot creation between different rbd clients/tools. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12  rbd: fix cleanup when trying to mount inexistent image  Yehuda Sadeh
Previously we didn't clean up the sysfs entry that was just created. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-01  rbd: replace the rbd sysfs interface  Yehuda Sadeh
The new interface creates directories per mapped image and under each it creates a subdir per available snapshot. This allows keeping a cleaner interface within the sysfs guidelines. The ABI documentation was updated too. Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20  rbd: passing wrong variable to bvec_kunmap_irq()  Dan Carpenter
We should be passing "buf" here instead of "bv". This is tricky because it's not the same as kmap() and kunmap(). GCC does warn about it if you compile on i386 with CONFIG_HIGHMEM. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20  rbd: null vs ERR_PTR  Dan Carpenter
ceph_alloc_page_vector() returns ERR_PTR(-ENOMEM) on errors. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
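Why a NULL check misses this kind of failure can be shown with a user-space imitation of the kernel's error-pointer helpers. The ERR_PTR(), PTR_ERR() and IS_ERR() functions below are re-implemented for illustration and only mimic the kernel helpers of the same names; alloc_page_vector() is a hypothetical allocator, and treating a zero-page request as a failure is an assumption made purely so the error path can be triggered deterministically.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* User-space imitations of the kernel helpers; for illustration only. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical allocator that reports failure as ERR_PTR(-ENOMEM),
 * never as NULL. A zero-page request fails here by assumption.
 */
void **alloc_page_vector(size_t num_pages)
{
    void **v;

    if (num_pages == 0 || num_pages > SIZE_MAX / sizeof(*v))
        return ERR_PTR(-ENOMEM);

    v = calloc(num_pages, sizeof(*v));
    if (!v)
        return ERR_PTR(-ENOMEM);
    return v;
}
```

A caller that writes `if (!pages)` never sees the failure, because the error pointer is non-NULL; the check has to be `if (IS_ERR(pages))`, with PTR_ERR(pages) giving the error code.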
2010-10-20  block: rbd: removing unnecessary test  Yehuda Sadeh
rbd_get_segment() can't return a negative value, so we don't need to check its return value. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-10-20  block: rbd: fixed may leaks  Vasiliy Kulikov
rbd_client_create() doesn't free rbdc, this leads to many leaks. seg_len in rbd_do_op() is unsigned, so (seg_len < 0) makes no sense. Also if fixed check fails then seg_name is leaked. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
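Both problems named in this commit message, an error code stored in an unsigned variable and a buffer leaked on the error path, can be illustrated with a small sketch. Everything here is hypothetical: do_op(), name_alloc(), name_free(), as_unsigned() and outstanding_names() are names invented for the example, and the helper that might return a negative value is represented by the seg_ret parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

static int live_allocs;   /* counts name buffers not yet freed */

char *name_alloc(void)
{
    char *p = malloc(32);

    if (p)
        live_allocs++;
    return p;
}

void name_free(char *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

int outstanding_names(void)
{
    return live_allocs;
}

/* A negative value converted to unsigned becomes a huge positive number,
 * which is why a "< 0" test on an unsigned variable can never fire. */
uint64_t as_unsigned(int64_t v)
{
    return (uint64_t)v;
}

/*
 * Sketch of an operation that allocates a segment name and then receives a
 * length-or-error value. The value is kept signed so an error stays visible,
 * and every exit after the allocation goes through the cleanup label.
 */
int do_op(int64_t seg_ret)
{
    char *seg_name = name_alloc();
    int64_t seg_len = seg_ret;
    int ret = 0;

    if (!seg_name)
        return -ENOMEM;

    if (seg_len < 0) {
        ret = (int)seg_len;
        goto done;
    }

    /* ... the request would be built and submitted here ... */

done:
    name_free(seg_name);
    return ret;
}
```

Routing the error path through the same label as the success path is a common way to keep a later-added check from leaking an earlier allocation.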
2010-10-20  rbd: introduce rados block device (rbd), based on libceph  Yehuda Sadeh
The rados block device (rbd), based on osdblk, creates a block device that is backed by objects stored in the Ceph distributed object storage cluster. Each device consists of a single metadata object and data striped over many data objects. The rbd driver supports read-only snapshots. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
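How a device offset could map onto fixed-size data objects is sketched below. The sketch assumes the simplest layout, where each object holds 2^obj_order bytes and consecutive ranges of the device go to consecutive objects; struct segment, map_segment() and count_segments() are hypothetical names, and object naming is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* One piece of a request that falls inside a single data object. */
struct segment {
    uint64_t index;    /* which data object */
    uint64_t offset;   /* byte offset inside that object */
    uint64_t length;   /* bytes of the request served by that object */
};

/* Map the start of a request (ofs, len) to the object that contains it,
 * clipping the length at the object boundary. */
struct segment map_segment(uint64_t ofs, uint64_t len, unsigned obj_order)
{
    struct segment s;
    uint64_t obj_size;
    uint64_t room;

    assert(obj_order < 64);
    obj_size = (uint64_t)1 << obj_order;

    s.index = ofs >> obj_order;
    s.offset = ofs & (obj_size - 1);
    room = obj_size - s.offset;
    s.length = len < room ? len : room;
    return s;
}

/* Count how many objects a request touches by walking it segment by segment. */
unsigned count_segments(uint64_t ofs, uint64_t len, unsigned obj_order)
{
    unsigned n = 0;

    while (len > 0) {
        struct segment s = map_segment(ofs, len, obj_order);

        ofs += s.length;
        len -= s.length;
        n++;
    }
    return n;
}
```

With obj_order set to 22 (4 MiB objects), a 4096-byte request starting 512 bytes before the first object boundary is served by two objects: 512 bytes from object 0 and the remaining 3584 bytes from object 1.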