diff options
author | Lars Ellenberg <lars.ellenberg@linbit.com> | 2010-10-16 12:13:47 +0200 |
---|---|---|
committer | Philipp Reisner <philipp.reisner@linbit.com> | 2010-10-22 15:46:11 +0200 |
commit | 82f59cc6353889b426cf13b6596d5a3d100fa09e (patch) | |
tree | 6d5a678516334f0a37a56a509b84322a0352719b /drivers/block/drbd/drbd_nl.c | |
parent | 3beec1d446fba335f07787636920892dd3b2c658 (diff) |
drbd: fix potential deadlock on detach
If we have contention in drbd_al_begin_iod (heavy randon IO),
an administrative request to detach the disk may deadlock
for similar reasons as the recently fixed deadlock if detaching
because of IO-error.
The approach taken here is to either go through the intermediate
cleanup state D_FAILED, or first lock out application io,
don't just go directly to D_DISKLESS.
We need an additional state bit (WAS_IO_ERROR) to distinguish
the -> D_FAILED because of IO-error from other failures.
Sanitize D_ATTACHING -> D_FAILED to D_ATTACHING -> D_DISKLESS.
If only attaching, ldev may be missing still, but would be referenced
from within the after_state_ch for -> D_FAILED, potentially
dereferencing a NULL pointer.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Diffstat (limited to 'drivers/block/drbd/drbd_nl.c')
-rw-r--r-- | drivers/block/drbd/drbd_nl.c | 16 |
1 files changed, 15 insertions, 1 deletions
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c index c498c4827de..0cba7d3d2b5 100644 --- a/drivers/block/drbd/drbd_nl.c +++ b/drivers/block/drbd/drbd_nl.c @@ -870,6 +870,11 @@ static int drbd_nl_disk_conf(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp retcode = ERR_DISK_CONFIGURED; goto fail; } + /* It may just now have detached because of IO error. Make sure + * drbd_ldev_destroy is done already, we may end up here very fast, + * e.g. if someone calls attach from the on-io-error handler, + * to realize a "hot spare" feature (not that I'd recommend that) */ + wait_event(mdev->misc_wait, !atomic_read(&mdev->local_cnt)); /* allocation not in the IO path, cqueue thread context */ nbc = kzalloc(sizeof(struct drbd_backing_dev), GFP_KERNEL); @@ -1262,7 +1267,7 @@ static int drbd_nl_disk_conf(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp force_diskless_dec: put_ldev(mdev); force_diskless: - drbd_force_state(mdev, NS(disk, D_DISKLESS)); + drbd_force_state(mdev, NS(disk, D_FAILED)); drbd_md_sync(mdev); release_bdev2_fail: if (nbc) @@ -1285,10 +1290,19 @@ static int drbd_nl_disk_conf(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp return 0; } +/* Detaching the disk is a process in multiple stages. First we need to lock + * out application IO, in-flight IO, IO stuck in drbd_al_begin_io. + * Then we transition to D_DISKLESS, and wait for put_ldev() to return all + * internal references as well. + * Only then we have finally detached. */ static int drbd_nl_detach(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp, struct drbd_nl_cfg_reply *reply) { + drbd_suspend_io(mdev); /* so no-one is stuck in drbd_al_begin_io */ reply->ret_code = drbd_request_state(mdev, NS(disk, D_DISKLESS)); + if (mdev->state.disk == D_DISKLESS) + wait_event(mdev->misc_wait, !atomic_read(&mdev->local_cnt)); + drbd_resume_io(mdev); return 0; } |