md/raid5: deadlock between retry_aligned_read and barrier io

A chunk-aligned read increments the active_aligned_reads counter and decrements it once the sub-device has handled the read successfully. When a read error occurs, however, the read is redispatched by raid5d, and active_aligned_reads is not decremented until a stripe head can be grabbed in retry_aligned_read.

Now suppose a barrier io arrives, sets conf->quiesce to 2, and waits until both active_stripes and active_aligned_reads are zero. The retried chunk-aligned read gets stuck in get_active_stripe, waiting for conf->quiesce to become 0. retry_aligned_read and the barrier io are now waiting for each other.

One possible solution is to ignore conf->quiesce and let the retried aligned read finish.

I reproduced this deadlock and tested this patch on CentOS 6.0.

Signed-off-by: NeilBrown <neilb@suse.de>
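The circular wait described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct model_conf and the three helper functions are hypothetical names invented for illustration, and the model assumes that a stripe request may proceed when quiesce is 0 or when the caller asks to ignore it, which is the behaviour the message attributes to the last argument of get_active_stripe. It also ignores that a real retry takes and releases stripes along the way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the r5conf fields named in the commit message. */
struct model_conf {
	int quiesce;              /* 0 = normal, 2 = quiesce requested */
	int active_stripes;       /* stripes currently in use */
	int active_aligned_reads; /* chunk-aligned reads still in flight */
};

/* The barrier side waits until both counters have dropped to zero. */
static bool quiesce_can_finish(const struct model_conf *c)
{
	return c->active_stripes == 0 && c->active_aligned_reads == 0;
}

/* Assumed gate: a stripe request proceeds if no quiesce is pending,
 * or if the caller passes noquiesce to ignore it. */
static bool stripe_request_may_proceed(const struct model_conf *c, bool noquiesce)
{
	return c->quiesce == 0 || noquiesce;
}

/* Returns true if the retried read and the barrier end up waiting on
 * each other, false if the retried read drains and the barrier finishes. */
static bool model_deadlocks(bool noquiesce)
{
	/* One aligned read failed and is queued for retry by raid5d. */
	struct model_conf c = {
		.quiesce = 0,
		.active_stripes = 0,
		.active_aligned_reads = 1,
	};

	/* A barrier io arrives and requests quiesce. */
	c.quiesce = 2;

	/* The retry asks for a stripe; only on success is the counter dropped. */
	if (stripe_request_may_proceed(&c, noquiesce)) {
		assert(c.active_aligned_reads > 0);
		c.active_aligned_reads--;
	}

	return !quiesce_can_finish(&c);
}
```

Calling model_deadlocks(false) corresponds to the old call with a final argument of 0, and model_deadlocks(true) to the patched call with a final argument of 1.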
author: hui jiao <simonjiaoh@gmail.com> 2014-06-05 11:34:24 +0800
committer: NeilBrown <neilb@suse.de> 2014-06-05 17:18:19 +1000
commit: 2844dc32ea67044b345221067207ce67ffe8da76 (patch)
tree: 4f235e9eacd5a86110b70c39fa21673bb2916946 /drivers/md
parent: d592a9969141e67a3874c808999a4db4bf82ed83 (diff)
1 files changed, 1 insertions, 1 deletions
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index d69fd9888c2..ce421e3a398 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5115,7 +5115,7 @@ static int  retry_aligned_read(struct r5conf *conf, struct bio *raid_bio)
 			/* already done this stripe */
 			continue;
 
-		sh = get_active_stripe(conf, sector, 0, 1, 0);
+		sh = get_active_stripe(conf, sector, 0, 1, 1);
 
 		if (!sh) {
 			/* failed to get a stripe - must wait */