summaryrefslogtreecommitdiffstats
path: root/drivers/md
AgeCommit message (Collapse)Author
2012-03-19md/raid10 - support resizing some RAID10 arrays.NeilBrown
'resizing' an array in this context means making use of extra space that has become available in component devices, not adding new devices. It also includes shrinking the array to take up less space of component devices. This is not supported for array with a 'far' layout. However for 'near' and 'offset' layout arrays, adding and removing space at the end of the devices is easy to support, and this patch provides that support. Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-19md/raid1: handle merge_bvec_fn in member devices.NeilBrown
Currently we don't honour merge_bvec_fn in member devices so if there is one, we force all requests to be single-page at most. This is not ideal. So create a raid1 merge_bvec_fn to check that function in children as well. This introduces a small problem. There is no locking around calls the ->merge_bvec_fn and subsequent calls to ->make_request. So a device added between these could end up getting a request which violates its merge_bvec_fn. Currently the best we can do is synchronize_sched(). This will work providing no preemption happens. If there is is preemption, we just have to hope that new devices are largely consistent with old devices. Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-19md/raid10: handle merge_bvec_fn in member devices.NeilBrown
Currently we don't honour merge_bvec_fn in member devices so if there is one, we force all requests to be single-page at most. This is not ideal. So enhance the raid10 merge_bvec_fn to check that function in children as well. This introduces a small problem. There is no locking around calls the ->merge_bvec_fn and subsequent calls to ->make_request. So a device added between these could end up getting a request which violates its merge_bvec_fn. Currently the best we can do is synchronize_sched(). This will work providing no preemption happens. If there is preemption, we just have to hope that new devices are largely consistent with old devices. Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-19md: add proper merge_bvec handling to RAID0 and Linear.NeilBrown
These personalities currently set a max request size of one page when any member device has a merge_bvec_fn because they don't bother to call that function. This causes extra works in splitting and combining requests. So make the extra effort to call the merge_bvec_fn when it exists so that we end up with larger requests out the bottom. Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-19md: tidy up rdev_for_each usage.NeilBrown
md.h has an 'rdev_for_each()' macro for iterating the rdevs in an mddev. However it uses the 'safe' version of list_for_each_entry, and so requires the extra variable, but doesn't include 'safe' in the name, which is useful documentation. Consequently some places use this safe version without needing it, and many use an explicity list_for_each entry. So: - rename rdev_for_each to rdev_for_each_safe - create a new rdev_for_each which uses the plain list_for_each_entry, - use the 'safe' version only where needed, and convert all other list_for_each_entry calls to use rdev_for_each. Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-19md/raid1,raid10: avoid deadlock during resync/recovery.NeilBrown
If RAID1 or RAID10 is used under LVM or some other stacking block device, it is possible to enter a deadlock during resync or recovery. This can happen if the upper level block device creates two requests to the RAID1 or RAID10. The first request gets processed, blocks recovery and queue requests for underlying requests in current->bio_list. A resync request then starts which will wait for those requests and block new IO. But then the second request to the RAID1/10 will be attempted and it cannot progress until the resync request completes, which cannot progress until the underlying device requests complete, which are on a queue behind that second request. So allow that second request to proceed even though there is a resync request about to start. This is suitable for any -stable kernel. Cc: stable@vger.kernel.org Reported-by: Ray Morris <support@bettercgi.com> Tested-by: Ray Morris <support@bettercgi.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-19md/bitmap: ensure to load bitmap when creating via sysfs.NeilBrown
When commit 69e51b449d383e (md/bitmap: separate out loading a bitmap...) created bitmap_load, it missed calling it after bitmap_create when a bitmap is created through the sysfs interface. So if a bitmap is added this way, we don't allocate memory properly and can crash. This is suitable for any -stable release since 2.6.35. Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-19md: don't set md arrays to readonly on shutdown.NeilBrown
It seems that with recent kernel, writeback can still be happening while shutdown is happening, and consequently data can be written after the md reboot notifier switches all arrays to read-only. This causes a BUG. So don't switch them to read-only - just mark them clean and set 'safemode' to '2' which mean that immediately after any write the array will be switch back to 'clean'. This could result in the shutdown happening when array is marked dirty, thus forcing a resync on reboot. However if you reboot without performing a "sync" first, you get to keep both halves. This is suitable for any stable kernel (though there might be some conflicts with obvious fixes in earlier kernels). Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-19md: allow re-add to failed arrays.NeilBrown
When an array is failed (some data inaccessible) then there is no point attempting to add a spare as it could not possibly be recovered. However that may be value in re-adding a recently removed device. e.g. if there is a write-intent-bitmap and it is clear, then access to the data could be restored by this action. So don't reject a re-add to a failed array for RAID10 and RAID5 (the only arrays types that check for a failed array). Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-13md/raid5: use atomic_dec_return() instead of atomic_dec() and atomic_read().majianpeng
Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-13md/raid5: removed unused 'added_devices' variable.NeilBrown
commit 908f4fbd265733 removed the last user of this variable, so we should discard it completely. Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-13md/raid10: remove unnecessary smp_mb() from end_sync_writeNeilBrown
Recent commit 4ca40c2ce099e4f1ce3 (md/raid10: Allow replacement device ...) added an smp_mb in end_sync_write. This was to close a possible race with raid10_remove_disk. However there is no such race as it is never attempted to remove a disk while resync (or recovery) is happening. so the smp_mb is just noise. Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-13md/raid5: make sure reshape_position is cleared on error path.NeilBrown
Leaving a valid reshape_position value in place could be confusing. Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-08Merge tag 'dm-3.3-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull device-mapper fixes for 3.3 from Alasdair Kergon Eight small device-mapper bug fixes. * tag 'dm-3.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: dm raid: fix flush support dm raid: set MD_CHANGE_DEVS when rebuilding dm thin metadata: decrement counter after removing mapped block dm thin metadata: unlock superblock in init_pmd error path dm thin metadata: remove incorrect close_device on creation error paths dm flakey: fix crash on read when corrupt_bio_byte not set dm io: fix discard support dm ioctl: do not leak argv if target message only contains whitespace
2012-03-07dm raid: fix flush supportJonathan E Brassow
Fix dm-raid flush support. Both md and dm have support for flush, but the dm-raid target forgot to set the flag to indicate that flushes should be passed on. (Important for data integrity e.g. with writeback cache enabled.) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-07dm raid: set MD_CHANGE_DEVS when rebuildingJonathan E Brassow
The 'rebuild' parameter is used to rebuild individual devices in an array (e.g. resynchronize a RAID1 device or recalculate a parity device in higher RAID). The MD_CHANGE_DEVS flag must be set when this parameter is given in order to write out the superblocks and make the change take immediate effect. The code that handles new devices in super_load already sets MD_CHANGE_DEVS and 'FirstUse'. (The 'FirstUse' flag was being set as a special case for rebuilds in super_init_validation.) Add a condition for rebuilds in super_load to take care of both flags without the special case in 'super_init_validation'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-07dm thin metadata: decrement counter after removing mapped blockJoe Thornber
Correct the number of mapped sectors shown on a thin device's status line by decrementing td->mapped_blocks in __remove() each time a block is removed. Signed-off-by: Joe Thornber <ejt@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-07dm thin metadata: unlock superblock in init_pmd error pathJoe Thornber
If dm_sm_disk_create() fails the superblock must be unlocked. Signed-off-by: Joe Thornber <ejt@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-07dm thin metadata: remove incorrect close_device on creation error pathsMike Snitzer
The __open_device() error paths in __create_thin() and __create_snap() incorrectly call __close_device() even if td was not initialized by __open_device(). Remove this. Also document __open_device() return values, remove a redundant td->changed = 1 in __create_thin(), and insert an additional safeguard against creating an already-existing device. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-07dm flakey: fix crash on read when corrupt_bio_byte not setMike Snitzer
The following BUG is hit on the first read that is submitted to a dm flakey test device while the device is "down" if the corrupt_bio_byte feature wasn't requested when the device's table was loaded. Example DM table that will hit this BUG: 0 2097152 flakey 8:0 2048 0 30 This bug was introduced by commit a3998799fb4df0b0af8271a7d50c4269032397aa (dm flakey: add corrupt_bio_byte feature) in v3.1-rc1. BUG: unable to handle kernel paging request at ffff8801cfce3fff IP: [<ffffffffa008c233>] corrupt_bio_data+0x6e/0xae [dm_flakey] PGD 1606063 PUD 0 Oops: 0002 [#1] SMP ... Call Trace: <IRQ> [<ffffffffa008c2b5>] flakey_end_io+0x42/0x48 [dm_flakey] [<ffffffffa00dca98>] clone_endio+0x54/0xb6 [dm_mod] [<ffffffff81130587>] bio_endio+0x2d/0x2f [<ffffffff811c819a>] req_bio_endio+0x96/0x9f [<ffffffff811c94b9>] blk_update_request+0x1dc/0x3a9 [<ffffffff812f5ee2>] ? rcu_read_unlock+0x21/0x23 [<ffffffff811c96a6>] blk_update_bidi_request+0x20/0x6e [<ffffffff811c9713>] blk_end_bidi_request+0x1f/0x5d [<ffffffff811c978d>] blk_end_request+0x10/0x12 [<ffffffff8128f450>] scsi_io_completion+0x1e5/0x4b1 [<ffffffff812882a9>] scsi_finish_command+0xec/0xf5 [<ffffffff8128f830>] scsi_softirq_done+0xff/0x108 [<ffffffff811ce284>] blk_done_softirq+0x84/0x98 [<ffffffff81048d19>] __do_softirq+0xe3/0x1d5 [<ffffffff8138f83f>] ? _raw_spin_lock+0x62/0x69 [<ffffffff810997cf>] ? handle_irq_event+0x4c/0x61 [<ffffffff8139833c>] call_softirq+0x1c/0x30 [<ffffffff81003b37>] do_softirq+0x4b/0xa3 [<ffffffff81048a39>] irq_exit+0x53/0xca [<ffffffff81398acd>] do_IRQ+0x9d/0xb4 [<ffffffff81390333>] common_interrupt+0x73/0x73 ... Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.1+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-07dm io: fix discard supportMilan Broz
This patch fixes a crash by recognising discards in dm_io. Currently dm_mirror can send REQ_DISCARD bios if running over a discard-enabled device and without support in dm_io the system crashes badly. BUG: unable to handle kernel paging request at 00800000 IP: __bio_add_page.part.17+0xf5/0x1e0 ... bio_add_page+0x56/0x70 dispatch_io+0x1cf/0x240 [dm_mod] ? km_get_page+0x50/0x50 [dm_mod] ? vm_next_page+0x20/0x20 [dm_mod] ? mirror_flush+0x130/0x130 [dm_mirror] dm_io+0xdc/0x2b0 [dm_mod] ... Introduced in 2.6.38-rc1 by commit 5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4 (dm raid1: support discard). Signed-off-by: Milan Broz <mbroz@redhat.com> Cc: stable@kernel.org Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-07dm ioctl: do not leak argv if target message only contains whitespaceJesper Juhl
If 'argc' is zero we jump to the 'out:' label, but this leaks the (unused) memory that 'dm_split_args()' allocated for 'argv' if the string being split consisted entirely of whitespace. Jump to the 'out_argv:' label instead to free up that memory. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-05Merge tag 'md-3.3-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull md fixes from Neil Brown: "Three fixes for md in 3.3-rc: Two relate to the recently added drive replacement. One fixes the problem where a read error in RAID10 would sometimes be retried indefinitely." * tag 'md-3.3-fixes' of git://neil.brown.name/md: md/raid10: fix assembling of arrays with replacement devices. md/raid10: fix handling of error on last working device in array. md/raid1: fix buglet in md_raid1_contested.
2012-03-06md/raid10: fix assembling of arrays with replacement devices.NeilBrown
commit 56a2559bb654a (md/raid10: recognise replacements ...) changed 'run' to set ->replacement or ->rdev depending on the 'Replacement' status if the device, but it didn't remove the old unconditional setting of 'rdev'. So it was largely ineffective. So remove that now. Signed-off-by: NeilBrown <neilb@suse.de>
2012-02-14md/raid10: fix handling of error on last working device in array.NeilBrown
If we get a read error on the last working device in a RAID10 which contains the target block, then we don't fail the device (which is good) but we don't abort retries, which is wrong. We end up in an infinite loop retrying the read on the one device. This patch fixes the problem in two places: 1/ in raid10_end_read_request we don't even ask for a retry if this was the last usable device. This is efficient but a little racy and will sometimes retry when it should not. 2/ in handle_read_error we are careful to exclude any device from retry which we tried to mark as faulty (that might have failed if it was the last device). This is race-free but less efficient. Signed-off-by: NeilBrown <neilb@suse.de>
2012-02-13md/raid1: fix buglet in md_raid1_contested.NeilBrown
Since we added 'replacement' capability, RAID1 can have twice as many devices as ->raid_disks indicates. So md_raid1_congested needs to check that many possible devices, not just ->raid_disks many. Signed-off-by: NeilBrown <neilb@suse.de>
2012-02-08Merge tag 'md-3.3-fixes' of git://neil.brown.name/mdLinus Torvalds
Some simple md-related fixes. 1/ two small fixes to ensure we handle an interrupted resync properly. 2/ avoid loading the bitmap multiple times in dm-raid * tag 'md-3.3-fixes' of git://neil.brown.name/md: md: two small fixes to handling interrupt resync. Prevent DM RAID from loading bitmap twice.
2012-02-07md: two small fixes to handling interrupt resync.NeilBrown
1/ If a resync is aborted we should record how far we got (recovery_cp) the last request that we know has completed (->curr_resync_completed) rather than the last request that was submitted (->curr_resync). 2/ When a resync aborts we still want to update the metadata with any changes, so set MD_CHANGE_DEVS even if we 'skip'. Signed-off-by: NeilBrown <neilb@suse.de>
2012-01-31Prevent DM RAID from loading bitmap twice.Jonathan Brassow
The life cycle of a device-mapper target is: 1) create 2) resume 3) suspend *) possibly repeat from 2 4) destroy The dm-raid target is unconditionally calling MD's bitmap_load function upon every resume. If steps 2 & 3 above are repeated, bitmap_load is called multiple times. It is only written to be called once; otherwise, it allocates new memory for the bitmap (without freeing the old) and incrementing the number of pages it thinks it has without zeroing first. This ultimately leads to access beyond allocated memory and lost memory. Simply avoiding the bitmap_load call upon resume is not sufficient. If the target was suspended while the initial recovery was only partially complete, it needs to be restarted when the target is resumed. This is why 'md_wakeup_thread' is called before issuing the 'mddev_resume'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-01-15Merge branch 'for-3.3/core' of git://git.kernel.dk/linux-blockLinus Torvalds
* 'for-3.3/core' of git://git.kernel.dk/linux-block: (37 commits) Revert "block: recursive merge requests" block: Stop using macro stubs for the bio data integrity calls blockdev: convert some macros to static inlines fs: remove unneeded plug in mpage_readpages() block: Add BLKROTATIONAL ioctl block: Introduce blk_set_stacking_limits function block: remove WARN_ON_ONCE() in exit_io_context() block: an exiting task should be allowed to create io_context block: ioc_cgroup_changed() needs to be exported block: recursive merge requests block, cfq: fix empty queue crash caused by request merge block, cfq: move icq creation and rq->elv.icq association to block core block, cfq: restructure io_cq creation path for io_context interface cleanup block, cfq: move io_cq exit/release to blk-ioc.c block, cfq: move icq cache management to block core block, cfq: move io_cq lookup to blk-ioc.c block, cfq: move cfqd->icq_list to request_queue and add request->elv.icq block, cfq: reorganize cfq_io_context into generic and cfq specific parts block: remove elevator_queue->ops block: reorder elevator switch sequence ... Fix up conflicts in: - block/blk-cgroup.c Switch from can_attach_task to can_attach - block/cfq-iosched.c conflict with now removed cic index changes (we now use q->id instead)
2012-01-14dm: do not forward ioctls from logical volumes to the underlying devicePaolo Bonzini
A logical volume can map to just part of underlying physical volume. In this case, it must be treated like a partition. Based on a patch from Alasdair G Kergon. Cc: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11Merge tag 'md-3.3-fixes' of git://neil.brown.name/mdLinus Torvalds
Two bugfixes for md. One is a recently introduced regression that affects an unusual configuration with a guaranteed BUG_ON. Has been tagged for -stable. The other is minor missing functionality. * tag 'md-3.3-fixes' of git://neil.brown.name/md: md/raid1: perform bad-block tests for WriteMostly devices too. md: notify the 'degraded' sysfs attribute on failure.
2012-01-11block: Introduce blk_set_stacking_limits functionMartin K. Petersen
Stacking driver queue limits are typically bounded exclusively by the capabilities of the low level devices, not by the stacking driver itself. This patch introduces blk_set_stacking_limits() which has more liberal metrics than the default queue limits function. This allows us to inherit topology parameters from bottom devices without manually tweaking the default limits in each driver prior to calling the stacking function. Since there is now a clear distinction between stacking and low-level devices, blk_set_default_limits() has been modified to carry the more conservative values that we used to manually set in blk_queue_make_request(). Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-01-11md/raid1: perform bad-block tests for WriteMostly devices too.NeilBrown
We normally try to avoid reading from write-mostly devices, but when we do we really have to check for bad blocks and be sure not to try reading them. With the current code, best_good_sectors might not get set and that causes zero-length read requests to be send down which is very confusing. This bug was introduced in commit d2eb35acfdccbe2 and so the patch is suitable for 3.1.x and 3.2.x Reported-and-tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Reported-and-tested-by: Art -kwaak- van Breemen <ard@telegraafnet.nl> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org
2012-01-11md: notify the 'degraded' sysfs attribute on failure.NeilBrown
We currently only 'notify' changes to the 'degraded' attribute when it decreases, not when it increases. Notifying on failure is a little awkward as it happen in interrupt context. So instead, notify when we remove the failed device from the array, which is very soon afterwards. Reported-and-tested-by: Mikhail Balabin <mbalabin@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-01-08Merge tag 'md-3.3' of git://neil.brown.name/mdLinus Torvalds
md update for 3.3 Big change is new hot-replacement. A slot in an array can hold 2 devices - one that wants-replacement and one that is the replacement. Once the replacement is built - either from the original or (in the case of errors) from elsewhere, the wants-replacement device will be removed. * tag 'md-3.3' of git://neil.brown.name/md: (36 commits) md/raid1: Mark device want_replacement when we see a write error. md/raid1: If there is a spare and a want_replacement device, start replacement. md/raid1: recognise replacements when assembling arrays. md/raid1: handle activation of replacement device when recovery completes. md/raid1: Allow a failed replacement device to be removed. md/raid1: Allocate spare to store replacement devices and their bios. md/raid1: Replace use of mddev->raid_disks with conf->raid_disks. md/raid10: If there is a spare and a want_replacement device, start replacement. md/raid10: recognise replacements when assembling array. md/raid10: Allow replacement device to be replace old drive. md/raid10: handle recovery of replacement devices. md/raid10: Handle replacement devices during resync. md/raid10: writes should get directed to replacement as well as original. md/raid10: allow removal of failed replacement devices. md/raid10: preferentially read from replacement device if possible. md/raid10: change read_balance to return an rdev md/raid10: prepare data structures for handling replacement. md/raid5: Mark device want_replacement when we see a write error. md/raid5: If there is a spare and a want_replacement device, start replacement. md/raid5: recognise replacements when assembling array. ...
2012-01-03fs: move code out of buffer.cAl Viro
Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export kill_bdev as well, so brd doesn't have to open code it. Reduce buffer_head.h requirement accordingly. Removed a rather large comment from invalidate_bdev, as it looked a bit obsolete to bother moving. The small comment replacing it says enough. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-12-23md/raid1: Mark device want_replacement when we see a write error.NeilBrown
Now that WantReplacement drives are replaced cleanly, mark a drive as want_replacement when we see a write error. It might get failed soon so the WantReplacement flag is irrelevant, but if the write error is recorded in the bad block log, we still want to activate any spare that might be available. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: If there is a spare and a want_replacement device, start replacement.NeilBrown
When attempting to add a spare to a RAID1 array, also consider adding it as a replacement for a want_replacement device. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: recognise replacements when assembling arrays.NeilBrown
If a Replacement is seen, file it as such. If we see two replacements (or two normal devices) for the one slot, abort. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: handle activation of replacement device when recovery completes.NeilBrown
When recovery completes ->spare_active is called. This checks if the replacement is ready and if so it fails the original. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: Allow a failed replacement device to be removed.NeilBrown
Replacement devices are stored at a different offset, so look there too. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: Allocate spare to store replacement devices and their bios.NeilBrown
In RAID1, a replacement is much like a normal device, so we just double the size of the relevant arrays and look at all possible devices for reads and writes. This means that the array looks like it is now double the size in some way - we need to be careful about that. In particular, we checking if the array is still degraded while creating a recovery request we need to only consider the first 'half' - i.e. the real (non-replacement) devices. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: Replace use of mddev->raid_disks with conf->raid_disks.NeilBrown
In general mddev->raid_disks can change unexpectedly while conf->raid_disks will only change in a very controlled way. So change some uses of one to the other. The use of mddev->raid_disks will not cause actually problems but this way is more consistent and safer in the long term. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid10: If there is a spare and a want_replacement device, start replacement.NeilBrown
When attempting to add a spare to a RAID10 array, also consider adding it as a replacement for a want_replacement device. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid10: recognise replacements when assembling array.NeilBrown
If a Replacement is seen, file it as such. If we see two replacements (or two normal devices) for the one slot, abort. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid10: Allow replacement device to be replace old drive.NeilBrown
When recovery finish and spare_active is called, check for a replace that might have just become fully synced and mark it as such, marking the original as failed. Then when the original is removed, move the replacement into its position. This means that 'replacement' and spontaneously become NULL in some situations. Make sure we check for those. It also means that 'rdev' and 'replacement' could appear to be identical - check for that too. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid10: handle recovery of replacement devices.NeilBrown
If there is a replacement device, then recover to it, reading from any drives - maybe the one being replaced, maybe not. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid10: Handle replacement devices during resync.NeilBrown
If we need to resync an array which has replacement devices, we always write any block checked to every replacement. If the resync was bitmap-based resync we will then complete the replacement normally. If it was a full resync, we mark the replacements as fully recovered when the resync finishes so no further recovery is needed. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid10: writes should get directed to replacement as well as original.NeilBrown
When writing, we need to submit two writes, one to the original, and one to the replacements - if there is a replacement. If the write to the replacement results in a write error we just fail the device. We only try to record write errors to the original. This only handles writing new data. Writing for resync/recovery will come later. Signed-off-by: NeilBrown <neilb@suse.de>