Skip to content

Commit

Permalink
md/raid1,raid10: always abort recover on write error.
Browse files Browse the repository at this point in the history
Currently we don't abort recovery on a write error if the write error
to the recovering device was triggerd by normal IO (as opposed to
recovery IO).

This means that for one bitmap region, the recovery might write to the
recovering device for a few sectors, then not bother for subsequent
sectors (as it never writes to failed devices).  In this case
the bitmap bit will be cleared, but it really shouldn't.

The result is that if the recovering device fails and is then re-added
(after fixing whatever hardware problem triggerred the failure),
the second recovery won't redo the region it was in the middle of,
so some of the device will not be recovered properly.

If we abort the recovery, the region being processes will be cancelled
(bit not cleared) and the whole region will be retried.

As the bug can result in data corruption the patch is suitable for
-stable.  For kernels prior to 3.11 there is a conflict in raid10.c
which will require care.

Original-from: jiao hui <jiaohui@bwstor.com.cn>
Reported-and-tested-by: jiao hui <jiaohui@bwstor.com.cn>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org
  • Loading branch information
NeilBrown committed Jul 31, 2014
1 parent 64aa90f commit 2446dba
Show file tree
Hide file tree
Showing 2 changed files with 9 additions and 10 deletions.
8 changes: 4 additions & 4 deletions drivers/md/raid1.c
Original file line number Diff line number Diff line change
Expand Up @@ -1501,12 +1501,12 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
mddev->degraded++;
set_bit(Faulty, &rdev->flags);
spin_unlock_irqrestore(&conf->device_lock, flags);
/*
* if recovery is running, make sure it aborts.
*/
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
} else
set_bit(Faulty, &rdev->flags);
/*
* if recovery is running, make sure it aborts.
*/
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
set_bit(MD_CHANGE_DEVS, &mddev->flags);
printk(KERN_ALERT
"md/raid1:%s: Disk failure on %s, disabling device.\n"
Expand Down
11 changes: 5 additions & 6 deletions drivers/md/raid10.c
Original file line number Diff line number Diff line change
Expand Up @@ -1684,13 +1684,12 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
spin_unlock_irqrestore(&conf->device_lock, flags);
return;
}
if (test_and_clear_bit(In_sync, &rdev->flags)) {
if (test_and_clear_bit(In_sync, &rdev->flags))
mddev->degraded++;
/*
* if recovery is running, make sure it aborts.
*/
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
}
/*
* If recovery is running, make sure it aborts.
*/
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
set_bit(Blocked, &rdev->flags);
set_bit(Faulty, &rdev->flags);
set_bit(MD_CHANGE_DEVS, &mddev->flags);
Expand Down

0 comments on commit 2446dba

Please sign in to comment.