Re: PROBLEM: md/raid5 bug by commit 415e72d03

From: NeilBrown
Date: Thu Jul 14 2011 - 02:52:46 EST


On Fri, 8 Jul 2011 13:58:39 +0800 Qin Dehua <qindehua@xxxxxxxxx> wrote:

> By bisecting, commit 415e72d03 (md/raid5: Allow recovered part of
> partially recovered devices to be in-sync) was found to be the cause of
> the following problem:
> * The md1_raid5 or md1_raid6 process hangs (99% CPU usage) after
> repeatedly removing disks from, then re-adding them to, a raid5 or raid6
> array (while a dd process writes to the array continuously).
>
> Hardware platform is an IOP 341 XScale processor (not tested on other
> platforms). The problem can be reproduced with this script:
>
> { while true; do dd if=/dev/zero of=/dev/md1 bs=1M count=90000 > /dev/null 2>/dev/null; done; } &
>
> while true;do
> mdadm /dev/md1 -f /dev/sda /dev/sdb
> sleep 1
> mdadm /dev/md1 -r /dev/sda /dev/sdb
> sleep 1
> mdadm /dev/md1 -a /dev/sda /dev/sdb
> sleep 6
> done

I cannot reproduce this on x86_64.

Can you find out more about the hanging process? Maybe run

  cat /proc/PROCESS-ID/stack

a few times and see where it is spending its time?
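
Something like the following could take the samples in one go. It is only a
rough sketch: the function name sample_stack and the PROC_ROOT override are
made up for illustration, and it assumes the kernel exposes /proc/PID/stack
(which needs CONFIG_STACKTRACE, and normally root to read it).

```shell
#!/bin/sh
# Print the kernel stack of a process several times in a row.
# Usage: sample_stack PID [COUNT] [INTERVAL_SECONDS]
# PROC_ROOT is a hypothetical override (defaults to /proc) so the
# function can be tried against a fake directory tree.
sample_stack() {
    pid=$1
    count=${2:-5}
    interval=${3:-1}
    proc_root=${PROC_ROOT:-/proc}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "--- sample $i ---"
        # Stop early if the stack file is missing or unreadable.
        cat "$proc_root/$pid/stack" || return 1
        i=$((i + 1))
        # Pause between samples, but not after the last one.
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, assuming the spinning thread is named md1_raid5 as in the
report, run as root:

  sample_stack "$(pgrep -x md1_raid5)" 5 1

If the same function keeps showing up at the top of each sample, that is
probably where it is looping.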

NeilBrown