Re: mdraid6 problem post 3.5.0

From: NeilBrown
Date: Fri Aug 17 2012 - 18:58:59 EST


On Fri, 17 Aug 2012 18:30:11 -0400 John Drescher <drescherjm@xxxxxxxxx> wrote:

> For the last few weeks I have been doing some reliability testing on an
> mdraid6 array. One of my tests was to physically hot-remove a RAID
> member disk. This worked flawlessly with gentoo-sources-3.5.0 for the
> 5 or so times I tried it with my 12 disk + 1 spare mdraid6 array.
> After pulling a disk, the array automatically rebuilds onto the spare a
> few seconds later, and after it finishes, all data checks out via a
> btrfs scrub. However, when I try this with gentoo-sources-3.5.2 or the
> latest kernel.org git sources, the machine does not start the rebuild,
> and any access to /proc/mdstat, or any disk access to that array that
> is not served from cache, leads to a long (possibly infinite) wait.
> Eventually I have to use the reset button, because the sysrq key
> combinations fail to shut down the machine. I do see some kernel debug
> messages on the console (ctrl-alt-f12), but I was unable to save them
> to copy.
>
> Is this a known problem? If not, I may be able to bisect this next week
> to find the patch that causes this behavior.
>

Thanks for the report.

The problem is not known to me. There are no changes to raid6 between 3.5.0
and 3.5.2, so unless gentoo broke something (unlikely), this is very strange.

A digital photo of the debug messages might be useful if you can catch them.
Setting up a network console to capture messages isn't too hard if you have
another machine with a wired network connection.
See Documentation/networking/netconsole.txt
If you can set that up, then alt-sysrq-T might provide useful info.
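If the second machine has Python, a small UDP listener is one way to catch
what netconsole sends. This is only a sketch, assuming netconsole on the
failing box has been pointed at the capture machine's IP and at UDP port
6666 (the default target port); the capture() helper and the
"netconsole.log" filename are hypothetical, and netcat or a syslog daemon
listening on UDP would do the job equally well.

```python
import socket

# Assumed target port: netconsole's default tgt-port is 6666. Change it if
# the netconsole= parameter on the machine under test uses another port.
NETCONSOLE_PORT = 6666


def capture(sock, max_messages=None, logfile=None):
    """Receive kernel log text sent by netconsole over UDP (hypothetical helper).

    Each datagram carries a chunk of kernel log output. Every chunk is
    returned in arrival order and, if logfile is given, appended and flushed
    immediately so that nothing is lost if the machine under test hangs.
    """
    received = []
    while max_messages is None or len(received) < max_messages:
        data, _addr = sock.recvfrom(4096)
        text = data.decode("utf-8", errors="replace")
        received.append(text)
        if logfile is not None:
            logfile.write(text)
            logfile.flush()
    return received


if __name__ == "__main__":
    # Listen on all interfaces; ports above 1024 need no root privileges.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("0.0.0.0", NETCONSOLE_PORT))
    with open("netconsole.log", "a") as log:
        capture(s, logfile=log)
```

Flushing after every datagram matters here: output from alt-sysrq-T may be
the last thing the box manages to send before it locks up.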


NeilBrown
