Re: 2.6.23.1: mdadm/raid5 hung/d-state

From: Dan Williams
Date: Mon Nov 05 2007 - 19:20:06 EST


On 11/5/07, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
[..]
> > Are you seeing the same "md thread takes 100% of the CPU" that Joël is
> > reporting?
> >
>
> Yes, in another e-mail I posted the top output with md3_raid5 at 100%.
>

This seems too similar to Joël's situation for them not to be
correlated, and it shows that iSCSI is not a necessary component of
the failure.

The attached patch allows the debug statements in MD to be enabled via
sysfs. Joël, since it is easier for you to reproduce, can you capture
the kernel log output after the raid thread goes into the spin? It
will help if you have CONFIG_PRINTK_TIME=y set in your kernel
configuration.

After the failure, run:

echo 1 > /sys/block/md_d0/md/debug_print_enable; sleep 5; echo 0 > /sys/block/md_d0/md/debug_print_enable

...to enable the print messages for a few seconds. Please send the
output in a private message if it proves too big for the mailing list.
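
If it is easier to script, a rough sketch along these lines would
toggle the switch and save the kernel log to a file in one go. It
assumes the attached patch is applied (debug_print_enable does not
exist otherwise). The function name, the DMESG override, and the
default output file name are only illustrative, not part of the patch.

```shell
#!/bin/sh
# Sketch: enable the MD debug prints for a few seconds, disable them,
# then dump the kernel log to a file.
#   $1  sysfs switch (default is the md_d0 path from the command above;
#       the file only exists with the attached patch applied)
#   $2  seconds to leave the prints enabled (default 5)
#   $3  output file (default md-debug.log, a hypothetical name)
# DMESG may be set to override the command used to read the kernel log.
capture_md_debug() {
    knob=${1:-/sys/block/md_d0/md/debug_print_enable}
    secs=${2:-5}
    out=${3:-md-debug.log}

    echo 1 > "$knob" || return 1    # turn the debug statements on
    sleep "$secs"
    echo 0 > "$knob" || return 1    # and off again
    ${DMESG:-dmesg} > "$out"        # save the kernel log for sending
}
```

Run as root after the raid thread starts spinning, "capture_md_debug"
with no arguments uses the md_d0 path above; pass a different sysfs
path as the first argument for another array.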

Attachment: raid5-debug-print-enable.patch
Description: Binary data