2.6.20.1: reproducible hard lockup (with some configurations)

From: Corey Hickey
Date: Sat Mar 03 2007 - 22:01:06 EST


Hello,

I am experiencing a hard lockup with 2.6.20.1. Whenever the system locks
up, it locks up hard: nothing is printed to the console and the magic
SysRq key has no effect; the only thing I can do is press the reset
button. I have reasonable faith in the stability of my hardware: I can
run memtest86+ for hours without problems; likewise with burnK7,
mencoder, and various other programs that stress the CPU. I've never had
this problem (or any similar one) with 2.6.19 and earlier.

The problem originally manifested whenever I initiated a RAID-5 resync.
I reported the problem to linux-raid, but Neil Brown wasn't able to
reproduce it and he suggested I was having trouble with a lower-level
driver. I've messed around for many hours with many different kernel
configurations, but all I've been able to find out is that, with some
configurations, the RAID resync doesn't immediately cause a lockup, but
a lockup happens later (sometimes hours later) nonetheless. Since the
late lockup isn't as easily reproducible, I'll concentrate the rest of
this report on conditions that lead to immediate lockup.

When the lockup is triggered by a resync, it is very easy to reproduce:
1. Boot with 'init=/bin/bash'.
2. Run 'mdadm -A /dev/md2 -U resync'.
3. Wait about 1 second. The system will lock up.
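
For anyone reproducing this, here is a minimal sketch of how step 2 could be
instrumented so that something survives the reset. It assumes Python is usable
from the 'init=/bin/bash' shell and that the log file sits on a writable
filesystem that is not part of md2. The function names and the log path are
hypothetical. Each log line is fsync'ed before the next action, so after the
reset the log shows whether mdadm had already returned when the machine froze.

```python
import os
import subprocess
import time


def log_line(log_path, message):
    """Append one timestamped line and force it to disk before returning."""
    with open(log_path, "a") as f:
        f.write("%.3f %s\n" % (time.time(), message))
        f.flush()
        # fsync so the line is on disk even if the machine locks up next.
        os.fsync(f.fileno())


def run_logged(cmd, log_path):
    """Run cmd, logging just before it starts and just after it returns."""
    log_line(log_path, "start: " + " ".join(cmd))
    status = subprocess.call(cmd)
    log_line(log_path, "exit %d" % status)
    return status


# Hypothetical usage for step 2 (the log path is an assumption):
# run_logged(["mdadm", "-A", "/dev/md2", "-U", "resync"],
#            "/root/resync-lockup.log")
```

If only the "start" line is present after the reset, the lockup happened
before mdadm returned; if both lines are present, it happened afterwards,
presumably while the kernel was resyncing in the background.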

System information:
Athlon64 3400+
64-bit Linux 2.6.20.1 compiled with GCC 4.1.2
64-bit Debian Sid
RAID-5 of 5 devices:
  /dev/hda (IDE hard drive)
  /dev/sda6 (partition on SATA hard drive)
  /dev/sdb (SATA hard drive)
  /dev/sdc6 (partition on SATA hard drive)
  /dev/sdd (SATA hard drive)
SATA and IDE drives attached to onboard nVidia controllers
I'm using the libata SATA driver and the old IDE driver

My full kernel .config is here:
http://fatooh.org/files/tmp/config-2.6.20.1
...and the output of 'lspci -v' is here:
http://fatooh.org/files/tmp/lspci-v

If anybody has any suggestions, I would be very grateful. I'd also be
happy to run further tests or provide any other information that may be
useful.

Thank you,
Corey