Re: 2.6.20.1: reproducible hard lockup (with some configurations)

From: Corey Hickey
Date: Sat Mar 17 2007 - 04:06:37 EST


Corey Hickey wrote:
> Hello,
>
> I am experiencing a hard lockup with 2.6.20.1. Whenever the system locks
> up, it locks up hard: nothing is printed to the console and the magic
> SysRQ key has no effect--the only thing I can do is poke the reset
> button. I have reasonable faith in the stability of my hardware: I can
> run memtest86+ for hours without problems; likewise with burnK7,
> mencoder, and various other programs that stress the CPU. I've never had
> this problem (or any similar one) with 2.6.19 and earlier.
>
> The problem originally manifested whenever I initiated a RAID-5 resync.
> I reported the problem to linux-raid, but Neil Brown wasn't able to
> reproduce it and he suggested I was having trouble with a lower-level
> driver. I've messed around for many hours with many different kernel
> configurations, but all I've been able to find out is that, with some
> configurations, the RAID resync doesn't immediately cause a lockup, but
> a lockup happens later (sometimes hours later) nonetheless. Since the
> late lockup isn't as easily reproducible, I'll concentrate the rest of
> this report on conditions that lead to immediate lockup.
>
> When the lockup is triggered by a resync, it is very easy to reproduce:
> 1. Boot with 'init=/bin/bash'.
> 2. Run 'mdadm -A /dev/md2 -U resync'.
> 3. Wait about 1 second. The system will lock up.
>
> System information:
> Athlon64 3400+
> 64-bit Linux 2.6.20.1 compiled with GCC 4.1.2
> 64-bit Debian Sid
> RAID-5 of 5 devices:
> /dev/hda (IDE hard drive)
> /dev/sda6 (partition on SATA hard drive)
> /dev/sdb (SATA hard drive)
> /dev/sdc6 (partition on SATA hard drive)
> /dev/sdd (SATA hard drive)
> SATA and IDE drives mounted to onboard nVidia controllers
> I'm using the libata SATA driver and the old IDE driver
>
> My full kernel .config is here:
> http://fatooh.org/files/tmp/config-2.6.20.1
> ...and the output of 'lspci -v' is here:
> http://fatooh.org/files/tmp/lspci-v
>
> If anybody has any suggestions, I would be very grateful. I'd also be
> happy to run further tests or provide any other information that may be
> useful.

Just in case this is helpful for somebody:

After I sent the last message, I kept trying to find a solution;
eventually I tried compiling without CONFIG_CC_OPTIMIZE_FOR_SIZE, and
that seems to have fixed the problem. Uptime is 12 days so far, without
any issues.

-Corey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/