Re: 2.6.3 userspace freeze

From: Andrew Morton
Date: Tue Mar 09 2004 - 17:52:10 EST


"Anders K. Pedersen" <akp@xxxxxxxxxxxx> wrote:
>
> Last night I upgraded two of our webservers from Linux 2.4 to 2.6.3.
> During the night both of them rebooted spontanously (i.e. no indication
> of why in the log files) several times, so this morning I attached a
> serial console to capture the kernel messages, when they rebooted.
>
> What I found was that all of a sudden my SSH connections to the server
> and the local vtys would freeze, and it would stop responding to TCP
> connections, while still responding to ICMP echo requests. Apparently
> all userspace processes just froze. After approximately 60 seconds, it
> logged "SOFTDOG: Initiating system reboot.", and rebooted. This was the
> only kernel message, except for the boot messages. This happened
> repeatedly on both servers.
>
> Both servers run (mainly) Apache 1.3 and Sun Chili ASP (several hundred
> processes each), and the freezes seemed to happen during high load
> peaks.
>
> I have attached the kernel .config (same on both servers) and the kernel
> boot messages including the software watchdog reboot message. Both
> servers are identical IBM xSeries 345 servers. I have other similar
> servers running 2.6.3 for other purposes without any problems (so far).
>
> Any ideas on what's wrong, or how to find out, would be much
> appreciated.

It could be a kernel deadlock, or a memory leak, or a disk device driver
bug.

Would it be possible to run a `vmstat 1' somewhere and capture the last
thirty or so lines prior to the reboot?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/