2.4.31 hangs, no information on console or serial port

From: David Golombek
Date: Tue Feb 21 2006 - 10:21:55 EST


I have a box running a modified Debian/woody system and 2.4.31. It is
intermittently hanging such that:

* All logging to /var/log ceases.
* Machine is still pingable.
* Machine can be telneted to on time port, but no time is echoed.
* After attaching a console+keyboard, console would not unblank.
* Nothing responded when attaching a serial console.
* Machine does not respond to Ctrl-Alt-Del
* No DMI messages are logged.
* Hang is persistent until physical reboot.

This has happened 4 times, on 2 separate machines (under roughly
similar conditions). Machines are up variable amounts of time before
crashing, between many weeks and less than 1 day. Nothing unusual is
logged in /var/log/{deamon.log,kern.log,messages,syslog} prior the
hang, except that /var/log/messages includes the "TCP: Treason
uncloaked!" warnings that are fixed in 2.4.32. No users were logged
on at the time of 3 of the 4 crashes, and no local user activity was
present at the time of the 4th.

The machines are Intel P4's with 2GB of memory

The machine is under relatively high load and has a custom userspace
nfs server running on it (which is potentially to blame, but we've
been unable to determine how). The custom userspace nfs server and
tomcat4 are the primary applications running.

Any suggestions as to how we might debug this or possible causes would
be greatly appreciated.

Thanks,
Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/