2.2.15 Random Lockups

From: Robert Dinse (nanook@eskimo.com)
Date: Wed May 31 2000 - 11:58:34 EST


     Under 2.2.14 and prior, I had problems with spin_lock dead-lock on Sparc
SS-10 platforms equipped with quad Ross RTK-625 CPUs. Oddly, I did not have
any problem with a 4/670MP with exactly the same CPUs (in fact I took the CPU
modules out of an SS-10 and stuck them in a 4/670MP so they were the SAME
modules). These were the 100 MHz version.

     Under 2.2.15 the spin_lock dead-lock issue seems to have disappeared, but
now I see two new failure modes, and worse, these failure modes appear to
affect single CPU machines as well.

     In one failure mode, the machines hang hard. There is no response to
anything typed at the console; not even L1-A will work. The only way to get
the machines to recover is to power cycle them. This affects both SS-10s with
multiple Ross Hypersparc CPUs and LXs with a single TI CPU.

     In the other failure mode, the machine stops responding: you can't
telnet/rlogin/ssh in, Apache won't respond, etc. But whatever you type at the
console is echoed and you can still switch virtual consoles. However, if you
switch consoles, get a new login prompt, and type a login name, the password
prompt never appears. You can continue to switch consoles and type things and
see them echo, but the machine continues not to respond. Control-Alt-Delete
also does not work, but in this failure mode L1-A does work.

     Neither failure mode produces any console messages: no OOPS, no spin_lock
deadlock errors, NADA, which needless to say doesn't make troubleshooting easy.
