Hi,
Lately I've been updating our SMP machine, and alongside that I built a second
SMP machine. The first one, apart from a "stuck on TLB" glitch two months ago,
has never crashed.
Recently, some changes were made. Both machines now run Helix-gnome 1.2
with updates and a distributed net client, along with, of course, the Red Hat
6.2 updates.
Both machines have become highly unstable when running X. But that
could just be a manifestation of the extra load the machines are under when
X runs. I find it hard to believe that the Helix binaries are the cause here.
None of the crashes so far left any log entries. The machine would suddenly
become extremely slow and, within 3-5 seconds, the mouse would freeze
along with the entire machine. Today I managed to get a log entry, though
ksymoops can't seem to read it (and I can't read/match the symbols for some
odd reason).
Aug 31 16:28:49 dupla kernel:
Aug 31 16:28:49 dupla kernel: wait_on_bh, CPU 0:
Aug 31 16:28:49 dupla kernel: irq: 1 [0 1]
Aug 31 16:28:49 dupla kernel: bh: 1 [0 1]
Aug 31 16:29:20 dupla kernel: <[c010be9d]> <[c0169cc2]> <[c0169d3d]> <[c017990d]> <[c0151d6f]> <[c013496b]> <[c0134ac7]> stuck on TLB IPI wait (CPU#0)
Aug 31 16:29:20 dupla kernel: stuck on TLB IPI wait (CPU#0)
Aug 31 16:29:20 dupla kernel: stuck on TLB IPI wait (CPU#0)
After three of these, a fourth one happened on CPU#1, then it continued on
CPU#0 again. This time I had managed to switch back to console mode just
before the system froze completely, and I was able to use SysRq-r to remount ro
and SysRq-b to reboot the machine.
Ksymoops said:
Warning (Oops_read): Code line not seen, dumping what data is available
Trace; c010be9d <synchronize_bh+3d/50>
Trace; c0169cc2 <tcp_listen_poll+12/50>
Trace; c0169d3d <tcp_poll+3d/100>
Trace; c017990d <inet_poll+21/2c>
Trace; c0151d6f <sock_poll+1f/24>
Trace; c013496b <do_poll+7b/dc>
Trace; c0134ac7 <sys_poll+fb/17c>
819 warnings and 1 error issued. Results may not be reliable.
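
For anyone who wants to double-check the trace by hand, here is a minimal
sketch of the lookup ksymoops performs: take each bracketed address from the
log line and find the nearest symbol that starts at or below it. The symbol
start addresses are not from a real System.map; they are derived from the
trace above (address minus the reported offset), and the function names
resolve, extract_addresses and format_trace are made up for this example.

```python
import bisect
import re

# Symbol start addresses, derived from the ksymoops trace above
# (address minus reported offset). A real run would load System.map instead.
SYMBOLS = sorted([
    (0xc010be60, "synchronize_bh"),
    (0xc01348f0, "do_poll"),
    (0xc01349cc, "sys_poll"),
    (0xc0151d50, "sock_poll"),
    (0xc0169cb0, "tcp_listen_poll"),
    (0xc0169d00, "tcp_poll"),
    (0xc01798ec, "inet_poll"),
])
STARTS = [start for start, _ in SYMBOLS]

# The log line from the report, with the syslog prefix stripped.
LOG_LINE = ("<[c010be9d]> <[c0169cc2]> <[c0169d3d]> <[c017990d]> "
            "<[c0151d6f]> <[c013496b]> <[c0134ac7]> "
            "stuck on TLB IPI wait (CPU#0)")


def extract_addresses(line):
    """Return the <[xxxxxxxx]> addresses found in a kernel log line."""
    return [int(h, 16) for h in re.findall(r"<\[([0-9a-fA-F]{8})\]>", line)]


def resolve(addr):
    """Map an address to (symbol, offset) using the nearest lower symbol.

    With this partial table, an address past the last known symbol would be
    attributed to that symbol; a full System.map avoids that.
    """
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None
    start, name = SYMBOLS[i]
    return name, addr - start


def format_trace(line):
    """Format every address in the line as a 'Trace;' entry."""
    out = []
    for addr in extract_addresses(line):
        hit = resolve(addr)
        label = "<unknown>" if hit is None else "<%s+%x>" % hit
        out.append("Trace; %08x %s" % (addr, label))
    return out


if __name__ == "__main__":
    print("\n".join(format_trace(LOG_LINE)))
```

The offsets it prints agree with the ksymoops output above, so the symbol
matching itself looks sane despite the warnings.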
The network card is an HP 100VG AnyLAN (driver hp100.o).
If needed, I can provide access (including root) on the spare dual CPU
machine.
This machine is an Asus P2L97-DS with two P-II Deschutes CPUs at 333 MHz.
CPU#0 is stepping 0, CPU#1 is stepping 2.
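
In case someone wants to compare steppings on their own SMP box, here is a
rough sketch that reads the "processor" and "stepping" fields from
/proc/cpuinfo-style text. The SAMPLE text is made up to match the steppings
reported above, not copied from this machine, and the function names are
only for illustration.

```python
# Illustrative /proc/cpuinfo excerpt; values chosen to match the steppings
# reported above, not captured from the actual machine.
SAMPLE = (
    "processor\t: 0\n"
    "stepping\t: 0\n"
    "\n"
    "processor\t: 1\n"
    "stepping\t: 2\n"
)


def steppings(cpuinfo_text):
    """Return {cpu_number: stepping} parsed from /proc/cpuinfo-style text."""
    result = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "stepping" and cpu is not None:
            result[cpu] = int(value)
    return result


def has_mixed_steppings(cpuinfo_text):
    """True when the CPUs do not all report the same stepping."""
    return len(set(steppings(cpuinfo_text).values())) > 1


if __name__ == "__main__":
    # On a live system: text = open("/proc/cpuinfo").read()
    print(steppings(SAMPLE), has_mixed_steppings(SAMPLE))
```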
As I said, we have two dual-CPU systems. The other one shows the same symptoms,
but is an Asus P2B-DS with two identical P-III Katmai CPUs at 450 MHz, stepping
7. I've never managed to get a log entry on that one, and since it's a
production machine, I'm no longer running X on it [1].
Paul Wouters
Xtended Internet
[1] I felt really awful running X on the NIS master to begin with :)
--
Broerdijk 27            Postbus 170             Tel: 31-24-360 39 19
6523 GM Nijmegen        6500 AD Nijmegen        Fax: 31-24-360 19 99
The Netherlands         The Netherlands         info@xtdnet.nl
This archive was generated by hypermail 2b29 : Thu Aug 31 2000 - 21:00:31 EST