I thought I had a hardware problem with my news server; well, I know I
DID have a hardware problem because I was getting "ethernet memory error"'s.
But in addition, strange SCSI errors, kernel "scsi command queable" kernel
panics, etc.
I got a new machine altogether; this one says it's a SS-10BSX when you
boot it, the old one said, SS-10SX, not sure what the difference between those
is.
Anyway, with new machine assuming I can get it booted, running 2.2.13,
after an hour or so it dies with some sort of disk problem. Usually was device
0 not responding (on internal controller).
When I rebooted trying to fsck, sometimes device 2, sometimes device 3
would fail. It would first force it async then it would die altogether.
The strange thing is that if I dd the drives to /dev/null, ZERO errors,
it will sit and do that all day fine. If I boot it with a single CPU kernel
I don't get these errors, only with an SMP kernel.
A few days ago I posted a spin_lock_irq hang on another SS-10 here; there
is a pattern with it is well. It is our web server and it serves files that
are on another machine that it gets via NFS. When I try to write backups, it
hangs with the spin_lock_irq thing before I can ever finish a set of backups.
I have to boot a single CPU kernel on this to do backups.
There is definitely something messed with the Sparc SCSI disk I/O under
2.2.13 on SMP machines.
I've got SunOS 4.1.4 running on same hardware, with 50-100 users online at
any given time and it's been up 47 days without a crash. Linux beats SunOS
hands down in terms of effeciency and scalability, but when it comes to
stability on SMP it loses.
Maybe M$ will port Windows NT to Sparc..
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/