Re: 2.6.7 SMP trouble?

From: Richard B. Johnson
Date: Mon Jul 19 2004 - 15:07:23 EST


On Mon, 19 Jul 2004, Jason Gauthier wrote:

> I've found an IBM netfinity (5600) box that was shelved a few years ago. I
> spent $80 and got two processors for it. (P3-667).
>
> I put them in the box, installed Linux (slackware) and upgraded the kernel
> to 2.6.7. I then started installing my software on it. Nagios, MRTG,
> samba, and some other tools we use for network monitoring. This is going to
> be an upgrade to a monitoring server we have. Well, I went home, came in
> the next day and the box was locked hard. No messages, no console output.
> Just dead.
>
> Thinking it was a fluke, I fired it up. Again, after several hours running;
> total death. So, I figured I have two options. Software or hardware is
> making it die. I removed each processor in turn, and ran the box for over
> 24 hours under HIGH stress. (5+ load average). The system is running the
> above mentioned software. But, just to make sure this processor gets a
> workout I am compiling code over and over. Both processors have been rock
> solid for the duration of the test.
>
> I then placed both processors in the box and started the same test. It was
> dead within 8 hours. I am now very suspicious of the kernel.
>
> So, I installed 2.4.22 and ran the same tests. It went over 48 hours with
> no issues. Now I'm certain it's the kernel. Can anyone confirm any SMP
> issues that might cause this?
>
> Thanks,
>
> Jason

Another data point: I haven't been able to run any new (2.6+) kernel
reliably on an SMP machine. They just stop, exactly as you describe.
That's why all my SMP machines still run 2.4.26. It's rock solid and
has the latest-and-greatest updates (a 2.4.27 pre-release is coming
out). For production machines, you probably need to run 2.4.26.

If you don't really need anything reliable, you might try enabling
the magic SysRq key and see if you can find out where it's stopped.
One more data point: when my machines stop, the CPUs get cold, just
as if their clocks had been shut off!
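
Enabling SysRq, as suggested above, comes down to writing a non-zero
value to /proc/sys/kernel/sysrq as root before the machine hangs. Here
is a minimal sketch, assuming the kernel was built with
CONFIG_MAGIC_SYSRQ and that procfs is mounted on /proc. The helper
names enable_sysrq and sysrq_enabled are made up for illustration.

```python
# Standard procfs control file for the magic SysRq key.
SYSRQ_CONTROL = "/proc/sys/kernel/sysrq"


def enable_sysrq(path=SYSRQ_CONTROL):
    """Write 1 to the control file, which turns the SysRq key on.

    Needs root when pointed at the real procfs file. The path is a
    parameter only so the helper can be tried against a scratch file.
    """
    with open(path, "w") as f:
        f.write("1\n")


def sysrq_enabled(path=SYSRQ_CONTROL):
    """Return True if the control file holds a non-zero value."""
    with open(path) as f:
        return int(f.read().strip()) != 0
```

With SysRq on, Alt+SysRq+T should dump the task list and Alt+SysRq+P
the current registers to the console. Whether the keyboard still
responds after one of these hangs is not something I can promise; if
interrupts are off, it probably won't.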


Cheers,
Dick Johnson
Penguin : Linux version 2.4.26 on an i686 machine (5570.56 BogoMips).
Note 96.31% of all statistics are fiction.

