I posted previously about random reboots on nfsroot nodes across kernels
2.2.18 through 2.4.6. All of these kernels exhibit the same problem: after
some amount of time, usually less than 24 hours, the system simply
reboots.
When I run the systems with uniprocessor kernels, the problem does not
occur.
When the SMP kernel is booted with noapic, the APIC errors go away. Other
posts I have read about SMP APIC problems seemed to indicate that those
users received hundreds of messages in a short period of time; I was
getting maybe seven or eight over the course of several hours.
I cannot find any references on the net to others having trouble with
SMP on ASUS CUR-DLS boards or with the ServerWorks chipset.
Is it possible that some interaction between SMP, nfsroot and the
CUR-DLS is causing the problem (all of my other CUR-DLS boards use a
local disk)? I've tried wrapping my head around the NFS code to look for
SMP-specific problems, and while I understand a lot more of it now than I
did before, it is still mostly beyond my immediate comprehension.
Is it possible that this is a power or CPU voltage problem? If so, would a
UPS be a solution?
Is it possible that the whole batch of 10 motherboards is defective
somehow (we have oodles of other ASUS CUR-DLS SMP systems that don't have
problems, just this cluster)?
Does anyone have suggestions for further troubleshooting?
I am working on booting with a TFTP-downloaded ramdisk as the root
filesystem to eliminate nfsroot from the equation, but I am skeptical
that this will actually help.
regards,
-ryan
-- 
Ryan Sweet <ryan.sweet@atosorigin.com>
Atos Origin Engineering Services
http://www.aoes.nl
This archive was generated by hypermail 2b29 : Mon Jul 23 2001 - 21:00:12 EST