Re: Oops with 2.4.23

From: Disconnect
Date: Mon Dec 22 2003 - 13:38:16 EST


On Mon, 2003-12-22 at 12:05, Chris Frey wrote:
> At what point do people start suspecting the kernel?

Immediately. Maybe. My problems seem to have been two-fold ('seem to'
because it could take up to a week to hit..). First, the memory timings
listed by Kingston for this ram were way too aggressive for the ram+mobo
combo. (FWIW, for anyone tracking this from the other thread, the epox
"if your motherboard sucks, use these" timings worked much better, and I
left cas at 2 instead of the epox-suggested 2.5. Oh, and memtest86
reported an -increase- in memory bandwidth with those settings.) And
second, the kernel didn't have erratta fixes and workarounds that MS
operating systems use to get io-apic running and stable, acpi stable,
etc...

> I mean, I would hope the linux kernel is not so badly written as to stress
> the machine 24/7. So after 12 hours of running memtest86 with clean
> results, does that not begin to point to a software error rather than
> hardware?

Depends on the tests. See my earlier results (and, if you want some
light reading, the memtest86 technical docs on the website) as an
example of why its important to run the full tests before fully
'certifying' a problem as software. (Even though mine passed standard
tests, it failed extended. And, for that matter, it passed both when I
first built it, before I put an OS on it.)

Evidently many people can look at an oops or crash path and be 90%
certain its bad/misperforming memory. (How they do this I don't
know..) I've noticed that 'check your ram' is not always given as the
answer, but it seems that most of the time when it is given its also
correct.

--
Disconnect <lkml@xxxxxxxxxxx>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/