Re: 2.4.29-pre2 Oops at find_inode/reiserfs_find_actor

From: Willy Tarreau
Date: Tue Dec 21 2004 - 20:02:00 EST


Hi,

On Wed, Dec 22, 2004 at 12:56:05AM +0100, Manfred Schwarb wrote:

> No, not at all. This machine was running for 10 months with 2.4.xx
> kernels without any problems. Since this oops, I tried to reproduce
> the particular situation (rsync over nfs, and some additional load),
> put I had no success in crashing the box:
> I mirrored the box (ca. 1 million files) 5 times, no problem.
(...)
> Hardware issue: you mean memory? Last winter I ran memtest86
> during a weekend, everything was fine. At the moment I can't
> take this box offline for a longer period to test again, so I
> tend to belive memory is ok, and knock on wood...

Does this machine have ECC memory ? If so, have you tried to check the
error counters ? And if not, well, it might be some random bit flips
in memory.

Last year, I encountered a really strange situation. During about one week,
I had 3 or 4 machines that have panic'd several times each, and sometimes
only some multi-threaded apps would freeze (eg: pdnsd). These machines
were installed at different customers' in different locations, and at
different times. They had never crashed before, nor did they after. Most
other similar machines at other customer's, or some at the sames did not
experience this. This was during the first week of november, when there
were the very strong solar flares. I never told the customers about my
thoughts on the subject, they would have definitely made fun of me. So
we replaced the RAM sticks, to report something more believable...

Regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/