Re: File system cache corruption in 2.6?

From: Jens Axboe
Date: Mon Jan 05 2004 - 07:21:16 EST


On Mon, Jan 05 2004, Nathaniel W. Filardo wrote:
> Hi all,
> I'm trying to work out the cause of a series of issues I've seen
> on my 2.6 machine. It appears as though files (specifically libraries) in
> memory can get corrupted, resulting in strangeness like segfaults and
> things like "relocation error: can't find symbol ...-VOMD-POINTER" instead
> of "...-VOID-POINTER".

That's a single bit error.
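
To check that claim, compare the two strings byte by byte. The sketch below is only an illustration: the helper names are made up, and the strings are shortened to the part that differs (VOID vs. VOMD).

```python
def differing_bits(a, b):
    """Return (index, xor_value) for every position where a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [(i, ord(x) ^ ord(y)) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def is_single_bit_flip(a, b):
    """True if a and b differ in exactly one character and exactly one bit."""
    diffs = differing_bits(a, b)
    return len(diffs) == 1 and bin(diffs[0][1]).count("1") == 1


# 'I' is 0x49 and 'M' is 0x4D; they differ only in bit 2 (value 0x04).
print(differing_bits("VOID", "VOMD"))      # [(2, 4)]
print(is_single_bit_flip("VOID", "VOMD"))  # True
```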

> I don't believe it's actual hardware failure for a few reasons: memtest86
> passes all tests, GCC doesn't crash (it's a Gentoo system, so gcc and I
> are well acquainted - and before I get jumped on, I've installed udev ;)
> ), and most importantly, sometimes thrashing the file system or engaging a
> kernel compile will rectify the situation, as just happened with emacs.
> It crashed, I killed it, it wouldn't load - I started a kernel compile,
> waited a bit, and lo', it works again. No messages of relevance appear in
> dmesg.

It looks _extremely_ like bad memory or some other bad hardware. Memtest
sometimes doesn't catch all errors (how long did you run it? It often
needs to run for several days).

--
Jens Axboe
