Re: Strange disk corruption with Linux >= 2.6.13

From: Nigel Cunningham
Date: Sun Oct 02 2005 - 23:56:48 EST


Hi.

On Sun, 2005-10-02 at 07:36, Rogério Brito wrote:
> On Sep 28 2005, Nigel Cunningham wrote:
> > Hi Rogerio.
>
> Hi, Nigel.
>
> > On Tue, 2005-09-27 at 21:10, Rogério Brito wrote:
> > > Hi there. I'm seeing a really strange problem on my system lately and I
> > > am not really sure that it has anything to do with the kernels.
> >
> > I've seen the thread mostly following the hardware line. I'd like to
> > enquire down the kernel path because I've seen occasional, impossible
> > to reproduce problems too.
>
> Nice. I also don't want to rule out anything before I really understand
> what's going on.
>
> > Can I ask first a few questions:
>
> Of course.
>
> > 1) Are you using vanilla kernels, or do you have other patches applied?
>
> Yes, all the kernels that I use are just plain vanilla kernels taken
> straight from kernel.org. No other patches applied.

Ok. That's helpful.

> > 2) Are you using ext3 only?
>
> Yes, I am.
>
> > 3) Is the corruption only ever in memory, or seen on disk too?
>
> I have noticed the problem mostly on disk. One strange situation was
> when I was untarring a kernel tree (compressed with bzip2) and in the
> middle of the extraction, bzip2 complained that the thing was
> corrupted.
>
> I removed what was extracted right away and tried again to extract the
> tree (at this point, suspecting even that something in software had
> problems). The problem with bzip2 occurred again. Then, I rebooted the
> system an the problem magically went away.

If you see it in a form where you can see the amount of corruption, can
you see if it is just four bytes?

I'm asking because I have recently started seeing
impossible-to-reliably-reproduce corruption here, which seems to be only
four bytes at a time, in memory originally but possibly also appearing
on disk (probably because of syncing). I originally wondered if it might
be Suspend2 related (in the first instance, assume I messed up :)), but
I haven't been sure. The corruption I'm seeing only affects the root
filesystem. None of this makes much sense if I assume it's a Suspend2
bug. I could have a bad pointer access somewhere, but the rest is just
confusing.

Regards,

Nigel

> > 4) Is the corruption only in one filesystem or spread across several
> > (if applicable)? (ie in / but not /home or others?)
>
> I only have one filesystem right now, but given the difficulties that
> I'm seeing, I do plan to go back to a multiple filesystem setup (which I
> always used but thought that was overkill---nothing like time to teach
> us something what is safest).
>
> If you want to know anything else, don't hesistate to ask.
>
>
> Regards,
--


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/