Re: Strange read data corruption on ext4/LVM/md

From: Pierre Ossman
Date: Thu May 20 2010 - 06:22:39 EST


On Thu, 20 May 2010 11:42:29 +0200
Tejun Heo <tj@xxxxxxxxxx> wrote:

> > randomly flipped bits? I don't know if you saw the first couple of
> > mails (before linux-ide was added), but the problem is data being moved
> > around, not just randomly changed.
>
> I only saw your previous posting. TLP corruption can happen during
> the command setup phase, and bit flipping in the command address part
> is definitely possible, so reads and writes can be headed at wrong
> places in both memory and disk. I don't know whether this would fit
> your symptom, though.
>

Ah. Here's the problem description from a previous mail:

The corruption is 104 bytes, a somewhat odd number; I would have
expected something more fundamental, like a sector or a page.

The data in question seems to come from another part of the file.

The shifts are 015d1380 => 015d0f80 (-1024 bytes) and 02210380 =>
0220ff80 (also -1024 bytes). At least the offset is a nice, sane
power-of-two number (0x400).

It is also noteworthy that the last three nibbles of the corruption
are always the same (xxxxx380 => xxxxxf80).

</recap>

Note that the above analysis is from files, so it involves the entire
stack. I've since focused on raw disks. See below.
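For the record, the kind of offsets quoted in the recap can be pulled
out by comparing a known-good copy of a file against a corrupted read
and printing the differing byte positions in hex. A minimal sketch
(file names are placeholders):

# cmp -l good.img bad.img | awk '{ printf "0x%08x\n", $1 - 1 }'

cmp -l prints 1-based decimal byte numbers, hence the "- 1" and the
hex conversion; a run of consecutive offsets then shows both the
104-byte length and the xxxxx380 pattern directly.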

> > Another note is that the problem seems to worsen under load. I'm
> > running the dd thing in the background, which seems to make read
> > errors more common in my test files at the filesystem level.
>
> It would be great if you can try a different controller in similar
> setup.

I only stock sil3132 cards, as those are the only decent add-on cards
I've found. AHCI controllers all seem to be onboard only.

> But please keep trying to narrow down the problem and if
> possible please remove filesystem from the stack and test against the
> block device directly.

That's what I've been doing for the last couple of runs. From a
previous mail:

I did some more testing, though, and this might be a low-level issue.
I did the following multiple times:

# dd if=/dev/sde skip=4k bs=4M count=500 | md5sum

And the results were:

13aa29adcd16f8d0faf3cb5c39f43826
d1e3df33c0b0d03c61f880a8f2bb6cfb
13aa29adcd16f8d0faf3cb5c39f43826
13aa29adcd16f8d0faf3cb5c39f43826
13aa29adcd16f8d0faf3cb5c39f43826
13aa29adcd16f8d0faf3cb5c39f43826
7a746328b60a63b76847c3e1319a8534
13aa29adcd16f8d0faf3cb5c39f43826

</recap2>

Since the amount of data is much larger here and the incidents rarer,
I haven't been able to confirm that the corruption is identical to
what I've seen in the files. I'm working on the assumption that it
is...
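One way to confirm it (just a sketch, file names made up) would be to
capture the raw reads and diff a bad pass against a good one instead
of only hashing:

# dd if=/dev/sde skip=4k bs=4M count=500 of=/tmp/sde-a.img
# dd if=/dev/sde skip=4k bs=4M count=500 of=/tmp/sde-b.img
# cmp -l /tmp/sde-a.img /tmp/sde-b.img | head

If the two captures differ, cmp shows exactly where the bad read
diverges, which would confirm (or refute) the same -1024 shift.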

I've since constructed a script that keeps re-running the above over
all relevant disks and keeps track of how many unique md5 sums we get
(roughly the loop sketched below). It's been running for about 1.5
hours now, and here are the results so far:

sdd - 3, sde - 4, sdf - 1, sdb - 1, sdc - 1
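For reference, the loop amounts to roughly this (a reconstruction,
not the actual script; disk names, log paths and the iteration count
are assumptions):

for disk in sdb sdc sdd sde sdf; do
    # repeatedly read the same ~2 GB region, logging each pass's md5
    for i in $(seq 1 100); do
        dd if=/dev/$disk skip=4k bs=4M count=500 2>/dev/null | md5sum
    done > /tmp/md5-$disk.log &
done
wait
# count how many distinct sums each disk produced
for disk in sdb sdc sdd sde sdf; do
    echo "$disk - $(sort -u /tmp/md5-$disk.log | wc -l)"
done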

sdd and sde are both on the same controller, so the problem you
mentioned could be relevant.

I'll let the test run for a few more hours and try moving things off
that controller later tonight.


Thanks for looking at this. Unstable data storage is one of those
things that can keep you up at night. :/

Rgds
--
Pierre Ossman

WARNING: This correspondence is being monitored by FRA, a
Swedish intelligence agency. Make sure your server uses
encryption for SMTP traffic and consider using PGP for
end-to-end encryption.
