Re: Silent corruption on AMD64

From: Andrew Morton
Date: Sat Mar 31 2007 - 22:53:13 EST


> On Sat, 31 Mar 2007 18:27:36 -0700 Aaron Lehmann <aaronl@xxxxxxxxxxx> wrote:
> I have spent a lot of time trying to find a simpler test case. So far,
> as far as I can tell, there are three conditions that must be
> satisfied for corruption to occur:
>
> 1. Heavy Ethernet load (nc remotehost < /dev/zero)
> 2. Heavy disk write load on any non-sata_sil drive (cat /dev/zero > /path)
> 3. Heavy disk read load on any other drive (tar c /path | cat > /dev/null)
>
> With these conditions satisfied, data read off sda or sdb (the drives
> associated with sata_sil) is often corrupted. Since I can only see
> this problem with files on those two drives, I'm inclined to suspect
> the sata_sil driver, but I really have no idea what's going on. I know
> this is not a recent issue - I experienced very similar corruption at
> least a year ago. I wasn't able to reproduce it at the time, because
> it only appeared in the backups I was restoring from.

Are you able to provide us with some before-and-after data so we
can see this corruption.

See, if it's dropped-bits or shifted-data or eight-byte-aligned
kernel addresses or whatever, that helps us generate theories..
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/