Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

From: Andi Kleen
Date: Wed Jan 17 2007 - 17:16:55 EST



> We've just verified that configuring the graphics aperture to be
> write-combining instead of write-back using an MTRR also solves the
> problem. It appears to be a cache incoherency issue in the graphics
> aperture.

Interesting.

Unfortunately it is also not correct. It was intentional to
mark the IOMMU half of the aperture write-back, as opposed
to uncached like the AGP half. Otherwise you get illegal cache attribute
conflicts with the memory that is being remapped, which can also cause
corruption.

The Northbridge guarantees coherency over the aperture, but
only if the caching attributes match.

To use an uncached aperture you would need to call change_page_attr() on
every kernel address that is mapped into the IOMMU. AGP does this, but the
IOMMU maps and unmaps far more frequently, so doing the same there would
be prohibitively costly.
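
As a rough illustration of the argument above, here is a toy userspace
model, not kernel code. Every name in it (cache_attr, page_alias,
attrs_conflict, count_conflicts, set_aperture_attr) is hypothetical and
made up for this sketch. It only models the bookkeeping: each remapped
page is visible through two aliases, and changing the aperture attribute
without touching the direct mapping leaves every page in conflict, while
fixing the direct mapping costs one attribute change per page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache attributes, for illustration only. */
enum cache_attr { ATTR_WB, ATTR_WC, ATTR_UC };

/* One physical page seen through two virtual aliases. */
struct page_alias {
	enum cache_attr direct_map;	/* kernel linear mapping */
	enum cache_attr aperture;	/* alias through the aperture */
};

/* The two aliases conflict when their cache attributes differ. */
static bool attrs_conflict(const struct page_alias *p)
{
	return p->direct_map != p->aperture;
}

/* Count the pages whose two aliases disagree. */
static size_t count_conflicts(const struct page_alias *pages, size_t n)
{
	size_t i, conflicts = 0;

	for (i = 0; i < n; i++)
		if (attrs_conflict(&pages[i]))
			conflicts++;
	return conflicts;
}

/*
 * Set the aperture attribute for all pages. If fix_direct_map is true,
 * also update each page's direct mapping (standing in for a per-page
 * change_page_attr() call). Returns the number of per-page direct-map
 * changes performed, i.e. the cost paid on the mapping path.
 */
static size_t set_aperture_attr(struct page_alias *pages, size_t n,
				enum cache_attr attr, bool fix_direct_map)
{
	size_t i, changes = 0;

	for (i = 0; i < n; i++) {
		pages[i].aperture = attr;
		if (fix_direct_map && pages[i].direct_map != attr) {
			pages[i].direct_map = attr;
			changes++;
		}
	}
	return changes;
}
```

In this model, switching only the aperture attribute is free but leaves
every mapped page conflicting, and the alternative costs one change per
mapped page on every map operation.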

In the past we saw corruption from such conflicts, so this is more
than just theory. I suspect you traded an easy-to-trigger corruption for
a more subtle one.

-Andi
