Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell88f5182)

From: James Bottomley
Date: Thu May 13 2010 - 11:40:02 EST


On Thu, 2010-05-13 at 10:18 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2010-05-12 at 18:41 -0500, James Bottomley wrote:
> > > Which means that for coherent architectures that do not implement
> > > the ops->sync_* hooks, we are probably missing a barrier here...
> > >
> > > Thus if the above is expected to be a memory barrier, it's broken on
> > > cache coherent powerpc for example. On non-coherent powerpc, we do
> > > cache flushes and those are implicit barriers.
> >
> > Can you explain this a little more? On a cache coherent machine, the
> > sync is a nop ... why would you want a nop to be any type of barrier?
>
> Well if the driver can peek at the data after the sync, and have any
> kind of ordering guarantee that it doesn't get stale data (the load
> isn't prefetched or speculated early), that would require an mb() or at
> least rmb().

So the guarantee that it doesn't look at stale data after the sync on a
cache coherent machine means ordering the dma write to physical memory
with the subsequent cpu read ... no memory barrier can actually do that.
Usually this is done externally, by making sure the memory change is
visible before sending the irq that tells the driver it is there ... on
some NUMA systems, this can be a problem (hence the mmiowb/relaxed read
thing).

> It would seem sensible for drivers to assume that something like
> dma_cache_sync_for_cpu() thus has the semantics of an rmb() at least,
> no ?

I still don't see why ... I don't see how you'd ever get a read of the
area speculated before the event that tells the driver it's OK to read
the memory. In theory, I agree that it looks logical to require the
read never be speculated before the sync ... but in practice, I don't
see there ever being a problem with this since the sync isn't the event
that says the memory is safe to read.
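
To make the distinction concrete, here is a rough userspace sketch of the
pattern under discussion. It is an analogy only, not kernel code: a thread
stands in for the device, a C11 atomic flag stands in for the completion
event (the irq or a descriptor status word), and every name in it
(fake_ring, fake_device, driver_read, run_handoff) is made up for
illustration. The ordering is attached to the read of the completion flag,
not to any sync call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA buffer plus its completion event. */
struct fake_ring {
	uint32_t payload;	/* data the "device" writes */
	atomic_int done;	/* stands in for the irq / status word */
};

/* The "device": write the data, then publish the completion event. */
static void *fake_device(void *arg)
{
	struct fake_ring *r = arg;

	r->payload = 42;
	/* Release: the payload store is visible before done reads as 1. */
	atomic_store_explicit(&r->done, 1, memory_order_release);
	return NULL;
}

/* The "driver": wait for the completion event, then read the data. */
static uint32_t driver_read(struct fake_ring *r)
{
	/* Acquire: the payload load below cannot be satisfied before this. */
	while (!atomic_load_explicit(&r->done, memory_order_acquire))
		;
	return r->payload;
}

/* Run one hand-off and return what the "driver" saw. */
uint32_t run_handoff(void)
{
	struct fake_ring r = { .payload = 0 };
	pthread_t dev;
	uint32_t seen;
	int rc;

	atomic_init(&r.done, 0);
	rc = pthread_create(&dev, NULL, fake_device, &r);
	assert(rc == 0);
	(void)rc;

	seen = driver_read(&r);
	pthread_join(dev, NULL);
	return seen;
}
```

Calling run_handoff() should hand back the value the fake device wrote.
Whether a real device needs the equivalent of the acquire on the cpu side
depends on the platform and on how the completion is signalled, which is
exactly the question in this thread.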

James

