Re: asm-generic: Disallow no-op mb() for SMP systems

From: Stafford Horne
Date: Fri Feb 02 2018 - 08:48:57 EST


On Thu, Feb 01, 2018 at 02:29:09PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 01, 2018 at 09:27:50PM +0900, Stafford Horne wrote:
> > I tried to clarify some of this in the spec v1.2 [0], which helps formalize some of
> > the techniques we used for the SMP implementation. It's probably not perfect,
> > but I added a section "10. Multicore support" and tried to clarify some things
> > in section 7 on Atomicity. But it seems I don't cover exactly what you are
> > mentioning here. In general:
> >
> >  1. Secondary cores have memory snooping enabled, meaning that any write to a
> >     cached address will cause the cache line to be invalidated.
> >  2. l.swa (store atomic word) implies a store buffer flush.
>
> What about l.lwa? Can that observe 'old' values, or rather, miss values
> stuck in a remote store buffer?
>
> This will then cause the first l.swa to fail, which, per the above,
> would then sync things up? Which means you get that one extra
> merry-go-round.

Sorry, I remembered incorrectly: l.lwa also implies a store buffer flush
(l.msync) for the local cpu. However, in order to see something stuck in the
remote store buffer, a flush would need to be initiated on the remote core. I
think that is what we would expect though, right?
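
As a rough illustration of the "extra merry-go-round", here is a sketch in
portable C11 of a load / conditional-store retry loop. It is not OpenRISC or
kernel code: the helper name add_one_counting_retries() is made up, and I am
only assuming that a compiler would lower the compare-exchange to an
l.lwa/l.swa pair on our hardware. A stale value seen by the load just costs
one failed store and one more trip around the loop.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper: add 1 to *v with a load + conditional-store loop
 * and report how many attempts failed before the store succeeded.
 */
int add_one_counting_retries(atomic_int *v)
{
    int retries = 0;
    /* First read; this may observe an old value. */
    int old = atomic_load_explicit(v, memory_order_relaxed);

    /*
     * The weak compare-exchange may fail because *v changed or
     * spuriously. On failure it reloads the current value into 'old',
     * so the next attempt starts from fresh data.
     */
    while (!atomic_compare_exchange_weak_explicit(v, &old, old + 1,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed))
        retries++;

    return retries;
}
```

With no contention the return value is normally 0, but a weak
compare-exchange is allowed to fail spuriously, so callers should only
rely on the final value of *v.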

> >  3. l.msync is used to flush the store buffer.
> >
> > Also, during the IPI controller review [1] Marc Z asked many similar questions.
> > I believe he was ok in the end.
> >
> > Anyway,
> > Thanks for spotting the issue here. For some reason I remember we
> > did have an l.msync for our mb(). Let me think about it and test out this patch
> > (and the fix to actually define mb) to see if anything comes up.
> >
> > Also, I haven't seen any implementations that use WOM. Stefan might know better.
>
> So if the strong model has a store buffer, as I think the above says,
> then it is _NOT_ correct for l.msync to be treated as a NOP, it _must_
> flush the store buffer.
>
> At which point I think your 'strong' model is basically TSO. So it would
> be very good to get that spelled out somewhere.

Yes, I think the original author did not think of PSO/TSO and store buffers.
The author's intention is not clear. It should be cleared up.

I would say:
 1. Weak order model with store buffers is PSO (must implement l.msync)
 2. Strong model with store buffers is TSO (must implement l.msync)
 3. Implementations without store buffers could be weak or strong?
    a. weak, meaning the cpu could schedule loads/stores out of order; l.msync
       would cause all pending load/store instructions to be retired.
    b. strong, meaning loads/stores would happen in instruction order; in this
       case l.msync could be a no-op as there is no buffering of stores or
       loads.

1 doesn't exist as far as I know, so it's probably better to remove it.
2 is what we have now in mor1kx.
3.b is possible, but we always have an l.msync implementation. But maybe it
doesn't make sense when there is no store buffer.
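
To make the TSO case concrete, here is a rough store-buffering litmus
sketch in portable C11 with pthreads. It is not OpenRISC or kernel code: the
names full_barrier(), cpu0(), cpu1() and sb_litmus() are made up for
illustration, and the C11 seq_cst fence is only a stand-in for what
mb() / l.msync would have to provide. Each thread stores to one variable and
then loads the other. With a store buffer and no barrier, both loads may
return 0; with a full barrier between the store and the load, that outcome
is forbidden.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mb(); assumed to behave like l.msync. */
static inline void full_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

static atomic_int x, y;     /* shared locations */
static int r0, r1;          /* values each "cpu" observed */
static bool use_barrier;    /* set before the threads are created */

static void *cpu0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    if (use_barrier)
        full_barrier();     /* drain the store before loading y */
    r0 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *cpu1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    if (use_barrier)
        full_barrier();     /* drain the store before loading x */
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/*
 * Run the litmus test 'iterations' times and return how many runs saw
 * r0 == 0 && r1 == 0, or -1 if a thread could not be created.
 */
int sb_litmus(int iterations, bool with_barrier)
{
    int both_zero = 0;

    use_barrier = with_barrier;
    for (int i = 0; i < iterations; i++) {
        pthread_t t0, t1;

        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&t0, NULL, cpu0, NULL) != 0)
            return -1;
        if (pthread_create(&t1, NULL, cpu1, NULL) != 0) {
            pthread_join(t0, NULL);
            return -1;
        }
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r0 == 0 && r1 == 0)
            both_zero++;
    }
    return both_zero;
}
```

sb_litmus(n, true) should always return 0. sb_litmus(n, false) is allowed
to return a non-zero count, though whether it ever does depends on the
machine and on timing, so a zero there proves nothing.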

-Stafford