Re: [patch] x86: improved memory barrier implementation

From: Nick Piggin
Date: Sat Sep 29 2007 - 09:19:59 EST

On Fri, Sep 28, 2007 at 06:18:31PM +0100, Alan Cox wrote:
> > on the broken ppro stores config option if you just tell me what should
> > be there (again, remember that my patch isn't actually changing anything
> > already there except for smp_rmb side).
> The PPro needs rmb to ensure a store doesn't go for a walk on the wild
> side and pass the read especially when we are dealing with device space.

smp_rmb (ie. the non device space version) needs to order stores too?
OK, that's definitely something we'll need a locked op for. I can't see how
it could possibly close all holes, though: the level at which memory ordering
dependencies operate in lockless code is way above what the hardware can
know about.

For any 2 given code sequences
store 1 -> X
store 2 -> Y
load Y
load X

There may be a requirement to have wmb and rmb respectively for
correctness, or it may be perfectly correct without ordering. Unless the
issue is *purely* that store and/or loads can erroneously execute out
of order, then adding magic to barrier primitives cannot fix all cases
(and the errata certainly looked spookier than simply out of order

> The rest of the stuff is a little vague which makes me nervous when
> considering relaxing the PPro case for smp_rmb. With smp_rmb as it is at
> the moment the PPro is effectively treated as an out of order cpu and so
> shouldn't hit anything occassionally that a PPC wouldn't hit every time.

Well, it's all a little vague to me ;) (maybe you were privy to info that you
aren't allowed to talk about...). IIRC the errata I could find, mentioned
concurrent stores to a single variable as being problematic, and nothing
about memory ordering, or loads, at all.

One difference with a normal weakly ordered machine, is that the normal
one wouldn't require rmb to order loads vs stores, for example.

Anyway, after all that blowing of smoke (I'm just genuinely curious), I have
no problems with any fixups for ppro and defer to your greater understanding
of the issues. However, all that can go under the config variable we have
for that purpose. Let me know what you need? Is it just a matter of putting
the dummy lock'ed op back in smp_rmb? You sure you don't also need
smp_wmb to order stores? I can cook up a patch to do either.

> > > - and for modern processors its still not remotely clear your patch is
> > > correct because of NT stores.
> >
> > No. We already _have_ to rely on this model for barriers
> Well Linus has dealt with the question of NT stores for us now...

He actually didn't, quite ;) (see other post)

> Given this isn't an issue on 64bit I'm inclined to argue that only 64bit
> behaviour should be changed at this point.

That's the easy way out, but I think it creates more problems down the
line. It diverges the code base and testing base. I think it is pretty simple
(and serves as good documentation) to extend the use of the ppro ifdef
to do what we need here, rather than implicitly rely on the barriers being
stronger than required for non-broken implementations.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at