Re: [patch] x86: improved memory barrier implementation

From: Nick Piggin
Date: Sun Sep 30 2007 - 08:16:40 EST

On Sat, Sep 29, 2007 at 09:07:30AM -0700, Linus Torvalds wrote:
> On Sat, 29 Sep 2007, Nick Piggin wrote:
> > >
> > > The non-temporal stores should be basically considered to be "IO", not any
> > > normal memory operation.
> >
> > Maybe you're thinking of uncached / WC? Non-temporal stores to cacheable
> > RAM apparently can go out of order too, and they are being used in the kernel
> > for some things.
> I'm really saying that people to a first approximation should think "NT is
> an IO (DMA) thing". Whether cached or not. Exactly because they do not
> honor the normal memory ordering.
> It may be worth noting that "clflush" falls under that heading too - even
> if all the actual *writes* were done with totally normal writes, if
> anybody does a clflush instruction, that breaks the ordering, and that
> turns it to "DMA ordering" again - ie we're not talking about the normal
> SMP ordering rules at all.
> So all the spinlocks and all the smp_*mb() barriers have never really done
> *anything* for those things (in particular, "smp_wmb()" has *always*
> ignored them on i386!)

Yes that's true. I'm just noting that in some places, we do nontemporal
stores to cacheable RAM, and in that case we just have to remember to be
careful of memory ordering (especially in library functions like
copy_user_nocache where we never know exactly what it will be used for,
even if it seems safe).

Speaking of which, would it be a good idea to put an sfence before and
after this function? It seems like the safe thing to do...

> > Likewise for rep stos, apparently.
> No. As far as I can tell, the fast string operations are unordered
> *within*themselves*, but not wrt the operations around it.
> In other words, you cannot depend on the ordering of stores *in* the
> memcpy() or memset() when it is implemented by "rep movs/stos" - but that
> is 100% equivalent to the fact that you cannot depend on the ordering even
> when it isn't - since the "memcpy()" library routine might be copying
> memory backwards for all you know!
> The Intel memory ordering paper doesn't talk about the fast string
> instructions (except to say that the rules it *does* speak about do not
> hold), but the regular IA manuals do say (for example):
> "Code dependent upon sequential store ordering should not use the
> string operations for the entire data structure to be stored. Data and
> semaphores should be separated. Order dependent code should use a
> discrete semaphore uniquely stored to after any string operations to
> allow correctly ordered data to be seen by all processors."
> and note how it says you should just store to the semaphore. If you think
> about it, that semahore will be involving all the memory ordering
> requirements that we *already* depend on, so if a semaphore is sufficient
> to order the fast string instruction, then by definition using a spinlock
> around them must be the same thing!
> In other words, by Intels architecture manual, fast string instructions
> cannot escape a "semaphore" - but that means that they cannot escape a
> spinlock either (since the two are exactly the same wrt memory ordering
> rules! In other words, whenever the Intel docs say "semaphore", think
> "mutual exclusion lock", not necessarily the kernel kind of "sleeping
> semaphore").
> But it might be good to have that explicitly mentioned in the IA memory
> ordering thing, so I'll ask the Intel people about that. However, I'd say
> that given our *current* documentation, string instructions may be
> *internally* out-of-order, but they would not escape a lock.

OK, it's good to hear that from you :) The one tiny doubt I have left
is that IIRC Intel talk about lock prefixed operations as semaphore
operations sometimes? I don't suppose that would be the case here? Anyway,
if you can get them to put something in writing, that would be grand.

> > But this means they are already at odds with spin_unlock, unless they
> > are enclosed with mfences everywhere they are used (of which I think
> > most are not). So this is an existing bug in the kernel.
> See above. I do not believe that it's an existing bug, but the basic point
> that the change to "smp_rmb()" doesn't change our existing rules is true.

OK, so all that's left is to restore the locked smp_rmb case for broken
ppro and resend the patch?

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at