Re: [RFC][PATCH 0/5] arch: atomic rework

From: Paul E. McKenney
Date: Sun Feb 09 2014 - 22:21:20 EST

Next message: Laura Abbott: "Re: [PATCHv2 2/2] arm: Get rid of meminfo"
Previous message: Yan, Zheng: "Re: [PATCH v1 07/10] perf/x86/uncore: allow more than one fixed counter per box"
In reply to: Linus Torvalds: "Re: [RFC][PATCH 0/5] arch: atomic rework"
Next in thread: Paul E. McKenney: "Re: [RFC][PATCH 0/5] arch: atomic rework"

On Mon, Feb 10, 2014 at 01:27:51AM +0100, Torvald Riegel wrote:
> On Fri, 2014-02-07 at 10:02 -0800, Paul E. McKenney wrote:
> > On Fri, Feb 07, 2014 at 04:55:48PM +0000, Will Deacon wrote:
> > > Hi Paul,
> > >
> > > On Fri, Feb 07, 2014 at 04:50:28PM +0000, Paul E. McKenney wrote:
> > > > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > > > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > > > > Hopefully some discussion of out-of-thin-air values as well.
> > > > >
> > > > > Yes, absolutely shoot store speculation in the head already. Then drive
> > > > > > a wooden stake through its heart.
> > > > >
> > > > > C11/C++11 should not be allowed to claim itself a memory model until that
> > > > > is sorted.
> > > >
> > > > There actually is a proposal being put forward, but it might not make ARM
> > > > and Power people happy because it involves adding a compare, a branch,
> > > > and an ISB/isync after every relaxed load... Me, I agree with you,
> > > > much preferring the no-store-speculation approach.
> > >
> > > Can you elaborate a bit on this please? We don't permit speculative stores
> > > in the ARM architecture, so it seems counter-intuitive that GCC needs to
> > > emit any additional instructions to prevent that from happening.
> >
> > Requiring a compare/branch/ISB after each relaxed load enables a simple(r)
> > proof that out-of-thin-air values cannot be observed in the face of any
> > compiler optimization that refrains from reordering a prior relaxed load
> > with a subsequent relaxed store.
> >
> > > Stores can, of course, be observed out-of-order but that's a lot more
> > > reasonable :)
> >
> > So let me try an example. I am sure that Torvald Riegel will jump in
> > with any needed corrections or amplifications:
> >
> > Initial state: x == y == 0
> >
> > T1: r1 = atomic_load_explicit(&x, memory_order_relaxed);
> >     atomic_store_explicit(&y, r1, memory_order_relaxed);
> >
> > T2: r2 = atomic_load_explicit(&y, memory_order_relaxed);
> >     atomic_store_explicit(&x, r2, memory_order_relaxed);
> >
> > One would intuitively expect r1 == r2 == 0 as the only possible outcome.
> > But suppose that the compiler used specialization optimizations, as it
> > would if there was a function that has a very lightweight implementation
> > for some values and a very heavyweight one for others. In particular,
> > suppose that the lightweight implementation was for the value 42.
> > Then the compiler might do something like the following:
> >
> > Initial state: x == y == 0
> >
> > T1: r1 = atomic_load_explicit(&x, memory_order_relaxed);
> >     if (r1 == 42)
> >         atomic_store_explicit(&y, 42, memory_order_relaxed);
> >     else
> >         atomic_store_explicit(&y, r1, memory_order_relaxed);
> >
> > T2: r2 = atomic_load_explicit(&y, memory_order_relaxed);
> >     atomic_store_explicit(&x, r2, memory_order_relaxed);
> >
> > Suddenly we have an explicit constant 42 showing up. Of course, if
> > the compiler carefully avoided speculative stores (as both Peter and
> > I believe that it should if its code generation is to be regarded as
> > anything other than an act of vandalism, the words in the standard
> > notwithstanding), there would be no problem. But currently, a number
> > of compiler writers see absolutely nothing wrong with transforming
> > the optimized-for-42 version above into something like this:
> >
> > Initial state: x == y == 0
> >
> > T1: r1 = atomic_load_explicit(&x, memory_order_relaxed);
> >     atomic_store_explicit(&y, 42, memory_order_relaxed);
> >     if (r1 != 42)
> >         atomic_store_explicit(&y, r1, memory_order_relaxed);
> >
> > T2: r2 = atomic_load_explicit(&y, memory_order_relaxed);
> >     atomic_store_explicit(&x, r2, memory_order_relaxed);
>
> Intuitively, this is wrong because this lets the program take a step
> the abstract machine wouldn't do. This is different to the sequential
> code that Peter posted because it uses atomics, and thus one can't
> easily assume that the difference is not observable.
>
> For this to be correct, the compiler would actually have to prove that
> the speculative store is "as-if correct", which in turn would mean that
> it needs to be aware of all potential observers, and check whether those
> observers aren't actually affected by the speculative store.
>
> I would guess that the compilers you have in mind don't really do that.
> If they do, then I don't see why this should be okay, unless you think
> out-of-thin-air values are something good (which I wouldn't agree with).

OK, we agree that pulling the atomic store to y out of its "if" statement
is a bad thing. Very good! Now we just have to convince others on
the committee. ;-)
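
For concreteness, here is a minimal sketch (not anything a real compiler
emits) that replays one interleaving of the hoisted-store version on a
single thread. The names replay_hoisted_store() and struct outcome are
hypothetical, and the sketch assumes that T1's unconditional store of 42
to y becomes visible before T1's load of x, which relaxed ordering does
not forbid once that store no longer depends on r1:

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared variables from the example; both start at 0. */
static atomic_int x, y;

/* Hypothetical record of what each thread observed. */
struct outcome {
    int r1;      /* value T1 loaded from x */
    int r2;      /* value T2 loaded from y */
    int y_final; /* value left in y at the end */
};

/*
 * Hypothetical helper: replay, on a single thread, one interleaving of
 * the hoisted-store version of T1 against T2.  Assumption: T1's
 * unconditional store of 42 becomes visible before T1's load of x.
 */
static struct outcome replay_hoisted_store(void)
{
    struct outcome o;

    /* Reset to the initial state: x == y == 0. */
    atomic_store_explicit(&x, 0, memory_order_relaxed);
    atomic_store_explicit(&y, 0, memory_order_relaxed);

    /* T1: speculative store, hoisted out of the "if". */
    atomic_store_explicit(&y, 42, memory_order_relaxed);

    /* T2: copy y to x, unmodified. */
    o.r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, o.r2, memory_order_relaxed);

    /* T1: the load, then the fix-up store if the speculation was wrong. */
    o.r1 = atomic_load_explicit(&x, memory_order_relaxed);
    if (o.r1 != 42)
        atomic_store_explicit(&y, o.r1, memory_order_relaxed);

    o.y_final = atomic_load_explicit(&y, memory_order_relaxed);

    /* If T1 saw 42, the fix-up store was skipped and y still holds 42. */
    assert(o.r1 != 42 || o.y_final == 42);
    return o;
}
```

Under that assumption, calling replay_hoisted_store() yields
r1 == r2 == 42 and leaves 42 in y, even though the original program
never mentions 42. Replaying by hand rather than racing real threads
keeps the result deterministic. It shows only that the outcome is
reachable, not how often it would occur on real hardware.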

Thanx, Paul

