Re: [PATCH v3] tools/memory-model: Make ppo a subrelation of po

From: Paul E. McKenney
Date: Sun Feb 26 2023 - 13:45:37 EST


On Sun, Feb 26, 2023 at 11:51:15AM -0500, Alan Stern wrote:
> On Sun, Feb 26, 2023 at 12:17:31PM +0100, Jonas Oberhauser wrote:
> > On 2/26/2023 4:30 AM, Alan Stern wrote:
> > > On Sat, Feb 25, 2023 at 07:09:05PM -0800, Boqun Feng wrote:
> > > > On Sat, Feb 25, 2023 at 09:29:51PM -0500, Alan Stern wrote:
> > > > > On Sat, Feb 25, 2023 at 05:01:10PM -0800, Paul E. McKenney wrote:
> > > > > > A few other oddities:
> > > > > >
> > > > > > litmus/auto/C-LB-Lww+R-OC.litmus
> > > > > >
> > > > > > Both versions flag a data race, which I am not seeing. It appears
> > > > > > to me that P1's store to u0 cannot happen unless P0's store
> > > > > > has completed. So what am I missing here?
> > > > > The LKMM doesn't believe that a control or data dependency orders a
> > > > > plain write after a marked read. Hence in this test it thinks that P1's
> > > > > store to u0 can happen before the load of x1. I don't remember why we
> > > > > did it this way -- probably we just wanted to minimize the restrictions
> > > > > on when plain accesses can execute. (I do remember the reason for
> > > > > making address dependencies induce order; it was so RCU would work.)
> > > > >
> > > > Because a plain store can be optimized into a "store only if not equal"?
> > > > As the following sentences in explanation.txt say:
> > > >
> > > > The need to distinguish between r- and w-bounding raises yet another
> > > > issue. When the source code contains a plain store, the compiler is
> > > > allowed to put plain loads of the same location into the object code.
> > > > For example, given the source code:
> > > >
> > > > 	x = 1;
> > > >
> > > > the compiler is theoretically allowed to generate object code that
> > > > looks like:
> > > >
> > > > 	if (x != 1)
> > > > 		x = 1;
> > > >
> > > > thereby adding a load (and possibly replacing the store entirely).
> > > > For this reason, whenever the LKMM requires a plain store to be
> > > > w-pre-bounded or w-post-bounded by a marked access, it also requires
> > > > the store to be r-pre-bounded or r-post-bounded, so as to handle cases
> > > > where the compiler adds a load.
> > > Good guess; maybe that was the reason. [...]
> > > So perhaps the original reason is not valid now
> > > that the memory model explicitly includes tests for stores being
> > > r-pre/post-bounded.
> > >
> > > Alan
> >
> > I agree, I think you could relax that condition.
>
> Here's a related question to think about. Suppose a compiler does make
> this change, adding a load-and-test in front of a store. Can that load
> cause a data race?
>
> Normally I'd say no, because compilers aren't allowed to create data
> races where one didn't already exist. But that restriction is part of
> the C/C++ standard, and what we consider to be a data race differs from
> what the standard considers.
>
> So what's the answer? Is the compiler allowed to translate:
>
> 	r1 = READ_ONCE(*x);
> 	if (r1)
> 		*y = 1;
>
> into something resembling:
>
> 	r1 = READ_ONCE(*x);
> 	rtemp = *y;
> 	if (r1) {
> 		if (rtemp != 1)
> 			*y = 1;
> 	}
>
> (Note that whether the load to rtemp occurs inside the "if (r1)"
> conditional or not makes no difference; either way the CPU can execute
> it before testing the condition. Even before reading the value of *x.)
>
> _If_ we assume that these manufactured loads can never cause a data race
> then it should be safe to remove the r-pre/post-bounded tests for plain
> writes.
>
> But what if rtemp reads from a plain write that was torn, and the
> intermediate value it observes happens to be 1, even though neither the
> initial nor the final value of *y was 1?

I am not worried about compilers creating data races, so that assignment
to rtemp would be within the "if (r1)" statement. Not that this matters,
as you say, from a hardware ordering viewpoint.

However, tearing is a concern. Just to make sure I understand, one
scenario might be where the initial value of *y was zero and the final
value was 0x10001, correct? In that case, we have seen compilers that
would write that constant 16 bits at a time, resulting in a transitory
value of 0x1.

But in this case, we would need the value to -not- be 1 for bad things
to happen, correct?

And in that case, we would need the value to initially be 1 and the
desired value to be 1 and some other store to redundantly set it to
1, but tear in such a way that the transitory value is not 1, correct?
Plus we should detect the data race in that case, no?

Or am I missing yet another opportunity for a mischievous compiler?

> > Note there's also rw-xbstar (used with fr) which doesn't check for
> > r-pre-bounded, but it should be ok. That's because only reads would be
> > unordered; as a result, the read (in the if (x != ..) x=..) should provide
> > the correct value. The store would be issued as necessary, and the issued
> > store would still be ordered correctly w.r.t the read.
>
> That isn't the reason I left r-pre-bounded out of rw-xbstar. If the
> write gets changed to a read there's no need for rw-xbstar to check
> r-pre-bounded, because then rw-race would be comparing a read with
> another read (instead of with a write) and so there would be no
> possibility of a race in any case.

True, and if there was a racing write, it would be a data race in
any case.

Thanx, Paul