Re: [PATCH] tools/memory-model: Make ppo a subrelation of po

From: Jonas Oberhauser
Date: Tue Jan 24 2023 - 15:23:54 EST




On 1/24/2023 6:14 PM, Alan Stern wrote:
On Tue, Jan 24, 2023 at 02:14:03PM +0100, Jonas Oberhauser wrote:
After mulling it over a bit in my big old head, I consider that even though
dropping the [W] may be shorter, it might make for the simpler model by
excluding lots of cases.
That makes me think you should do it for real in the definition of prop. And
not just at the very end, because in fact each cumul-fence link might come
from a non-A-cumulative fence. So the same argument you are giving should be
applied recursively.
Either

prop = (overwrite & ext)? ; (cumul-fence; [W])* ; rfe?

or integrate it directly into cumul-fence.
I dislike this sort of argument. I understand the formal memory model
by relating it to the informal operational model. Thus, cumul-fence
links a write W to another event E when the fence guarantees that W will
propagate to E's CPU before E executes.

I later wondered why it's not defined like this and realized that prop means that it's before E executes.

That's how the memory model
expresses the propagation properties of these fences.

I don't think that's really a perfect match though.
For example, W ->wmb E (and thus cumul-fence) does guarantee that W propagates to E's CPU before E executes.
But the propagation property of wmb is that W propagates to every CPU before E propagates to that CPU.
It just so happens that the time E propagates to E's CPU is the time it executes.

Indeed, looking at the non-strong properties of fences only, should give rise to a relation that only says "W propagates to any CPU before E propagates to that CPU" and that is a relation between stores. And quite different from "W propagates to E's CPU before E executes".

I believe that relation is (cumul-fence;[W])+.
Then
  X ->(overwrite&ext);(cumul-fence;[W])* E
means that there is some W co-after X which propagates to any CPU no later than E due to the weak properties of fences along that path.
And
  X ->(overwrite&ext);(cumul-fence;[W])*;rfe? E
implies that there is some W co-after X which propagates to the CPU executing E no later than E executes. (because E observes or executes and hence propapagated to itself a store that must propagate to E's CPU no earlier than W).

I think this is closer to the idea of expressing the (non-strong) propagation properties of the fences.

I don't want to
rule out the possibility that E is a read merely because cumul-fence
might be followed by another, A-cumulative fence.
Perhaps you mean non-A-cumulative fence?
The A-cumulative fences (when their A-cumulativity is actually used) already rule out reads because they use
overwrite;cumul-fence*;rfe;(the a-cumulativity)


Consider: Could we remove all propagation-ordering fences from ppo
because they are subsumed by prop? (Or is that just wrong?)
Surely not, since prop doesn't usually provide ordering by itself.
Sorry, I meant the prop-related non-ppo parts of hb and pb.
I still don't follow :( Can you write some equations to show me what you
mean?
Consider:

Rx=1 Ry=1
Wrelease Y=1 Wx=1

Here we have Wx=1 ->hb* Ry=1 by (prop \ id) & int, using the fact that
Wy=1 is an A-cumulative release fence. But we also have

Wx=1 ->rfe Rx=1 ->ppo Wy=1 ->rfe Ry=1.

Thus there are two distinct ways of proving that Wx=1 ->hb* Ry=1. If we
removed the fence term from the definition of ppo (or weakened it to
just rmb | acq), we would eliminate the second, redundant proof. Is
this the sort of thing you think we should do?

The reason I wouldn't do something like that is that firstly, the fence does preserve the program order, and secondly there are proofs where you need to use that fact.

In fact, I wouldn't mind removing the happens-before, propagation, and
rcu axioms from LKMM entirely, replacing them with the single
executes-before axiom.
I was planning to propose the same thing, however, I would also propose to
redefine hb and rb by dropping the hb/pb parts at the end of these
relations.

 hb = ....
 pb = prop ; strong-fence ; [Marked]
 rb = prop ; rcu-fence ; [Marked]

 xb = hb|pb|rb
 acyclic xb
I'm not so sure that's a good idea. For instance, it would require the
definitions of rcu-link and rb to be changed from having (hb* ; pb*) to
having (hb | pb)*.
I think that's an improvement. It's obvious that (hb | pb)* is right and so
is (pb | hb)*.
For (hb* ; pb*), the first reaction is "why do all the hb edges need to be
before the pb edges?", until one realizes that pb actually allows hb* at the
end, so in a sense this is  hb* ; (pb ; hb*)*, and then one has to
understand that this means that the prop;strong-fence edges can appear any
number of times at arbitrary locations. It just seems like defining (pb |
hb)* with extra steps.
This can be mentioned explicitly as a comment or in explanation.txt.
Ok, but why not just use  (pb|hb)* and (pb|hb|rb)* and not worry about
having to explain anything?
And make the hb and rb definitions simpler at the same time?
Do you think (pb | hb)* is simpler than pb* (as in the statement of the
propagation axiom)?
pb+,  however aren't you thinking of getting rid of the propagation axiom?
I still think (pb' | hb)+ where pb' is the simpler definition of pb is simpler than pb*, where pb=pb';hb*.

Besides, remember what I said about understanding the formal memory
model in terms of the operational model. pb relates a write W to
another event E when the strong fence guarantees that W will propagate
to E's CPU before E executes.
I suppose to every CPU before E executes?

If the hb* term were omitted from the definition of pb, this wouldn't be true any more. Or at least, not as
true as it should be.

Why is that the right level of how true it should be?

doesn't W ->pb;rb E also guarantee that W will propagate to E's CPU before E executes?
Or even just W ->pb;pb E?
Why only consider W->pb;hb E?

I'd rather think of it in terms of "this is the basic block that implies that it W executes before E because it propagates to every CPU before E executes, and then you can of course extend it by adding any of pb,rb, and hb at the end to get a longer "executes before"".

Best wishes, jonas