Re: [PATCH] tools/memory-model: Make ppo a subrelation of po

From: Jonas Oberhauser
Date: Mon Jan 23 2023 - 14:34:29 EST




On 1/23/2023 6:28 PM, Alan Stern wrote:
On Mon, Jan 23, 2023 at 02:59:37PM +0100, Jonas Oberhauser wrote:

On 1/21/2023 9:56 PM, Alan Stern wrote:
There is yet another level of fences in the hierarchy: those which order
instruction execution but not propagation (smp_rmb() and acquire). One
of the important points about cumul-fence is that it excludes this
level.

That's for a functional reason -- prop simply doesn't work for those
fences, so it has to exclude them. But it does work for strong fences,
so excluding them would be an artificial restriction.
Hm, so could we say some fences order
1) propagation with propagation (weak fences)
2) execution with execution (rmb, acquire)
3) propagation with execution (strong fences)

where ordering with execution implicitly orders with propagation as well
because things can only propagate after they execute.
However, the 4th possibility (execution with only propagation) happens not
to exist. I'm not sure if it would even be distinguishable from the second
type.
Only in that such a memory barrier would order po-earlier anything
against po-later stores, whereas rmb orders loads against loads and
acquire orders loads against anything.

In the operational model, can you forward from stores that have not
executed yet?
Yes, it is explicitly allowed. But forwarding doesn't apply in this
situation because stores can be forwarded only to po-later loads, not to
po-earlier ones.

The reason I was asking is because if forwarding was forbidden from non-executed stores, execute-to-prop frences could potentially have observably different behavior from comparable execute-to-execute cases. It's moot because it's not forbidden, but if you want to see the reasoning, consider a case like this:

  load from y ; execute-to-prop-fence ; store to x ; ... ; load from x
  load from y ; execute-to-execute-fence ; store to x ; ... ; load from x

(where both fences only order load->store).
In the first case, x could execute before the load from y and the load from x could already execute.
In the second case, x couldn't execute before the load from y and so (assuming you couldn't forward from non-executed stores) x couldn't execute.
As a result, the second type of fence would have ordered the loads but the first one wouldn't.

Not quite right. A hypothetical non-A-cumulative case for pb would have
to omit the cumul-fence term entirely.
Wouldn't that violate the transitivity of "X is required to propagate before
Y" ?
If I have
   X ->cumul-fence+ Y ->weird-strong-fence Z
doesn't that mean that for every CPU C,
1. X is required to propagate to C before Y propagates to C
2. Y is required to propagate to C before any instruction po-after Z
executes
Not if Y is a load.

I guess one would have to put

(cumul-fence+ ; [W])?

or something like it in the definition.

I suppose it's true that Y being a load would be an exception, but that would only be if the cumul-fence+ sequence either ends in a strong-fence, or in po-unlock-lock-po.
We can ignore the first case (and the ordering would be provided anyways through pb at that point).
For the po-unlock-lock-po, you can just take Y:=the LKW event of the unlock and repeat the argument.

So I don't think the [W] is necessary. (and if it was maybe it would also be necessary in the definition of prop/cumul-fence itself, to account for all the non-A-cumulative fences in there).



Thinking about prop and pb along these lines gives me a weird feeling.
Trying to pinpoint it down, it seems a little bit weird that A-cumul doesn't
appear around the strong-fence in pb.
I think the reason it got left out was because all strong fences are
A-cumulative. If some of them weren't, it would have to appear there in
some form.

Of course it should not appear after
prop which already has an rfe? at the end. Nevertheless, having the rfe? at
the end is clearly important to representing the idea behind prop. If it
weren't for the fact that A-cumul is needed to define prop, it almost makes
me think that it would be nice to express the difference between
A-cumulative and non-A-cumulative fences (that order propagation) by saying
that an A-cumulative fence has
  prop ; a-cumul-fence;rfe? <= prop
while the non-A-cumulative fence has
  prop-without-rfe ; non-a-cumul-fence <= prop-without-rfe
Isn't this just a more complicated way of saying what the A-cumul()
macro expresses?

In the sense that I'm just stating some consequences of A-cumul works inside the model, yes.
But at a syntactic level, no.  The A-cumul puts the rfe? to the front. Here I put the rfe? behind the A-cumulative fence.
And I distinguish between a prop that may have rfe? at the end, and one that doesn't, while the use of A-cumul only applies the "prop-without-rfe" in the sense of
prop-without-rfe ; (A-cumul(...) | ...) <= prop-without-rfe

I think part of my weird feeling comes from this asymmetry between A-cumul() putting the rfe? to the left and prop putting the rfe? to the right. Or more precisely, that the latter is sometimes in anticipation of an A-cumulative fence (where the A-cumul would normally take it to the left of that fence) and sometimes just to express the idea of propagation, and that these are the same, which should somehow lead to a simpler definition but doesn't.

I'm not against this partially overlapping kind of redundancy, but I dislike
subsuming kind of redundancy where some branches of the logic just never
need to be used.
Consider: Could we remove all propagation-ordering fences from ppo
because they are subsumed by prop? (Or is that just wrong?)

Surely not, since prop doesn't usually provide ordering by itself.


In fact, I wouldn't mind removing the happens-before, propagation, and
rcu axioms from LKMM entirely, replacing them with the single
executes-before axiom.
I was planning to propose the same thing, however, I would also propose to
redefine hb and rb by dropping the hb/pb parts at the end of these
relations.

 hb = ....
 pb = prop ; strong-fence ; [Marked]
 rb = prop ; rcu-fence ; [Marked]

 xb = hb|pb|rb
 acyclic xb
I'm not so sure that's a good idea. For instance, it would require the
definitions of rcu-link and rb to be changed from having (hb* ; pb*) to
having (hb | pb)*.
I think that's an improvement. It's obvious that (hb | pb)* is right and so is (pb | hb)*.
For (hb* ; pb*), the first reaction is "why do all the hb edges need to be before the pb edges?", until one realizes that pb actually allows hb* at the end, so in a sense this is  hb* ; (pb ; hb*)*, and then one has to understand that this means that the prop;strong-fence edges can appear any number of times at arbitrary locations. It just seems like defining (pb | hb)* with extra steps.

The order of nesting seems to be also somewhat a matter of preference, perhaps in some weird alternative universe the LKMM says pb = (prop\id)&int | prop;strong-fence  and hb = (rfe | ppo);pb*. (Personally I think the current way is more reasonable than this one, but that might be because our preferences happen to align in this instance.)

Also, although it's not mentioned anywhere, the
definition of xbstar could be changed to hb* ; pb* ; rb* because each of
these relations absorbs a weaker one to its right.

I wouldn't want to need to do this reasoning just to understand that it has arbitrarily many hb, pb, and rb edges.

I'm wondering a little if there's some way in the middle, e.g., by writting
short comments in the model wherever something is redundant. Something like
(* note: strong-fence is included here for completeness, and can be safely
ignored *).
I have no objection to doing that. It seems like a good idea.

Alan
Perhaps we can start a new thread then to discuss a few points where
redundancies might be annotated this way or eliminated.
Sure, go ahead.

I'll put it on my to-do-list, let's converge on some topics first :D

best wishes, jonas