Re: Current LKMM patch disposition

From: Alan Stern
Date: Mon Feb 13 2023 - 11:48:18 EST


On Sun, Feb 12, 2023 at 07:54:15PM -0500, Joel Fernandes wrote:
> On Sat, Feb 11, 2023 at 9:59 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> [...]
> > > is kind of why I want to understand the CAT, because for me
> > > explanation.txt is at too high a level to get a proper
> > > understanding of the memory model. I tried re-reading explanation.txt
> > > many times, then I realized I am just rewriting my own condensed set
> > > of notes every few months.
> >
> > Would you like to post a few examples showing some of the most difficult
> > points you encountered? Maybe explanation.txt can be improved.
>
> Just to list 2 of the pain points:
>
> 1. I think it is hard to reason about this section
> "PROPAGATION ORDER RELATION: cumul-fence"
>
> All store-related fences should affect propagation order, even the
> smp_wmb() which is not A-cumulative should do so (po-earlier stores
> appearing before po-later). I think expanding this section with some
> examples would make sense to understand what makes "cumul-fence"
> different from any other store-related fence.

Adding some examples is a good idea. That section is pretty terse.

> 2. This part is confusing and has always confused me: "The
> happens-before relation (hb) links memory accesses that have to
> execute in a certain order"
>
> It is not memory accesses that execute, it is instructions that
> execute. Can we separate out "memory access" from "instruction
> execution" in this description?

The memory model doesn't really talk about instruction execution; it
thinks about everything in terms of events rather than instructions.
However, I agree that the document isn't very precise about the
distinction between instructions and events.

> I think ->hb tries to say that A ->hb B means, memory access A
> happened before memory access B exactly in its associated
> instruction's execution order (time order), but to be specific --
> should that be instruction issue order, or instruction retiring order?

The model isn't that specific. It doesn't recognize that instructions
take a nonzero amount of time to execute; it thinks of events as
happening instantaneously. (With exceptions perhaps for
synchronize_rcu() and synchronize_srcu().)

If you want to relate the memory model to actual hardware, it's probably
best to think in terms of instruction retiring. But even that isn't
exactly right.

For example, real CPUs can satisfy loads speculatively, possibly
multiple times, before retiring them -- you should think of a load as
executing at the _last_ time it is satisfied. This generally is after
the instruction has been issued and before it is retired. You can think
of a store as executing at the time the CPU commits to it.
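
A rough sketch of the "last time it is satisfied" rule, as a toy model in
C. Everything here (struct toy_load, toy_satisfy(), toy_speculate(), the
logical clock values) is hypothetical and made up for illustration; it is
not part of the LKMM, herd7, or the kernel, and it ignores everything real
hardware does beyond re-satisfying one load.

```c
#include <assert.h>

/* Toy bookkeeping for one in-flight load; all names are hypothetical. */
struct toy_load {
	int val;	/* value from the most recent satisfaction */
	int exec_time;	/* logical time of the most recent satisfaction */
	int nr_sat;	/* how many times the load has been satisfied */
};

/* (Re)satisfy the load from memory at logical time 'now'. */
static void toy_satisfy(struct toy_load *ld, const int *addr, int now)
{
	assert(now > ld->exec_time);	/* logical time only moves forward */
	ld->val = *addr;
	ld->exec_time = now;
	ld->nr_sat++;
}

/* One possible history: satisfied speculatively, then again before retiring. */
static struct toy_load toy_speculate(void)
{
	int x = 0;
	struct toy_load ld = { .exec_time = 0 };

	toy_satisfy(&ld, &x, 1);	/* t=1: speculative, sees x == 0 */
	x = 1;				/* t=2: another CPU's store arrives */
	toy_satisfy(&ld, &x, 3);	/* t=3: replayed, sees x == 1 */
	/* The load would retire at t=4; it "executed" at t=3 with value 1. */
	return ld;
}
```

In this history the load is issued before t=1 and retired after t=3, but
the only time that matters for ordering purposes is the final
satisfaction at t=3.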

> AFAICS ->hb maps instruction execution order to memory access order.

That's not the right way to think about it. In the model, a memory
access occurs when the corresponding event executes. So saying that two
events (or instructions) execute in a certain order is the same as
saying that their two memory accesses execute in that order. There's no
mapping involved.

> Not all of ->po falls into that category, because of out-of-order
> hardware execution. Neither does ->co, because the memory subsystem may
> resolve writes to the same variable out of order. It would
> be nice to call out that ->po is instruction issue order, which is
> different from execution/retiring order, and that's why it cannot be ->hb.

Okay, that would be a worthwhile addition.

> ->rf does because of data flow causality, ->ppo does because of
> program structure, so that makes sense to be ->hb.
>
> IMHO, ->rfi should as well, because it is embodying a flow of data, so
> that is a bit confusing. It would be great to clarify more perhaps
> with an example about why ->rfi cannot be ->hb, in the
> "happens-before" section.

Maybe. We do talk about store forwarding, and in fact the ppo section
already says:

------------------------------------------------------------------------
R ->dep W ->rfi R',

where the dep link can be either an address or a data dependency. In
this situation we know it is possible for the CPU to execute R' before
W, because it can forward the value that W will store to R'.
------------------------------------------------------------------------

I suppose this could be reiterated in the hb section.
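
The quoted R ->dep W ->rfi R' situation can be sketched as a toy model in
C, assuming a CPU with a one-entry store buffer and a logical clock. All
of the names (struct fwd_cpu, fwd_store(), fwd_load(), fwd_commit(),
fwd_run()) are hypothetical, invented for this illustration only; none of
this is LKMM or kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model with a one-entry store buffer (hypothetical). */
struct fwd_pending {
	int *addr;
	int val;
	bool valid;
};

struct fwd_cpu {
	struct fwd_pending sb;	/* store queued but not yet committed */
	int clock;		/* logical time, bumped on each execution */
};

/* Queue a store: its value is known, but memory is not yet updated. */
static void fwd_store(struct fwd_cpu *cpu, int *addr, int val)
{
	assert(!cpu->sb.valid);
	cpu->sb = (struct fwd_pending){ .addr = addr, .val = val, .valid = true };
}

/* Satisfy a load, forwarding from the store buffer on an address match. */
static int fwd_load(struct fwd_cpu *cpu, const int *addr, int *when)
{
	*when = ++cpu->clock;
	if (cpu->sb.valid && cpu->sb.addr == addr)
		return cpu->sb.val;
	return *addr;
}

/* Commit the queued store to memory; this is when the store "executes". */
static int fwd_commit(struct fwd_cpu *cpu)
{
	assert(cpu->sb.valid);
	*cpu->sb.addr = cpu->sb.val;
	cpu->sb.valid = false;
	return ++cpu->clock;
}

struct fwd_result {
	int r_val, r_prime_val;		/* values read by R and R' */
	int t_r, t_r_prime, t_w;	/* logical execution times */
};

static struct fwd_result fwd_run(void)
{
	int x = 1, y = 0;
	struct fwd_cpu cpu = { .clock = 0 };
	struct fwd_result res;

	res.r_val = fwd_load(&cpu, &x, &res.t_r);		/* R */
	fwd_store(&cpu, &y, res.r_val);				/* W: data dep on R */
	res.r_prime_val = fwd_load(&cpu, &y, &res.t_r_prime);	/* R': forwarded */
	res.t_w = fwd_commit(&cpu);				/* W commits last */
	return res;
}
```

In this run R' reads the value from W (so W ->rfi R') and yet R' executes
before W does, which is why rfi cannot be part of hb. R' still cannot
execute before R, because the forwarded value is not known until R has
been satisfied.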

Alan