Re: Current LKMM patch disposition

From: Joel Fernandes
Date: Sun Feb 12 2023 - 19:54:34 EST


On Sat, Feb 11, 2023 at 9:59 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
[...]
> > is kind of why I want to understand the CAT, because for me
> > explanation.txt is too much at a higher level to get a proper
> > understanding of the memory model.. I tried re-reading explanation.txt
> > many times.. then I realized I am just rewriting my own condensed set
> > of notes every few months.
>
> Would you like to post a few examples showing some of the most difficult
> points you encountered? Maybe explanation.txt can be improved.

Just to list 2 of the pain points:

1. I think it is hard to reason about this section:
"PROPAGATION ORDER RELATION: cumul-fence"

All store-related fences should affect propagation order. Even
smp_wmb(), which is not A-cumulative, should do so (po-earlier stores
propagating before po-later ones). I think expanding this section with
some examples would help readers understand what makes "cumul-fence"
different from any other store-related fence.
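
As a rough illustration of the kind of example I mean, here is a toy
sketch. It is a model I am making up for this mail, not anything taken
from explanation.txt or the CAT file: the function name, the NSLOTS
constant, and the "time slot" notion are all hypothetical. It brute-forces
the message-passing pattern and only checks the plain propagation-ordering
effect of smp_wmb(); it does not try to model A-cumulativity.

```c
#include <stdbool.h>

/*
 * Toy model (an assumption for illustration, not the LKMM definition).
 *
 *   P0: WRITE_ONCE(x, 1); [smp_wmb();] WRITE_ONCE(y, 1);
 *   P1: r0 = READ_ONCE(y); smp_rmb(); r1 = READ_ONCE(x);
 *
 * tx, ty: time slots at which the stores to x and y propagate to P1.
 * ry, rx: time slots at which P1's loads of y and x execute.
 * smp_rmb() is modeled as forcing ry < rx.
 * smp_wmb() is modeled as forcing tx < ty.
 * A load sees a store iff the store has propagated to P1 by the time
 * the load executes.
 */
#define NSLOTS 4 /* hypothetical: number of discrete time slots */

/* Returns true if the outcome r0 == 1 && r1 == 0 is reachable. */
bool mp_weak_outcome_reachable(bool with_wmb)
{
	for (int tx = 0; tx < NSLOTS; tx++) {
		for (int ty = 0; ty < NSLOTS; ty++) {
			/* With the fence, x must reach P1 before y does. */
			if (with_wmb && !(tx < ty))
				continue;
			for (int ry = 0; ry < NSLOTS; ry++) {
				for (int rx = ry + 1; rx < NSLOTS; rx++) {
					int r0 = (ty <= ry); /* saw y == 1? */
					int r1 = (tx <= rx); /* saw x == 1? */

					if (r0 == 1 && r1 == 0)
						return true;
				}
			}
		}
	}
	return false;
}
```

In this toy model, mp_weak_outcome_reachable(false) finds the weak
outcome and mp_weak_outcome_reachable(true) does not, since tx < ty
together with ty <= ry < rx forces tx < rx. An example in the document
that then goes one step further, to a third CPU, is what I think would
show what the "cumul" part adds.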

2. This part is confusing and has always confused me: "The
happens-before relation (hb) links memory accesses that have to
execute in a certain order"

It is not memory accesses that execute, it is instructions that
execute. Can we separate out "memory access" from "instruction
execution" in this description?

I think ->hb tries to say that A ->hb B means that memory access A
happened before memory access B, exactly in the execution order (time
order) of their associated instructions. But to be specific: should
that be instruction issue order or instruction retiring order?

AFAICS ->hb maps instruction execution order to memory access order.
Not all of ->po falls into that category, because of out-of-order
hardware execution. Neither does ->co, because the memory subsystem may
resolve writes to the same variable out of order. It would be nice to
call out that ->po is instruction issue order, which is different from
execution/retiring order, and that this is why it cannot be ->hb.

->rf does because of data flow causality, and ->ppo does because of
program structure, so it makes sense for those to be in ->hb.

IMHO, ->rfi should as well, because it embodies a flow of data, so
that is a bit confusing. It would be great to clarify this further in
the "happens-before" section, perhaps with an example of why ->rfi
cannot be ->hb.

That's really how far I typically get (line 1368) before life takes
over, and I have to go do other survival-related things. Then I
restart the activity. Now that I have started reading the CAT file as
well, I feel I can make it past that line :D. But I never wanted to get
past it until I had built a solid understanding of the contents before it.

As I read the file more, I can give more feedback, but the above are
the 2 that persist.

Thanks!

- Joel