Re: Current LKMM patch disposition

From: Alan Stern
Date: Mon Feb 13 2023 - 20:57:36 EST


On Mon, Feb 13, 2023 at 07:36:42PM -0500, Joel Fernandes wrote:
> Thanks, I agree with most of your last email, just replying to one thing:
>
> > > ->rf does because of data flow causality, ->ppo does because of
> > > program structure, so that makes sense to be ->hb.
> > >
> > > IMHO, ->rfi should as well, because it is embodying a flow of data, so
> > > that is a bit confusing. It would be great to clarify more perhaps
> > > with an example about why ->rfi cannot be ->hb, in the
> > > "happens-before" section.
> >
> > Maybe. We do talk about store forwarding, and in fact the ppo section
> > already says:
> >
> > ------------------------------------------------------------------------
> > R ->dep W ->rfi R',
> >
> > where the dep link can be either an address or a data dependency. In
> > this situation we know it is possible for the CPU to execute R' before
> > W, because it can forward the value that W will store to R'.
> > ------------------------------------------------------------------------
>
> Thank you for pointing this out! In the text that follows this, in
> this paragraph:
>
> <quote>
> where the dep link can be either an address or a data dependency. In
> this situation we know it is possible for the CPU to execute R' before
> W, because it can forward the value that W will store to R'. But it
> cannot execute R' before R, because it cannot forward the value before
> it knows what that value is, or that W and R' do access the same
> location.
> </quote>
>
> The "in this situation" should be clarified that the "situation" is a
> data-dependency. Only in the case of data-dependency, the ->rfi
> cannot cause misordering if I understand it correctly. However, that
> sentence does not mention data-dependency explicitly. Or let me know
> if I missed something?

The text explicitly says that the dep link can be either an address or a
data dependency. In either case, R' cannot be reordered before R.
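
For illustration only, here is a minimal single-threaded C sketch of the
two shapes of "R ->dep W ->rfi R'" discussed above. It is not taken from
the documentation. The READ_ONCE()/WRITE_ONCE() macros are simplified
stand-ins for the kernel's, and every variable and function name is
hypothetical. The code only shows the program structure; running it on
one thread cannot demonstrate any reordering.

```c
#include <assert.h>

/* Simplified, hypothetical stand-ins for the kernel's macros (int only). */
#define READ_ONCE(x)      (*(volatile int *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile int *)&(x) = (v))

static int x, y;   /* hypothetical shared variables, initially 0 */
static int a[2];   /* hypothetical shared array, initially 0 */

/* Data dependency: the value W stores is computed from what R read. */
int data_dep_rfi(void)
{
	int r1, r2;

	r1 = READ_ONCE(x);       /* R */
	WRITE_ONCE(y, r1 + 1);   /* W: stored value depends on r1 */
	r2 = READ_ONCE(y);       /* R': reads from W (rfi) */
	return r2;
}

/* Address dependency: the location W stores to is chosen by what R read. */
int addr_dep_rfi(void)
{
	int r1, r2;

	r1 = READ_ONCE(x);           /* R */
	WRITE_ONCE(a[r1 & 1], 42);   /* W: target address depends on r1 */
	r2 = READ_ONCE(a[0]);        /* R': reads from W only if the index is 0 */
	return r2;
}

/* Single-threaded sanity check; it says nothing about ordering. */
void dep_rfi_selfcheck(void)
{
	assert(data_dep_rfi() == 1);    /* x is 0, so W stores 1 */
	assert(addr_dep_rfi() == 42);   /* index is 0, so R' sees W's 42 */
}
```

In the first function the value to forward is not known until R
completes. In the second, whether W and R' access the same location is
not known until R completes.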

In theory this doesn't have to be true for address dependencies, because
the CPU might realize that W and R' access the same address without
knowing what that address is. However, I've been reliably informed that
no existing architectures do this sort of optimization.

The case of a control dependency is different, because the CPU can
speculate that W will be executed and can speculatively forward the
value from W to R' before it knows what value R will read.
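
A rough sketch of that control-dependency shape, under the same caveats:
the macros are simplified stand-ins, the names (cx, cy, ctrl_rfi) are
hypothetical, and a single-threaded run shows only the structure, not
the speculation itself.

```c
#include <assert.h>

/* Simplified, hypothetical stand-ins for the kernel's macros (int only). */
#define READ_ONCE(x)      (*(volatile int *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile int *)&(x) = (v))

static int cx, cy;   /* hypothetical shared variables, initially 0 */

/* Control dependency: whether W executes depends on what R read. */
int ctrl_rfi(void)
{
	int r1, r2 = -1;

	r1 = READ_ONCE(cx);           /* R */
	if (r1 == 0) {                /* branch on the value R read */
		WRITE_ONCE(cy, 7);    /* W: stored value is a constant */
		r2 = READ_ONCE(cy);   /* R': reads from W (rfi) */
	}
	return r2;
}

/* Single-threaded sanity check; it says nothing about ordering. */
void ctrl_rfi_selfcheck(void)
{
	assert(ctrl_rfi() == 7);   /* cx is 0, so the branch is taken */
}
```

Here neither the stored value nor the address of W depends on r1, so
nothing stops a CPU that predicts the branch from forwarding 7 to R'
before R has completed.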

Alan