Re: [PATCH v3] tools/memory-model: Make ppo a subrelation of po

From: Paul E. McKenney
Date: Tue Feb 28 2023 - 10:40:42 EST


On Tue, Feb 28, 2023 at 09:49:07AM +0100, Jonas Oberhauser wrote:
>
>
> On 2/27/2023 11:21 PM, Paul E. McKenney wrote:
> > On Mon, Feb 27, 2023 at 09:13:01PM +0100, Jonas Oberhauser wrote:
> > >
> > > On 2/27/2023 8:40 PM, Andrea Parri wrote:
> > > > > The LKMM doesn't believe that a control or data dependency orders a
> > > > > plain write after a marked read. Hence in this test it thinks that P1's
> > > > > store to u0 can happen before the load of x1. I don't remember why we
> > > > > did it this way -- probably we just wanted to minimize the restrictions
> > > > > on when plain accesses can execute. (I do remember the reason for
> > > > > making address dependencies induce order; it was so RCU would work.)
> > > > >
> > > > > The patch below will change what the LKMM believes. It eliminates the
> > > > > positive outcome of the litmus test and the data race. Should it be
> > > > > adopted into the memory model?
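For readers without the test in front of them: the litmus test under discussion
is not reproduced in this thread, but the scenario is roughly the following
sketch (a reconstruction using the u0/x1 names from the text above, not the
exact test):

C dep-plain-write-sketch

{}

P0(int *u0, int *x1)
{
	*u0 = 1;			/* plain write */
	smp_store_release(x1, 1);
}

P1(int *u0, int *x1)
{
	int r1;

	r1 = READ_ONCE(*x1);		/* marked read */
	if (r1)
		*u0 = 2;		/* plain write, control-dependent on r1 */
}

exists (1:r1=1 /\ u0=1)

If the control dependency ordered P1's plain store after the marked read, that
store would have to follow P0's store to u0 whenever r1=1, so both the listed
outcome and the data race on u0 would be ruled out; because the current LKMM
does not assume that ordering for plain writes, it allows P1's store to happen
before the load of x1 and flags the race.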
> > > > (Unpopular opinion, I know,) it should drop dependency ordering, not
> > > > add/promote it.
> > > >
> > > > Andrea
> > > Maybe not as unpopular as you think... :)
> > > But either way IMHO it should be consistent; either take all the
> > > dependencies that are true and add them, or drop them all.
> > > In the latter case, RCU should switch to an acquire barrier (and one
> > > would also have to deal with OOTA in some other way).
> > >
> > > Generally my position is that unless there's a real-world benchmark with
> > > proven performance benefits of relying on dependency ordering, one should
> > > use an acquire barrier. I haven't yet encountered such a case, but maybe one of you
> > > has...
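For concreteness, the two styles being compared look roughly like the
kernel-style sketch below; the struct and variable names are made up for
illustration:

struct foo {
	int data;
};

static struct foo *gp;			/* hypothetical shared pointer */

void publish(struct foo *p)
{
	p->data = 42;
	smp_store_release(&gp, p);	/* publish the initialized structure */
}

/* Reader relying on the address dependency from a marked read: */
int read_dep(void)
{
	struct foo *p = READ_ONCE(gp);

	return p ? READ_ONCE(p->data) : -1;	/* ordered by the address dependency */
}

/* Reader using an acquire load instead, as suggested above: */
int read_acq(void)
{
	struct foo *p = smp_load_acquire(&gp);

	return p ? READ_ONCE(p->data) : -1;	/* ordered by the acquire */
}

On arm64 the acquire load is a dedicated instruction, but on ARMv7 and PowerPC
smp_load_acquire() implies a fence, whereas the dependency-ordered reader is
just a plain load; that is where the penalty mentioned further down comes from.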
> > https://www.msully.net/thesis/thesis.pdf page 128 (PDF page 141).
> >
> > Though this is admittedly for ARMv7 and PowerPC.
> >
>
> Thanks for the link.
>
> It's true that on architectures that don't have an acquire load (and have to
> use a fence), the penalty will be bigger.
>
> But the more obvious discussion would be what constitutes a real-world
> benchmark : )
> In my experience you can get large performance benefits from optimizing
> barriers in code if all you execute is that code.
> But once you embed that code into a real-world application, often 90%-99% of
> the time is spent in the business logic, not in the data structure.
>
> And then the benefits suddenly disappear.
> Note that many barriers are also much cheaper when there is no
> contention.
>
> Because of that, making optimization decisions based on microbenchmarks can
> sometimes lead to a very poor "time invested" vs "total product improvement"
> ratio.

All true, though that 2x and 4x should be worth something.

The real-world examples I know of involved garbage collectors, and the
improvement was said to be a few percent system-wide. But that was a
verbal exchange, so I don't have a citation for you.

Thanx, Paul