Re: [PATCH v3] tools/memory-model: Make ppo a subrelation of po

From: Jonas Oberhauser
Date: Tue Feb 28 2023 - 03:50:07 EST

On 2/27/2023 11:21 PM, Paul E. McKenney wrote:
> On Mon, Feb 27, 2023 at 09:13:01PM +0100, Jonas Oberhauser wrote:
>>
>> On 2/27/2023 8:40 PM, Andrea Parri wrote:
>>>> The LKMM doesn't believe that a control or data dependency orders a
>>>> plain write after a marked read. Hence in this test it thinks that P1's
>>>> store to u0 can happen before the load of x1. I don't remember why we
>>>> did it this way -- probably we just wanted to minimize the restrictions
>>>> on when plain accesses can execute. (I do remember the reason for
>>>> making address dependencies induce order; it was so RCU would work.)
>>>>
>>>> The patch below will change what the LKMM believes. It eliminates the
>>>> positive outcome of the litmus test and the data race. Should it be
>>>> adopted into the memory model?
>>> (Unpopular opinion I know,) it should drop dependencies ordering, not
>>> add/promote it.
>>>
>>>   Andrea
>> Maybe not as unpopular as you think... :)
>> But either way IMHO it should be consistent; either take all the
>> dependencies that are true and add them, or drop them all.
>> In the latter case, RCU should change to an acquire barrier. (also, one
>> would have to deal with OOTA in some yet different way).
>>
>> Generally my position is that unless there's a real-world benchmark with
>> proven performance benefits of relying on dependency ordering, one should
>> use an acquire barrier. I haven't yet met such a case, but maybe one of you
>> has...
> https://www.msully.net/thesis/thesis.pdf page 128 (PDF page 141).
>
> Though this is admittedly for ARMv7 and PowerPC.


Thanks for the link.

It's true that on architectures that don't have a native acquire load (and have to use a fence instead), the penalty of moving from dependency ordering to an acquire barrier will be bigger.
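
To make that concrete, here is a minimal reader-side sketch of the two variants being compared: relying on the address dependency vs. using an acquire load. This is hypothetical code (struct foo, head, reader_dep() and reader_acq() are made-up names), not anything from the patch or this thread:

#include <linux/compiler.h>	/* READ_ONCE() */
#include <linux/atomic.h>	/* smp_load_acquire() */

struct foo {
	int data;
};

/* Assumed to be published elsewhere via smp_store_release(&head, p). */
static struct foo *head;

/*
 * A: rely on the address dependency from the pointer load to the
 * dereference (essentially the rcu_dereference() pattern); on most
 * architectures no fence is needed on the reader side.
 */
static int reader_dep(void)
{
	struct foo *p = READ_ONCE(head);

	return p ? READ_ONCE(p->data) : -1;
}

/*
 * B: use an acquire load instead.  Cheap where a native acquire load
 * exists (e.g. arm64 LDAR); on ARMv7 or PowerPC it becomes a plain
 * load followed by a fence (dmb / lwsync), which is where the extra
 * penalty mentioned above comes from.
 */
static int reader_acq(void)
{
	struct foo *p = smp_load_acquire(&head);

	return p ? READ_ONCE(p->data) : -1;
}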

But the more obvious discussion would be what constitutes a real-world benchmark : )
In my experience you can get large performance benefits out of optimizing barriers if the code you optimize is all you execute.
But once you embed that code into a real-world application, often 90%-99% of the time is spent in the business logic, not in the data structure.

And then the benefits suddenly disappear.
Note also that many barriers are much cheaper when there's no contention.

Because of that, making optimization decisions based on microbenchmarks can sometimes lead to a very poor "time invested" vs "total product improvement" ratio.
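
For reference, the pattern discussed at the top of the thread (a plain write that is only control/data-dependent on a marked read) can be sketched as a litmus test roughly like the one below. This is a hypothetical reconstruction analogous to, not identical with, the test from the thread:

C LB+rel+depplain

(*
 * Hypothetical sketch: P1's write to *u0 is plain and only
 * data-dependent on the marked read of *x1.  As described above,
 * the current LKMM does not treat that dependency as ordering the
 * plain write after the marked read, so the write may appear to
 * execute early; the patch under discussion would add that ordering.
 *)

{}

P0(int *x1, int *u0)
{
	int r0;

	r0 = READ_ONCE(*u0);
	smp_store_release(x1, 1);
}

P1(int *x1, int *u0)
{
	int r1;

	r1 = READ_ONCE(*x1);
	*u0 = r1;
}

exists (0:r0=1 /\ 1:r1=1)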

Best wishes,
jonas