Re: [PATCH v2 1/2] tools/memory-model: Unify UNLOCK+LOCK pairings to po-unlock-lock-po

From: Jonas Oberhauser
Date: Fri Jan 27 2023 - 10:58:29 EST

Next message: Zenghui Yu: "Re: [PATCH v3 4/4] KVM: arm64: Allow no running vcpu on saving vgic3 pending table"
Previous message: Doug Anderson: "Re: [PATCH] drm/mediatek: Remove dependency on GEM DMA helper"
In reply to: Paul E. McKenney: "Re: [PATCH v2 1/2] tools/memory-model: Unify UNLOCK+LOCK pairings to po-unlock-lock-po"
Next in thread: Paul E. McKenney: "Re: [PATCH v2 1/2] tools/memory-model: Unify UNLOCK+LOCK pairings to po-unlock-lock-po"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 1/27/2023 4:13 PM, Paul E. McKenney wrote:

On Fri, Jan 27, 2023 at 02:18:41PM +0100, Jonas Oberhauser wrote:

On 1/27/2023 12:21 AM, Paul E. McKenney wrote:

On Thu, Jan 26, 2023 at 12:08:28PM -0800, Paul E. McKenney wrote:

On Thu, Jan 26, 2023 at 11:36:51AM -0500, Alan Stern wrote:

On Thu, Jan 26, 2023 at 02:46:03PM +0100, Jonas Oberhauser wrote:

LKMM uses two relations for talking about UNLOCK+LOCK pairings:

1) po-unlock-lock-po, which handles UNLOCK+LOCK pairings
on the same CPU or immediate lock handovers on the same
lock variable

2) po;[UL];(co|po);[LKW];po, which handles UNLOCK+LOCK pairs
literally as described in rcupdate.h#L1002, i.e., even
after a sequence of handovers on the same lock variable.

The latter relation is used only once, to provide the guarantee
defined in rcupdate.h#L1002 by smp_mb__after_unlock_lock(), which
makes any UNLOCK+LOCK pair followed by the fence behave like a full
barrier.

This patch drops this use in favor of using po-unlock-lock-po
everywhere, which unifies the way the model talks about UNLOCK+LOCK
pairings. At first glance this seems to weaken the guarantee given
by LKMM: When considering a long sequence of lock handovers
such as below, where P0 hands the lock to P1, which hands it to P2,
which finally executes such an after_unlock_lock fence, the mb
relation currently links any stores in the critical section of P0
to instructions P2 executes after its fence, but not so after the
patch.

P0(int *x, int *y, spinlock_t *mylock)
{
spin_lock(mylock);
WRITE_ONCE(*x, 2);
spin_unlock(mylock);
WRITE_ONCE(*y, 1);
}

P1(int *y, int *z, spinlock_t *mylock)
{
int r0 = READ_ONCE(*y); // reads 1
spin_lock(mylock);
spin_unlock(mylock);
WRITE_ONCE(*z,1);
}

P2(int *z, int *d, spinlock_t *mylock)
{
int r1 = READ_ONCE(*z); // reads 1
spin_lock(mylock);
spin_unlock(mylock);
smp_mb__after_unlock_lock();
WRITE_ONCE(*d,1);
}

P3(int *x, int *d)
{
WRITE_ONCE(*d,2);
smp_mb();
WRITE_ONCE(*x,1);
}

exists (1:r0=1 /\ 2:r1=1 /\ x=2 /\ d=2)

Nevertheless, the ordering guarantee given in rcupdate.h is actually
not weakened. This is because the unlock operations along the
sequence of handovers are A-cumulative fences. They ensure that any
stores that propagate to the CPU performing the first unlock
operation in the sequence must also propagate to every CPU that
performs a subsequent lock operation in the sequence. Therefore any
such stores will also be ordered correctly by the fence even if only
the final handover is considered a full barrier.

Indeed this patch does not affect the behaviors allowed by LKMM at
all. The mb relation is used to define ordering through:
1) mb/.../ppo/hb, where the ordering is subsumed by hb+ where the
lock-release, rfe, and unlock-acquire orderings each provide hb
2) mb/strong-fence/cumul-fence/prop, where the rfe and A-cumulative
lock-release orderings simply add more fine-grained cumul-fence
edges to substitute a single strong-fence edge provided by a long
lock handover sequence
3) mb/strong-fence/pb and various similar uses in the definition of
data races, where as discussed above any long handover sequence
can be turned into a sequence of cumul-fence edges that provide
the same ordering.

Signed-off-by: Jonas Oberhauser <jonas.oberhauser@xxxxxxxxxxxxxxx>
---

Reviewed-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>

A quick spot check showed no change in performance, so thank you both!

Queued for review and further testing.

And testing on https://github.com/paulmckrcu/litmus for litmus tests up
to ten processes and allowing 10 minutes per litmus test got this:

Exact output matches: 5208
!!! Timed out: 38
!!! Unknown primitive: 7

This test compared output with and without your patch.

For the tests with a Results clause, these failed:

Gave me a heart attack there for a second!

Sorry for the scare!!!

Also, I am going to be pushing the scripts I use to mainline. They might
not be perfect, but they will be quite useful for this sort of change
to the memory model.

I could also provide Coq proofs, although those are ignoring the srcu/data
race parts at the moment.

Can such proofs serve as regression tests for future changes?

Thanx, Paul

So-so. On the upside, it would be easy to make them raise an alarm if the future change breaks stuff.
On the downside, they will often need maintenance together with any change. Sometimes a lot, sometimes very little.
I think for the proofs that show the equivalence between two models, the maintenance is quite a bit higher because every change needs to be reflected in both versions. So if you do 10 equivalent transformations and want to show that they remain equivalent with any future changes you do, you need to keep at least 10 additional models around ("current LKMM where ppo isn't in po, current LKMM where the unlock fence still relies on co, ...").

Right now, each equivalence proof I did (e.g., for using po-unlock-lock-po here, or the ppo<=po patch I originally proposed) is at average in the ballpark of 500 lines of proof script. And as evidenced by my discussion with Alan, these proofs only cover the "core model".

So for this kind of thing, I think it's better to look at them to have more confidence in the patch, and do the patch more based on which model is more reasonable (as Alan enforces). Then consider the simplified version as the more natural one, and not worry about future changes that break the equivalence (that would usually indicate a problem with the old model, rather than a problem with the patch).

For regressions, I would rather consider some desirable properties of LKMM, like "DRF-SC" or "monotonicity of barriers" or "ppo <= po" and try to prove those. This has the upside of not requiring to carry additional models around, so much less than half the maintenance effort, and if the property should be broken this usually would indicate a problem with the patch. So I think the bang for the buck is much higher there.

Those are my current thoughts anyways : )

have fun, jonas

Next message: Zenghui Yu: "Re: [PATCH v3 4/4] KVM: arm64: Allow no running vcpu on saving vgic3 pending table"
Previous message: Doug Anderson: "Re: [PATCH] drm/mediatek: Remove dependency on GEM DMA helper"
In reply to: Paul E. McKenney: "Re: [PATCH v2 1/2] tools/memory-model: Unify UNLOCK+LOCK pairings to po-unlock-lock-po"
Next in thread: Paul E. McKenney: "Re: [PATCH v2 1/2] tools/memory-model: Unify UNLOCK+LOCK pairings to po-unlock-lock-po"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]