Re: [PATCH kcsan 9/9] tools/memory-model: Document locking corner cases

From: Paul E. McKenney
Date: Mon Aug 31 2020 - 17:47:44 EST


On Mon, Aug 31, 2020 at 04:17:01PM -0400, Alan Stern wrote:
> On Mon, Aug 31, 2020 at 11:20:37AM -0700, paulmck@xxxxxxxxxx wrote:
> > +No Roach-Motel Locking!
> > +-----------------------
> > +
> > +This example requires familiarity with the herd7 "filter" clause, so
> > +please read up on that topic in litmus-tests.txt.
> > +
> > +It is tempting to allow memory-reference instructions to be pulled
> > +into a critical section, but this cannot be allowed in the general case.
> > +For example, consider a spin loop preceding a lock-based critical section.
> > +Now, herd7 does not model spin loops, but we can emulate one with two
> > +loads, with a "filter" clause to constrain the first to return the
> > +initial value and the second to return the updated value, as shown below:
> > +
> > + /* See Documentation/litmus-tests/locking/RM-fixed.litmus. */
> > + P0(int *x, int *y, int *lck)
> > + {
> > + int r2;
> > +
> > + spin_lock(lck);
> > + r2 = atomic_inc_return(y);
> > + WRITE_ONCE(*x, 1);
> > + spin_unlock(lck);
> > + }
> > +
> > + P1(int *x, int *y, int *lck)
> > + {
> > + int r0;
> > + int r1;
> > + int r2;
> > +
> > + r0 = READ_ONCE(*x);
> > + r1 = READ_ONCE(*x);
> > + spin_lock(lck);
> > + r2 = atomic_inc_return(y);
> > + spin_unlock(lck);
> > + }
> > +
> > + filter (y=2 /\ 1:r0=0 /\ 1:r1=1)
> > + exists (1:r2=1)
> > +
> > +The variable "x" is the control variable for the emulated spin loop.
> > +P0() sets it to "1" while holding the lock, and P1() emulates the
> > +spin loop by reading it twice, first into "1:r0" (which should get the
> > +initial value "0") and then into "1:r1" (which should get the updated
> > +value "1").
> > +
> > +The purpose of the variable "y" is to reject deadlocked executions.
> > +Only those executions where the final value of "y" have avoided deadlock.
> > +
> > +The "filter" clause takes all this into account, constraining "y" to
> > +equal "2", "1:r0" to equal "0", and "1:r1" to equal 1.
> > +
> > +Then the "exists" clause checks to see if P1() acquired its lock first,
> > +which should not happen given the filter clause because P0() updates
> > +"x" while holding the lock. And herd7 confirms this.
> > +
> > +But suppose that the compiler was permitted to reorder the spin loop
> > +into P1()'s critical section, like this:
> > +
> > + /* See Documentation/litmus-tests/locking/RM-broken.litmus. */
> > + P0(int *x, int *y, int *lck)
> > + {
> > + int r2;
> > +
> > + spin_lock(lck);
> > + r2 = atomic_inc_return(y);
> > + WRITE_ONCE(*x, 1);
> > + spin_unlock(lck);
> > + }
> > +
> > + P1(int *x, int *y, int *lck)
> > + {
> > + int r0;
> > + int r1;
> > + int r2;
> > +
> > + spin_lock(lck);
> > + r0 = READ_ONCE(*x);
> > + r1 = READ_ONCE(*x);
> > + r2 = atomic_inc_return(y);
> > + spin_unlock(lck);
> > + }
> > +
> > + locations [x;lck;0:r2;1:r0;1:r1;1:r2]
> > + filter (y=2 /\ 1:r0=0 /\ 1:r1=1)
> > + exists (1:r2=1)
> > +
> > +If "1:r0" is equal to "0", "1:r1" can never equal "1" because P0()
> > +cannot update "x" while P1() holds the lock. And herd7 confirms this,
> > +showing zero executions matching the "filter" criteria.
> > +
> > +And this is why Linux-kernel lock and unlock primitives must prevent
> > +code from entering critical sections. It is not sufficient to only
> > +prevnt code from leaving them.
>
> Is this discussion perhaps overkill?
>
> Let's put it this way: Suppose we have the following code:
>
> P0(int *x, int *lck)
> {
> spin_lock(lck);
> WRITE_ONCE(*x, 1);
> do_something();
> spin_unlock(lck);
> }
>
> P1(int *x, int *lck)
> {
> while (READ_ONCE(*x) == 0)
> ;
> spin_lock(lck);
> do_something_else();
> spin_unlock(lck);
> }
>
> It's obvious that this test won't deadlock. But if P1 is changed to:
>
> P1(int *x, int *lck)
> {
> spin_lock(lck);
> while (READ_ONCE(*x) == 0)
> ;
> do_something_else();
> spin_unlock(lck);
> }
>
> then it's equally obvious that the test can deadlock. No need for
> fancy memory models or litmus tests or anything else.

For people like you and me, who have been thinking about memory ordering
for longer than either of us care to admit, this level of exposition is
most definitely -way- overkill!!!

But I have had people be very happy and grateful that I explained this to
them at this level of detail. Yes, I started parallel programming before
some of them were born, but they are definitely within our target audience
for this particular document. And it is not just Linux kernel hackers
who need this level of detail. A roughly similar transactional-memory
scenario proved to be so non-obvious to any number of noted researchers
that Blundell, Lewis, and Martin needed to feature it in this paper:
https://ieeexplore.ieee.org/abstract/document/4069174
(Alternative source: https://repository.upenn.edu/cgi/viewcontent.cgi?article=1344&context=cis_papers)

Please note that I am -not- advocating making (say) explanation.txt or
recipes.txt more newbie-accessible than they already are. After all,
the point of the README file in that same directory is to direct people
to the documentation files that are the best fit for them, and both
explanation.txt and recipes.txt contain advanced material, and thus
require similarly advanced prerequisites.

Seem reasonable, or am I missing your point?

Thanx, Paul