Re: [PATCH] Documentation: kvm: fix SRCU locking order docs

From: Paul E. McKenney
Date: Thu Jan 12 2023 - 10:29:57 EST


On Thu, Jan 12, 2023 at 08:24:16AM +0000, David Woodhouse wrote:
> On Wed, 2023-01-11 at 13:30 -0500, Paolo Bonzini wrote:
> >
> > +- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
> > +  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
> > +  be taken inside a kvm->srcu read-side critical section; that is, the
> > +  following is broken::
> > +
> > +      srcu_read_lock(&kvm->srcu);
> > +      mutex_lock(&kvm->slots_lock);
> > +
>
> "Don't tell me. Tell lockdep!"
>
> Did we conclude in
> https://lore.kernel.org/kvm/122f38e724aae9ae8ab474233da1ba19760c20d2.camel@xxxxxxxxxxxxx/
> that lockdep *could* be clever enough to catch a violation of this rule
> by itself?
>
> The general case of the rule would be that 'if mutex A is taken in a
> read-section for SRCU B, then any synchronize_srcu(B) while mutex A is
> held shall be verboten'. And vice versa.
>
> If we can make lockdep catch it automatically, yay!

Unfortunately, lockdep needs to see a writer to complain, and that patch
just adds a reader. And adding that writer would make lockdep complain
about things that are perfectly fine. It should be possible to make
lockdep catch this sort of thing, but from what I can see, doing so
requires modifications to lockdep itself.

> If not, I'm inclined to suggest that we have explicit wrappers of our
> own for kvm_mutex_lock() which will do the check directly.

This does allow much more wiggle room. For example, you guys could decide
to let lockdep complain about things that other SRCU users want to do.
For completeness, here is one such scenario:

CPU 0: read_lock(&rla); srcu_read_lock(&srcua); ...

CPU 1: srcu_read_lock(&srcua); read_lock(&rla); ...

CPU 2: synchronize_srcu(&srcua);

CPU 3: write_lock(&rla); ...

If you guys are OK with lockdep complaining about this, then doing a
currently mythical rcu_write_acquire()/rcu_write_release() pair around
your calls to synchronize_srcu() should catch the other issue.

And probably break something else, but you have to start somewhere! ;-)
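
For concreteness, here is a rough userspace model of the wrapper idea
quoted above. It is only a sketch under loud assumptions: every name in
it (model_srcu, model_mutex, model_kvm_mutex_lock() and friends) is
hypothetical, it is single-threaded, and it tracks read-side nesting with
a plain counter. A real kernel version would presumably build the check
on something like srcu_read_lock_held(), which is only meaningful when
lockdep is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: every name below is hypothetical, not a kernel API. */
struct model_srcu {
	int read_depth;		/* nesting depth of read-side critical sections */
};

struct model_mutex {
	bool held;
	struct model_srcu *srcu;	/* domain synchronized while this mutex is held */
};

static int model_srcu_read_lock(struct model_srcu *sp)
{
	/* Return the previous depth as a stand-in for the SRCU index. */
	return sp->read_depth++;
}

static void model_srcu_read_unlock(struct model_srcu *sp, int idx)
{
	(void)idx;
	assert(sp->read_depth > 0);
	sp->read_depth--;
}

/* Returns true on success, false if called inside a reader of m->srcu. */
static bool model_kvm_mutex_lock(struct model_mutex *m)
{
	if (m->srcu->read_depth > 0)
		return false;	/* the ordering the documentation patch calls broken */
	m->held = true;
	return true;
}

static void model_kvm_mutex_unlock(struct model_mutex *m)
{
	m->held = false;
}

/* Replays srcu_read_lock() followed by mutex_lock() from the quoted patch. */
static bool model_broken_order_detected(void)
{
	struct model_srcu srcu = { 0 };
	struct model_mutex slots_lock = { false, &srcu };
	int idx = model_srcu_read_lock(&srcu);
	bool refused = !model_kvm_mutex_lock(&slots_lock);

	if (!refused)
		model_kvm_mutex_unlock(&slots_lock);
	model_srcu_read_unlock(&srcu, idx);
	return refused;
}
```

In this model the wrapper refuses the mutex whenever the caller is inside
a reader of the associated domain, so model_broken_order_detected()
reports the violation without needing to see a writer. The cost is that
each mutex has to be told by hand which SRCU domain it must not nest
inside.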

Thanx, Paul