Re: [PATCH v18 17/18] s390/Docs: new doc describing lock usage by the vfio_ap device driver

From: Halil Pasic
Date: Wed Apr 06 2022 - 08:30:03 EST


On Mon, 4 Apr 2022 17:34:48 -0400
Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:

> On 3/30/22 20:28, Halil Pasic wrote:
> > On Mon, 14 Feb 2022 19:50:39 -0500
> > Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
> >
> >> Introduces a new document describing the locks used by the vfio_ap device
> >> driver and how to use them so as to avoid lockdep reports and deadlock
> >> situations.
> >>
> >> Signed-off-by: Tony Krowiak <akrowiak@xxxxxxxxxxxxx>
> >> ---
> >> Documentation/s390/vfio-ap-locking.rst | 389 +++++++++++++++++++++++++
> >> 1 file changed, 389 insertions(+)
> >> create mode 100644 Documentation/s390/vfio-ap-locking.rst
> >>
> >> diff --git a/Documentation/s390/vfio-ap-locking.rst b/Documentation/s390/vfio-ap-locking.rst
> >> new file mode 100644
> >> index 000000000000..10abbb6d6089
> >> --- /dev/null
> >> +++ b/Documentation/s390/vfio-ap-locking.rst
> >> @@ -0,0 +1,389 @@
> >> +======================
> >> +VFIO AP Locks Overview
> >> +======================
> >> +This document describes the locks that are pertinent to the secure operation
> >> +of the vfio_ap device driver. Throughout this document, the following variables
> >> +will be used to denote instances of the structures herein described:
> >> +
> >> +struct ap_matrix_dev *matrix_dev;
> >> +struct ap_matrix_mdev *matrix_mdev;
> >> +struct kvm *kvm;
> >> +
> >> +The Matrix Devices Lock (drivers/s390/crypto/vfio_ap_private.h)
> >> +--------------------------------------------------------------
> >> +
> >> +struct ap_matrix_dev {
> >> + ...
> >> + struct list_head mdev_list;
> >> + struct mutex mdevs_lock;
> >> + ...
> >> +}
> >> +
> >> +The Matrix Devices Lock (matrix_dev->mdevs_lock) is implemented as a global
> >> +mutex contained within the single instance of struct ap_matrix_dev. This lock
> > s/single instance of/singleton object of type/
>
> I don't see the problem with instance, but I'll go ahead and make the
> change.

My problem is not with the word "instance", and "problem" is too strong a
word anyway. I think that "the single instance of" is a little vague,
because there is ambiguity about the singularity (and existence) of
the single instance. If you have multiple servers, you may have several
"the single instances" in that system (one per kernel, or none if no
vfio-ap module is loaded, for example). If you have a nested
virtualization setup, one can even argue that there are multiple instances
of "the single instance" within one Linux system.

I see an advantage in using the "singleton object", because most of us
have at least heard of the singleton design pattern, if not learned
about it in class, and thus the scope of singularity is much better
defined.


>
> >
> >> +controls access to all fields contained within each matrix_mdev instance under
> >> +the control of the vfio_ap device driver (matrix_dev->mdev_list).
> > Are there matrix_mdev instances not under the control of the vfio_ap
> > device driver?
>
> No, but it doesn't make the statement any less true.

No, it does not make it less true, just more confusing. By that logic you
could logically AND every one of your statements with all the tautologies
of this world.

> I'll take it out, however.
>
> >
> > (MARK 1)
> >
> >> This lock must
> >> +be held while reading from, writing to or using the data from a field contained
> >> +within a matrix_mdev instance representing one of the vfio_ap device driver's
> >> +mediated devices.
> > This makes it look like for example struct vfio_ap_queue objects are out
> > of scope.
>
> How so? The vfio_ap_queue objects are linked to the ap_matrix_mdev object

The *key* is the difference between the words "linked" and "a field
contained within a matrix_mdev".

> to which the APQN is assigned. Other than that, they are contained in
> the driver data of the queue device.

The vfio_ap_queue object is a separate object allocated in
vfio_ap_mdev_probe_queue() and is certainly not a field contained within
a matrix_mdev.

Please notice that if you were to extend to all the objects reachable
from matrix_mdev instances, you would be in trouble, because a pointer
to kvm is also reachable, and via that pointer an awful lot of things
that are certainly out of scope.
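
To make the distinction concrete, here is a minimal user-space sketch. All
of the struct and function names below are made up for illustration and
are not the real driver structures; the only point is that a pointer field
is contained in the object while the thing it points to is not:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real driver structures. */
struct kvm_sketch {
	int dummy;
};

struct queue_sketch {
	int apqn;
};

struct mdev_sketch {
	unsigned long apm;          /* a field contained within the object */
	struct kvm_sketch *kvm;     /* only the pointer is contained */
	struct queue_sketch *queue; /* the queue itself is allocated elsewhere */
};

/* Returns 1 if address p lies inside the storage of *m, 0 otherwise. */
static int contained_in(const struct mdev_sketch *m, const void *p)
{
	uintptr_t base = (uintptr_t)m;
	uintptr_t addr = (uintptr_t)p;

	return addr >= base && addr < base + sizeof(*m);
}
```

With this, contained_in(&m, &m.queue) holds, but contained_in(&m, m.queue)
does not. A statement about "fields contained within a matrix_mdev" covers
the former only, so anything that is merely reachable needs to be named
explicitly if the lock is supposed to protect it.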


>
> >
> >> +
> >> +The KVM Lock (include/linux/kvm_host.h)
> >> +---------------------------------------
> >> +
> >> +struct kvm {
> >> + ...
> >> + struct mutex lock;
> >> + ...
> >> +}
> >> +
> >> +The KVM Lock (kvm->lock) controls access to the state data for a KVM guest. This
> >> +lock must be held by the vfio_ap device driver while one or more AP adapters,
> >> +domains or control domains are being plugged into or unplugged from the guest.
> >> +
> >> +The vfio_ap device driver registers a function to be notified when the pointer
> >> +to the kvm instance has been set. The KVM pointer is passed to the handler by
> >> +the notifier and is stored in the matrix_mdev instance
> >> +(matrix_mdev->kvm = kvm) containing the state of the mediated device that has
> >> +been passed through to the KVM guest.
> >> +
> >> +The Guests Lock (drivers/s390/crypto/vfio_ap_private.h)
> >> +-----------------------------------------------------------
> >> +
> >> +struct ap_matrix_dev {
> >> + ...
> >> + struct list_head mdev_list;
> >> + struct mutex guests_lock;
> >> + ...
> >> +}
> >> +
> >> +The Guests Lock (matrix_dev->guests_lock) controls access to the
> >> +matrix_mdev instances (matrix_dev->mdev_list) that represent mediated devices
> >> +that hold the state for the mediated devices that have been passed through to a
> > Didn't you say that access to fields of matrix_mdev instances is controlled
> > by the matrix_dev->mdevs lock at (MARK 1)? How do the two statements
> > mesh?
>
> The matrix_dev->mdevs_lock controls access to all FIELDS contained within each matrix_mdev
> and the matrix_dev->guests_lock controls access to matrix_mdev instances; in other words, the
> matrix_dev->mdev_list and the matrix_mdev instances for the purposes further described
> below.
>

See above.

>
>
> >
> >> +KVM guest. This lock must be held:
> >> +
> >> +1. To control access to the KVM pointer (matrix_mdev->kvm) while the vfio_ap
> >> + device driver is using it to plug/unplug AP devices passed through to the KVM
> >> + guest.
> >> +
> >> +2. To add matrix_mdev instances to or remove them from matrix_dev->mdev_list.
> >> + This is necessary to ensure the proper locking order when the list is perused
> >> + to find an ap_matrix_mdev instance for the purpose of plugging/unplugging
> >> + AP devices passed through to a KVM guest.
> >> +
> >> + For example, when a queue device is removed from the vfio_ap device driver,
> >> + if the adapter is passed through to a KVM guest, it will have to be
> >> + unplugged. In order to figure out whether the adapter is passed through,
> >> + the matrix_mdev object to which the queue is assigned will have to be
> >> + found. The KVM pointer (matrix_mdev->kvm) can then be used to determine if
> >> + the mediated device is passed through (matrix_mdev->kvm != NULL) and if so,
> >> + to unplug the adapter.
> >> +
> >> +It is not necessary to take the Guests Lock to access the KVM pointer if the
> >> +pointer is not used to plug/unplug devices passed through to the KVM guest;
> >> +however, in this case, the Matrix Devices Lock (matrix_dev->mdevs_lock) must be
> >> +held in order to access the KVM pointer since it is set and cleared under the
> >> +protection of the Matrix Devices Lock. A case in point is the function that
> >> +handles interception of the PQAP(AQIC) instruction sub-function. This handler
> >> +needs to access the KVM pointer only for the purposes of setting or clearing IRQ
> >> +resources, so only the matrix_dev->mdevs_lock needs to be held.
> >> +
> > It is very unclear what this lock is actually protecting, and when does
> > it need to be taken.
>
> I don't know how I can make it clearer. In 1 above, it states that it
> protects access to the KVM pointer when it is being used to plug/unplug
> AP devices. In other words, if the matrix_mdev->kvm pointer is being
> accessed just to verify whether the mdev is attached to a guest or not,
> it is not necessary to take the matrix_dev->guests_lock. On the other
> hand, whenever the matrix_mdev->kvm pointer is being taken to dynamically
> update the guest's APCB (i.e., hot plug/unplug AP devices), the
> matrix_dev->guests_lock must be held. Maybe if I had said hot plug/unplug
> it would be clearer? I'm open to suggestions.
>
> >
> >> +The PQAP Hook Lock (arch/s390/include/asm/kvm_host.h)
> >> +-----------------------------------------------------
> >> +
> >> +typedef int (*crypto_hook)(struct kvm_vcpu *vcpu);
> >> +
> >> +struct kvm_s390_crypto {
> >> + ...
> >> + struct rw_semaphore pqap_hook_rwsem;
> >> + crypto_hook *pqap_hook;
> >> + ...
> >> +};
> >> +
> >> +The PQAP Hook Lock is a r/w semaphore that controls access to the function
> >> +pointer of the handler (*kvm->arch.crypto.pqap_hook) to invoke when the
> >> +PQAP(AQIC) instruction sub-function is intercepted by the host. The lock must be
> >> +held in write mode when pqap_hook value is set, and in read mode when the
> >> +pqap_hook function is called.
> >> +
> >> +Locking Order
> >> +-------------
> >> +
> >> +If the various locks are not taken in the proper order, it could potentially
> >> +result in a lockdep splat.
> > Just in a lockdep splat or in a deadlock?
>
> I've never actually encountered a deadlock condition while testing,
> only a lockdep splat indicating a deadlock could occur. I'll go ahead
> and add 'deadlock or lockdep splat'.

Sorry, it is my sensitivity regarding situations where people claim they
are just fixing a compiler warning, when in reality that compiler
warning is just drawing attention to a severe bug. In my eyes lockdep
is just a tool to detect locking problems. Just focusing on the tool
is missing the point.

>
> >
> >> The proper order for taking locks depends upon
> >> +the operation taking place,
> > That sounds very fishy! The whole point of having a locking order is
> > preventing deadlocks if everybody sticks to the locking order. If there
> > are exceptions, i.e. if we violate the locking order, we risk deadlocks.
>
> It may sound fishy, but it's a true statement. For example, there
> are cases where the mdev is not attached to a KVM guest.
> In that case, the matrix_mdev->kvm pointer will be NULL and
> the matrix_mdev->kvm->lock will not be taken. Of course, the
> matrix_dev->guests_lock will still have to be taken before the
> matrix_dev->mdevs_lock.

But the locks are still taken in the very same order. You must never
have one situation where you blocking-take A with B held, and another
one where you blocking-take B with A held. That is what "locking order"
is about in CS.
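
To illustrate, a minimal user-space sketch with pthread mutexes standing in
for the kernel mutexes. The names are made up for illustration, and placing
the kvm lock between the other two is my assumption here; what matters is
that leaving out a lock does not reorder the remaining ones:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the mutexes discussed above. */
struct kvm_sketch {
	pthread_mutex_t lock;
};

static pthread_mutex_t guests_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mdevs_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter; /* state protected by the locks in this sketch */

/*
 * One hierarchy for every path: guests_lock, then kvm->lock (if there
 * is a guest), then mdevs_lock. The middle lock is optional; the
 * relative order of the other two never changes.
 */
static void lock_all(struct kvm_sketch *kvm)
{
	pthread_mutex_lock(&guests_lock);
	if (kvm)
		pthread_mutex_lock(&kvm->lock);
	pthread_mutex_lock(&mdevs_lock);
}

/* Releases the locks in the reverse order of lock_all(). */
static void unlock_all(struct kvm_sketch *kvm)
{
	pthread_mutex_unlock(&mdevs_lock);
	if (kvm)
		pthread_mutex_unlock(&kvm->lock);
	pthread_mutex_unlock(&guests_lock);
}

/* Performs one update under the locks and returns the new counter value. */
static int update(struct kvm_sketch *kvm)
{
	int v;

	lock_all(kvm);
	v = ++counter;
	unlock_all(kvm);
	return v;
}
```

Both update(NULL) and update(&kvm) follow the same hierarchy, so the
"mdev is not attached to a guest" case is a subset of the order and not an
exception to it.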

> Maybe I should just make that point
> clearer here.
>
> After rereading the passage, I think that sentence should be
> removed. It can be seen in the examples when the KVM lock
> will not be taken.
>

No, having it in one place is great. If you don't state the lock
hierarchy in one place, people will need to extract it from the
jungle of scenarios.

[..]

Regards,
Halil