Re: [PATCH v3 04/14] KVM: s390: device attribute to set AP interpretive execution

From: Halil Pasic
Date: Thu Mar 15 2018 - 12:25:27 EST

On 03/15/2018 04:23 PM, Tony Krowiak wrote:
> On 03/14/2018 05:57 PM, Halil Pasic wrote:
>>
>> On 03/14/2018 07:25 PM, Tony Krowiak wrote:
>>> The VFIO AP device model exploits interpretive execution of AP
>>> instructions (APIE) to provide guests passthrough access to AP
>>> devices. This patch introduces a new device attribute in the
>>> KVM_S390_VM_CRYPTO device attribute group to set APIE from
>>> the VFIO AP device defined on the guest.
>>>
>>> Signed-off-by: Tony Krowiak <akrowiak@xxxxxxxxxxxxxxxxxx>
>>> ---
>> [..]
>>
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index a60c45b..bc46b67 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -815,6 +815,19 @@ static int kvm_s390_vm_set_crypto(struct kvm *kvm, struct kvm_device_attr *attr)
>>>  			sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
>>>  		VM_EVENT(kvm, 3, "%s", "DISABLE: DEA keywrapping support");
>>>  		break;
>>> +	case KVM_S390_VM_CRYPTO_INTERPRET_AP:
>>> +		if (attr->addr) {
>>> +			if (!test_kvm_cpu_feat(kvm, KVM_S390_VM_CPU_FEAT_AP))
>> Unlock mutex before returning?
> The mutex is unlocked prior to return at the end of the function.

Pierre already pointed out what I mean.

>>
>> Maybe flip the conditions (don't allow manipulating apie if the feature is
>> not there). Clearing apie, which is clear anyway if the feature is not
>> there, isn't too bad, but rejecting the operation appears nicer to me.
> I think what you're saying is something like this:
>
> 	if (!test_kvm_cpu_feat(kvm, KVM_S390_VM_CPU_FEAT_AP))
> 		return -EOPNOTSUPP;
>
> 	kvm->arch.crypto.apie = (attr->addr) ? 1 : 0;
>
> I can make arguments for doing this either way, but since the attribute
> will most likely only be set by an AP device in userspace, I suppose
> it makes sense to allow setting the attribute if the AP feature is
> installed. It certainly makes sense for the dedicated implementation.

No strong opinion here.
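A minimal userspace sketch of the flipped-condition variant Tony describes, assuming a hypothetical `struct model_crypto` standing in for `struct kvm` and a plain boolean standing in for `test_kvm_cpu_feat()`. It only models the return value and the apie update; in the real kvm_s390_vm_set_crypto() the -EOPNOTSUPP path would also have to drop kvm->lock before returning (the point Pierre raised).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model, not KVM code. */
struct model_crypto {
	bool ap_feat_installed; /* stands in for the KVM_S390_VM_CPU_FEAT_AP check */
	int apie;               /* stands in for kvm->arch.crypto.apie */
};

/*
 * Flipped conditions: refuse to touch apie at all when the AP feature
 * is absent, otherwise set or clear it according to addr.
 * Returns 0 on success or -EOPNOTSUPP; apie is left unchanged on failure.
 * (The real handler would have to mutex_unlock(&kvm->lock) before
 * returning the error.)
 */
static int model_set_interpret_ap(struct model_crypto *c, unsigned long addr)
{
	if (!c->ap_feat_installed)
		return -EOPNOTSUPP;

	c->apie = addr ? 1 : 0;
	return 0;
}
```

With this ordering, both enabling and disabling are rejected on a machine without the AP feature, instead of silently accepting a disable.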

>>
>>> +				return -EOPNOTSUPP;
>>> +			kvm->arch.crypto.apie = 1;
>>> +			VM_EVENT(kvm, 3, "%s",
>>> +				 "ENABLE: AP interpretive execution");
>>> +		} else {
>>> +			kvm->arch.crypto.apie = 0;
>>> +			VM_EVENT(kvm, 3, "%s",
>>> +				 "DISABLE: AP interpretive execution");
>>> +		}
>>> +		break;
>>>  	default:
>>>  		mutex_unlock(&kvm->lock);
>>>  		return -ENXIO;
>> I wonder how the loop after this switch works for KVM_S390_VM_CRYPTO_INTERPRET_AP:
>>
>> 	kvm_for_each_vcpu(i, vcpu, kvm) {
>> 		kvm_s390_vcpu_crypto_setup(vcpu);
>> 		exit_sie(vcpu);
>> 	}
>>
>> Since nothing like the following is done for KVM_S390_VM_CRYPTO_INTERPRET_AP:
>>
>> 	if (kvm->created_vcpus) {
>> 		mutex_unlock(&kvm->lock);
>> 		return -EBUSY;
>> 	}
>>
>> and given the aforementioned loop, I guess ECA.28 can be changed
>> for a running guest.
>>
>> If there are running vcpus when KVM_S390_VM_CRYPTO_INTERPRET_AP is
>> changed (set), these will be taken out of SIE by exit_sie(). Then, for the
>> corresponding threads, control probably goes to QEMU (the emulator in
>> userspace), which puts that vcpu back into SIE, and that cpu then starts
>> acting according to the new ECA.28 value, while other vcpus may still
>> work with the old value of ECA.28.
> Assuming the scenario plays out as you described, why would the other vcpus
> be using the old ECA.28 value if the kvm_s390_vcpu_crypto_setup() function
> is executed for each of them to set the new value for ECA.28?

I'm puzzled; I thought I had just described that. The threads implementing
the vcpus are, or at least may be, concurrent with the thread doing the loop
and calling kvm_s390_vcpu_crypto_setup() for each vcpu.

Changing ECA.28 for each vcpu in the configuration isn't likely to be
simultaneous (we call kvm_s390_vcpu_crypto_setup() in a loop), but even
if it were simultaneous, what would guarantee that the changes are observed
as one atomic change (that is, that the guest never observes a mix)?
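To make the non-atomicity concrete, here is a single-threaded model of the per-vcpu loop, assuming a hypothetical `bool` array in which `eca28[i]` stands for vcpu i's ECA.28 bit. It ignores SIE entry and exit and only counts the states between loop iterations in which the vcpus disagree. When all N vcpus start with the old value there are N - 1 such states, and a vcpu thread running concurrently may re-enter SIE during any of them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: eca28[i] stands in for vcpu i's ECA.28 bit. */
static bool is_mixed(const bool *eca28, size_t nr_vcpus)
{
	for (size_t i = 1; i < nr_vcpus; i++)
		if (eca28[i] != eca28[0])
			return true;
	return false;
}

/*
 * Mimic the kvm_for_each_vcpu() loop: update one vcpu per iteration
 * and count how many post-iteration snapshots show a mix of old and
 * new values.
 */
static size_t count_mixed_snapshots(bool *eca28, size_t nr_vcpus, bool new_val)
{
	size_t mixed = 0;

	for (size_t i = 0; i < nr_vcpus; i++) {
		eca28[i] = new_val;
		if (is_mixed(eca28, nr_vcpus))
			mixed++;
	}
	return mixed;
}
```

For four vcpus going from clear to set, the snapshots are {1,0,0,0}, {1,1,0,0}, {1,1,1,0} and {1,1,1,1}, so three of them are mixed.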

(And please read the documentation.)

>>
>> I'm not saying what I describe above is necessarily broken.
>> But I would like to have it explained why it is OK, provided I did not
>> make any errors in my reasoning (assumptions included).
>>
>> Can you help me understand this code?
> Unless I am missing something in the scenario you described, it seems that
> the reason the exit_sie(vcpu) function is called is to ensure that the vcpus
> that are already running acquire the new attribute values changed by this
> function when they are restored to SIE. Of course, my assumption is that
> the kvm_arch_vcpu_setup() function, which calls the kvm_s390_vcpu_crypto_setup()
> function, is invoked when the vcpu is restored to SIE.

I don't know what you are talking about: kvm_s390_vcpu_crypto_setup(vcpu) is
invoked in the loop, and that changes the State Description.

How is it guaranteed that no vCPU is going to work according to the
new ECA.28 value before *all* vCPUs have been taken out of SIE by exit_sie()?
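For contrast, a sketch of one conceivable ordering that would provide the guarantee asked about above: take every vCPU out of SIE, update all of them while none is running, then let them back in. `struct model_vcpu` and the three phases are hypothetical; this is not what the patch does, and it makes no claim about which KVM helpers would implement the quiescing. The function returns the number of intermediate states in which two running vCPUs disagree on ECA.28.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not KVM code. */
struct model_vcpu {
	bool running; /* vcpu is executing in SIE */
	bool eca28;   /* vcpu's ECA.28 bit */
};

/* True if at least two running vcpus disagree on ECA.28. */
static bool running_mixed(const struct model_vcpu *v, size_t n)
{
	int seen = -1; /* -1: no running vcpu seen yet */

	for (size_t i = 0; i < n; i++) {
		if (!v[i].running)
			continue;
		if (seen < 0)
			seen = v[i].eca28;
		else if (seen != v[i].eca28)
			return true;
	}
	return false;
}

/* Quiesce all, update all, resume all; count mixed intermediate states. */
static size_t update_quiesced(struct model_vcpu *v, size_t n, bool new_val)
{
	size_t mixed = 0;

	for (size_t i = 0; i < n; i++) { /* phase 1: take every vcpu out */
		v[i].running = false;
		mixed += running_mixed(v, n);
	}
	for (size_t i = 0; i < n; i++) { /* phase 2: update while none runs */
		v[i].eca28 = new_val;
		mixed += running_mixed(v, n);
	}
	for (size_t i = 0; i < n; i++) { /* phase 3: let them back in */
		v[i].running = true;
		mixed += running_mixed(v, n);
	}
	return mixed;
}
```

With every vCPU initially running with ECA.28 clear, update_quiesced() returns 0: during phase 1 the remaining running vCPUs all still hold the old value, during phase 2 none is running, and during phase 3 all hold the new value.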

Sadly, your answers didn't contribute much to my understanding. I hope
mine will be more successful in contributing to yours.

Regards,
Halil