Re: [PATCH v7 8/8] KVM: VMX: enable IPI virtualization

From: Sean Christopherson
Date: Mon Apr 04 2022 - 17:18:35 EST


On Sun, Apr 03, 2022, Zeng Guang wrote:
>
> On 4/1/2022 10:37 AM, Sean Christopherson wrote:
> > > @@ -4219,14 +4226,21 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> > >  	pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx));
> > >  	if (cpu_has_secondary_exec_ctrls()) {
> > > -		if (kvm_vcpu_apicv_active(vcpu))
> > > +		if (kvm_vcpu_apicv_active(vcpu)) {
> > >  			secondary_exec_controls_setbit(vmx,
> > >  				      SECONDARY_EXEC_APIC_REGISTER_VIRT |
> > >  				      SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> > > -		else
> > > +			if (enable_ipiv)
> > > +				tertiary_exec_controls_setbit(vmx,
> > > +						TERTIARY_EXEC_IPI_VIRT);
> > > +		} else {
> > >  			secondary_exec_controls_clearbit(vmx,
> > >  					SECONDARY_EXEC_APIC_REGISTER_VIRT |
> > >  					SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> > > +			if (enable_ipiv)
> > > +				tertiary_exec_controls_clearbit(vmx,
> > > +						TERTIARY_EXEC_IPI_VIRT);
> > Oof. The existing code is kludgy. We should never reach this point without
> > enable_apicv=true, and enable_apicv should be forced off if APICv isn't supported,
> > let alone secondary exec controls being supported.
> >
> > Unless I'm missing something, throw a prep patch earlier in the series to drop
> > the cpu_has_secondary_exec_ctrls() check, that will clean this code up a smidge.
>
> The cpu_has_secondary_exec_ctrls() check avoids a wrong VMCS write in case of
> a mistaken invocation.

KVM has far bigger problems on buggy invocation, and in that case the resulting
printk + WARN from the failed VMWRITE is a good thing.
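
E.g. with the cpu_has_secondary_exec_ctrls() check gone, this could collapse to
something like the below (completely untested, just a sketch of the idea):

static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx));

	if (kvm_vcpu_apicv_active(vcpu)) {
		secondary_exec_controls_setbit(vmx,
					       SECONDARY_EXEC_APIC_REGISTER_VIRT |
					       SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
		if (enable_ipiv)
			tertiary_exec_controls_setbit(vmx, TERTIARY_EXEC_IPI_VIRT);
	} else {
		secondary_exec_controls_clearbit(vmx,
						 SECONDARY_EXEC_APIC_REGISTER_VIRT |
						 SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
		if (enable_ipiv)
			tertiary_exec_controls_clearbit(vmx, TERTIARY_EXEC_IPI_VIRT);
	}
}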

> > > +
> > > +	if (!pages)
> > > +		return -ENOMEM;
> > > +
> > > +	kvm_vmx->pid_table = (void *)page_address(pages);
> > > +	kvm_vmx->pid_last_index = kvm_vmx->kvm.arch.max_vcpu_id - 1;
> > No need to cache pid_last_index, it's only used in one place (initializing the
> > VMCS field). The allocation/free paths can use max_vcpu_id directly. Actually,
>
> In the previous design, we didn't forbid changing max_vcpu_id after vCPU
> creation, or using it for other purposes in the future. Thus it's safer to
> decouple them and make the IPIv usage independent. If it's guaranteed that
> max_vcpu_id won't be modified, we can remove pid_last_index entirely and use
> max_vcpu_id directly, even for initializing the VMCS field.

max_vcpu_id absolutely needs to be constant after the first vCPU is created.
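
I.e. the VMCS programming could then consume max_vcpu_id directly, e.g.
(untested, using the field names from this series):

	if (vmx_can_use_ipiv(vmx->vcpu.kvm)) {
		struct kvm_vmx *kvm_vmx = to_kvm_vmx(vmx->vcpu.kvm);

		vmcs_write64(PID_POINTER_TABLE, __pa(kvm_vmx->pid_table));
		vmcs_write16(LAST_PID_POINTER_INDEX,
			     kvm_vmx->kvm.arch.max_vcpu_id - 1);
	}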

> > > @@ -7123,6 +7176,22 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
> > >  			goto free_vmcs;
> > >  	}
> > > +	/*
> > > +	 * Allocate PID-table and program this vCPU's PID-table
> > > +	 * entry if IPI virtualization can be enabled.
> > Please wrap comments at 80 chars. But I'd just drop this one entirely, the code
> > is self-explanatory once the allocation and setting of the vCPU's entry are split.
> >
> > > +	 */
> > > +	if (vmx_can_use_ipiv(vcpu->kvm)) {
> > > +		struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
> > > +
> > > +		mutex_lock(&vcpu->kvm->lock);
> > > +		err = vmx_alloc_pid_table(kvm_vmx);
> > > +		mutex_unlock(&vcpu->kvm->lock);
> > This belongs in vmx_vm_init(), doing it in vCPU creation is a remnant of the
> > dynamic resize approach that's no longer needed.
>
> We cannot allocate the PID table in vmx_vm_init() as userspace has no chance
> to set max_vcpu_id at that stage. That's the reason we do it at vCPU creation
> instead.

Ah, right. Hrm. And that's going to be a recurring problem if we try to use the
dynamic kvm->max_vcpu_ids to reduce other kernel allocations.

Argh, and even kvm_arch_vcpu_precreate() isn't protected by kvm->lock.

Taking kvm->lock isn't problematic per se, I just hate doing it so deep in a
per-vCPU flow like this.

A really gross hack/idea would be to make this 64-bit only and steal the upper
32 bits of @type in kvm_create_vm() for the max ID.
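
Something like this monstrosity (illustrative only):

static struct kvm *kvm_create_vm(unsigned long type)
{
	/* 64-bit only: the upper 32 bits of @type smuggle the max vCPU ID. */
	u32 max_vcpu_id = type >> 32;

	type &= GENMASK(31, 0);
	...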

I think my first choice would be to move kvm_arch_vcpu_precreate() under kvm->lock.
None of the architectures that have a non-nop implementation (s390, arm64 and x86)
do significant work, so holding kvm->lock shouldn't harm performance. s390 has to
acquire kvm->lock in its implementation, so we could drop that. And looking at
arm64, I believe its logic should also be done under kvm->lock.
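
Roughly (untested; s390 would then drop the kvm->lock usage in its own
implementation):

static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
{
	...

	mutex_lock(&kvm->lock);
	if (kvm->created_vcpus >= kvm->max_vcpus) {
		mutex_unlock(&kvm->lock);
		return -EINVAL;
	}

	r = kvm_arch_vcpu_precreate(kvm, id);
	if (r) {
		mutex_unlock(&kvm->lock);
		return r;
	}

	kvm->created_vcpus++;
	mutex_unlock(&kvm->lock);

	...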

It'll mean adding yet another kvm_x86_ops, but I like that more than burying the
code deep in vCPU creation.
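
E.g. (completely untested, hook name is just a strawman):

int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
{
	...

	/* kvm->lock is held, so max_vcpu_id can no longer change. */
	return static_call(kvm_x86_vcpu_precreate)(kvm);
}

with the VMX implementation doing the PID table allocation:

static int vmx_vcpu_precreate(struct kvm *kvm)
{
	return vmx_can_use_ipiv(kvm) ? vmx_alloc_pid_table(to_kvm_vmx(kvm)) : 0;
}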

Paolo, any thoughts on this?