Re: [PATCH v11 023/113] KVM: TDX: allocate/free TDX vcpu structure

From: Isaku Yamahata
Date: Tue Feb 28 2023 - 06:06:40 EST


On Thu, Jan 19, 2023 at 12:45:09AM +0000,
"Huang, Kai" <kai.huang@xxxxxxxxx> wrote:

> On Thu, 2023-01-12 at 08:31 -0800, isaku.yamahata@xxxxxxxxx wrote:
> > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> >
> > The next step of TDX guest creation is to create vcpu. Allocate TDX vcpu
> > structures, partially initialize it.  
> >
>
> Why partially initialize it? Shouldn't a better way be either: 1) not
> initialize at all, or; 2) fully initialize? 
>
> Can you put more _why_ here?
>
>
> > Allocate pages of TDX vcpu for the
> > TDX module. Actual donation TDX vcpu pages to the TDX module is not done
> > yet.
>
> Also, can you explain _why_ it is not done here?
>
> >
> > In the case of the conventional case, cpuid is empty at the initialization.
> > and cpuid is configured after the vcpu initialization. Because TDX
> > supports only X2APIC mode, cpuid is forcibly initialized to support X2APIC
> > on the vcpu initialization.
>
> Don't quite understand here. As you said CPUID entries are configured later in
> KVM_SET_CPUID2, so what's the point of initializing CPUID to support x2api> Are you suggesting KVM_SET_CPUID2 will be somehow rejected for TDX guest, or
> there will be special handling to make sure the CPUID initialized here won't be
> overwritten later?
>
> Please explain clearly here.

Here is the updated one.

The next step of TDX guest creation is to create vcpu. Allocate TDX vcpu
structures, initialize it that doesn't require TDX SEAMCALL. TDX specific
vcpuid initialization will be implemented as independent KVM_TDX_INIT_VCPU
so that when error occurs it's easy to determine which component has the
issue, KVM or TDX.

In the case of the conventional case, cpuid is empty at the initialization.
and cpuid is configured after the vcpu initialization. Because TDX
supports only X2APIC mode, cpuid is forcibly set to support X2APIC and APIC
BASE MSR is forcibly set to X2APIC mode. The MSR will be read only for
guest TD. Because kvm_arch_vcpu_create() also initializes kvm MMU that
depends on local apic settings. So x2apic needs to be initialized to
X2APIC mode by vcpu_reset method before KVM mmu initialization.



> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 557a609c5147..099f0737a5aa 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -281,6 +281,81 @@ int tdx_vm_init(struct kvm *kvm)
> > return 0;
> > }
> >
> > +int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> > +{
> > + struct kvm_cpuid_entry2 *e;
> > +
> > + /*
> > + * On cpu creation, cpuid entry is blank. Forcibly enable
> > + * X2APIC feature to allow X2APIC.
> > + * Because vcpu_reset() can't return error, allocation is done here.
> > + */
> > + WARN_ON_ONCE(vcpu->arch.cpuid_entries);
> > + WARN_ON_ONCE(vcpu->arch.cpuid_nent);
> > + e = kvmalloc_array(1, sizeof(*e), GFP_KERNEL_ACCOUNT);
>
> You don't need to use kvmalloc_array() when only allocating one entry.

I'll make it kvmalloc() and add comment on kvmalloc(), not kmalloc().


> > + if (!e)
> > + return -ENOMEM;
> > + *e = (struct kvm_cpuid_entry2) {
> > + .function = 1, /* Features for X2APIC */
> > + .index = 0,
> > + .eax = 0,
> > + .ebx = 0,
> > + .ecx = 1ULL << 21, /* X2APIC */
> > + .edx = 0,
> > + };
> > + vcpu->arch.cpuid_entries = e;
> > + vcpu->arch.cpuid_nent = 1;
>
> As mentioned above, why doing it here? Won't be this be overwritten later in
> KVM_SET_CPUID2?

Yes, user space VMM can overwrite cpuid[0x1] and APIC base MSR. But it doesn't
matter because it's a bug of user space VMM. user space VMM has to keep the
consistency of cpuid and MSRs.
Because TDX module virtualizes cpuid[0x1].x2apic to fixed 1, KVM value doesn't
matter after vcpu creation.
Because KVM virtualizes APIC base as read only to guest, cpuid[0x1].x2apic
doesn't matter after vcpu creation as long as user space VMM keeps KVM APIC BASE
value.

I'll add a comment.


> > +
> > + /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> > + if (!vcpu->arch.apic)
> > + return -EINVAL;
>
> If this is hit, what happens to the CPUID entry allocated above? It's
> absolutely not clear here in this patch.

It's memory leak. I'll move the check before memory allocation.


> > +
> > + fpstate_set_confidential(&vcpu->arch.guest_fpu);
> > +
> > + vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;
> > +
> > + vcpu->arch.cr0_guest_owned_bits = -1ul;
> > + vcpu->arch.cr4_guest_owned_bits = -1ul;
> > +
> > + vcpu->arch.tsc_offset = to_kvm_tdx(vcpu->kvm)->tsc_offset;
> > + vcpu->arch.l1_tsc_offset = vcpu->arch.tsc_offset;
> > + vcpu->arch.guest_state_protected =
> > + !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG);
> > +
> > + return 0;
> > +}
> > +
> > +void tdx_vcpu_free(struct kvm_vcpu *vcpu)
> > +{
> > + /* This is stub for now. More logic will come. */
> > +}
> > +
> > +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > +{
> > + struct msr_data apic_base_msr;
> > +
> > + /* TDX doesn't support INIT event. */
> > + if (WARN_ON_ONCE(init_event))
> > + goto td_bugged;
>
> Should we use KVM_BUG_ON()?
>
> Again, it appears this depends on how KVM handles INIT, which is done in a later
> patch far way:
>
> [PATCH v11 102/113] KVM: TDX: Silently ignore INIT/SIPI
>
> And there's no material explaining how it is handled in either changelog or
> comment, so to me it's not reviewable.

I'll convert them to KVM_BUG_ON(). With this patch, I'll remove WARN_ON_ONCE()
and add KVM_BUG_ON() with the later patch.


> > +
> > + /* TDX rquires X2APIC. */
> > + apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC;
> > + if (kvm_vcpu_is_reset_bsp(vcpu))
> > + apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
> > + apic_base_msr.host_initiated = true;
> > + if (WARN_ON_ONCE(kvm_set_apic_base(vcpu, &apic_base_msr)))
> > + goto td_bugged;
>
> I think we have KVM_BUG_ON()?
>
> TDX requires a lot more staff then just x2apic, why only x2apic is done here,
> particularly in _this_ patch?

After this callback, kvm mmu initialization follows. It depends on apic
setting. X2APIC is only devication from the common logic kvm_lapic_reset().
I'll add a comment.
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>