Re: [PATCH v11 024/113] KVM: TDX: Do TDX specific vcpu initialization

From: Isaku Yamahata
Date: Tue Feb 28 2023 - 06:28:44 EST


On Thu, Jan 19, 2023 at 10:37:43AM +0000,
"Huang, Kai" <kai.huang@xxxxxxxxx> wrote:

> On Thu, 2023-01-12 at 08:31 -0800, isaku.yamahata@xxxxxxxxx wrote:
> > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> >
> > TD guest vcpu need to be configured before ready to run which requests
> > addtional information from Device model (e.g. qemu), one 64bit value is
> > passed to vcpu's RCX as an initial value.  
> >
>
> The first half sentence doesn't parse to me. It also has grammar issue.
>
> Also, the second half only talks about TDH.VP.INIT, but there's more regarding
> to creating/initializing a TDX guest vcpu. IMHO It would be better if you can
> briefly describe the whole sequence here so people can get some idea about your
> code below.
>
> Btw, I don't understand what's the point of pointing out "64bit value passed to
> vcpu's RCX ...". You can add this to the comment instead. If it is important,
> then please add more to explain it so people can understand more.
>

RCX and 64bit value doesn't make much sense in the commit message. I dropped
those sentence.

Here is the updated commit message.

KVM: TDX: Do TDX specific vcpu initialization

TD guest vcpu needs TDX specific initialization before running. Repurpose
KVM_MEMORY_ENCRYPT_OP to vcpu-scope, add a new sub-command
KVM_TDX_INIT_VCPU, and implement the callback for it.

> > Repurpose KVM_MEMORY_ENCRYPT_OP
> > to vcpu-scope and add new sub-commands KVM_TDX_INIT_VCPU under it for such
> > additional vcpu configuration.
>
> I am not sure using the same command for both per-VM and per-vcpu ioctls is a
> good idea. Is there any existing example does this?

There are some. Please break Documentation/virt/kvm/api.rst with "type:".
You can see some ioctl supports multiple type.

Type: system ioctl, vm ioctl
Type: vcpu ioctl / vm ioctl
Type: device ioctl, vm ioctl, vcpu ioctl


> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 099f0737a5aa..e2f5a07ad4e5 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -49,6 +49,11 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
> > return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
> > }
> >
> > +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx)
> > +{
> > + return tdx->tdvpr_pa;
> > +}
> > +
> > static inline bool is_td_created(struct kvm_tdx *kvm_tdx)
> > {
> > return kvm_tdx->tdr_pa;
> > @@ -65,6 +70,11 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx)
> > return kvm_tdx->hkid > 0;
> > }
> >
> > +static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
> > +{
> > + return kvm_tdx->finalized;
> > +}
> > +
> > static void tdx_clear_page(unsigned long page_pa)
> > {
> > const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
> > @@ -327,7 +337,21 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> >
> > void tdx_vcpu_free(struct kvm_vcpu *vcpu)
> > {
> > - /* This is stub for now. More logic will come. */
> > + struct vcpu_tdx *tdx = to_tdx(vcpu);
> > + int i;
> > +
> > + /* Can't reclaim or free pages if teardown failed. */
> > + if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm)))
> > + return;
>
> You may want to WARN() if it's a kernel bug you want to catch.

No, it's not a bug to come here with hkid freed because vcpus_free method
can be called after vm destruction.


> > +
> > + if (tdx->tdvpx_pa) {
> > + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++)
> > + tdx_reclaim_td_page(tdx->tdvpx_pa[i]);
> > + kfree(tdx->tdvpx_pa);
> > + tdx->tdvpx_pa = NULL;
> > + }
> > + tdx_reclaim_td_page(tdx->tdvpr_pa);
> > + tdx->tdvpr_pa = 0;
> > }
> >
> > void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > @@ -337,6 +361,8 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > /* TDX doesn't support INIT event. */
> > if (WARN_ON_ONCE(init_event))
> > goto td_bugged;
> > + if (WARN_ON_ONCE(is_td_vcpu_created(to_tdx(vcpu))))
> > + goto td_bugged;
>
> Again, not sure can we use KVM_BUG_ON()?

I converted it into KVM_BUG_ON()


> > /* TDX rquires X2APIC. */
> > apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC;
> > @@ -791,6 +817,125 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> > return r;
> > }
> >
> > +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
> > +{
> > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> > + struct vcpu_tdx *tdx = to_tdx(vcpu);
> > + unsigned long *tdvpx_pa = NULL;
> > + unsigned long tdvpr_pa;
> > + unsigned long va;
> > + int ret, i;
> > + u64 err;
> > +
> > + if (is_td_vcpu_created(tdx))
> > + return -EINVAL;
>
> Ditto. WARN()?

No. KVM_TDX_INIT_VCPU can be called multiple times. It's not kernel bug, but
misuse of the ioctl.


> > +
> > + va = __get_free_page(GFP_KERNEL_ACCOUNT);
> > + if (!va)
> > + return -ENOMEM;
> > + tdvpr_pa = __pa(va);
> > +
> > + tdvpx_pa = kcalloc(tdx_caps.tdvpx_nr_pages, sizeof(*tdx->tdvpx_pa),
> > + GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>
> kcalloc() uses __GFP_ZERO internally.
>
> > + if (!tdvpx_pa) {
> > + ret = -ENOMEM;
> > + goto free_tdvpr;
> > + }
> > + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) {
> > + va = __get_free_page(GFP_KERNEL_ACCOUNT);
> > + if (!va)
> > + goto free_tdvpx;
> > + tdvpx_pa[i] = __pa(va);
> > + }
> > +
> > + err = tdh_vp_create(kvm_tdx->tdr_pa, tdvpr_pa);
> > + if (WARN_ON_ONCE(err)) {
> > + ret = -EIO;
> > + pr_tdx_error(TDH_VP_CREATE, err, NULL);
> > + goto td_bugged_free_tdvpx;
> > + }
> > + tdx->tdvpr_pa = tdvpr_pa;
> > +
> > + tdx->tdvpx_pa = tdvpx_pa;
> > + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) {
> > + err = tdh_vp_addcx(tdx->tdvpr_pa, tdvpx_pa[i]);
> > + if (WARN_ON_ONCE(err)) {
> > + ret = -EIO;
> > + pr_tdx_error(TDH_VP_ADDCX, err, NULL);
> > + for (; i < tdx_caps.tdvpx_nr_pages; i++) {
> > + free_page((unsigned long)__va(tdvpx_pa[i]));
> > + tdvpx_pa[i] = 0;
> > + }
> > + goto td_bugged;
> > + }
> > + }
> > +
> > + err = tdh_vp_init(tdx->tdvpr_pa, vcpu_rcx);
> > + if (WARN_ON_ONCE(err)) {
> > + ret = -EIO;
> > + pr_tdx_error(TDH_VP_INIT, err, NULL);
> > + goto td_bugged;
> > + }
> > +
> > + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> > +
> > + return 0;
> > +
> > +td_bugged_free_tdvpx:
> > + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) {
> > + free_page((unsigned long)__va(tdvpx_pa[i]));
> > + tdvpx_pa[i] = 0;
> > + }
> > + kfree(tdvpx_pa);
> > +td_bugged:
> > + vcpu->kvm->vm_bugged = true;
> > + return ret;
> > +
> > +free_tdvpx:
> > + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++)
> > + if (tdvpx_pa[i])
> > + free_page((unsigned long)__va(tdvpx_pa[i]));
> > + kfree(tdvpx_pa);
>
> This piece of code appears 3 times in this function (and there are 3 'return
> ret;'). I am sure it can be done in one place instead. Can you reorganize?

I improved the logic to delete the duplication.
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>