Re: [PATCH v19 023/130] KVM: TDX: Initialize the TDX module when loading the KVM intel kernel module

From: Huang, Kai
Date: Mon Apr 22 2024 - 21:34:33 EST



>
> > > And the intent isn't to catch every possible problem. As with many sanity checks,
> > > the intent is to detect the most likely failure mode to make triaging and debugging
> > > issues a bit easier.
> >
> > The SEAMCALL will literally return a unique error code to indicate CPU
> > isn't in post-VMXON, or tdx_cpu_enable() hasn't been done. I think the
> > error code is already clear to pinpoint the problem (due to these pre-
> > SEAMCALL-condition not being met).
>
> No, SEAMCALL #UDs if the CPU isn't post-VMXON. I.e. the CPU doesn't make it to
> the TDX Module to provide a unique error code, all KVM will see is a #UD.

#UD is handled by the SEAMCALL assembly code. Please see the TDX_MODULE_CALL
assembly macro:

.Lseamcall_trap\@:
	/*
	 * SEAMCALL caused #GP or #UD. By reaching here RAX contains
	 * the trap number. Convert the trap number to the TDX error
	 * code by setting TDX_SW_ERROR to the high 32-bits of RAX.
	 *
	 * Note cannot OR TDX_SW_ERROR directly to RAX as OR instruction
	 * only accepts 32-bit immediate at most.
	 */
	movq $TDX_SW_ERROR, %rdi
	orq %rdi, %rax

	...

	_ASM_EXTABLE_FAULT(.Lseamcall\@, .Lseamcall_trap\@)
.endif /* \host */
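
So what KVM sees on a SEAMCALL #UD isn't a bare fault but a return value with
TDX_SW_ERROR set in the high bits, which the caller can check against
TDX_SEAMCALL_UD. Roughly like the sketch below (illustration only: __seamcall()
and the error codes are the existing ones from the TDX host code, but the
wrapper itself and its messages are made up for this example):

/* Sketch only: how a SEAMCALL caller can tell the CPU wasn't post-VMXON */
static int seamcall_check_vmxon(u64 fn, struct tdx_module_args *args)
{
	u64 err = __seamcall(fn, args);

	if (err == TDX_SEAMCALL_UD) {
		/* #UD converted by the fixup above: CPU is not post-VMXON */
		pr_err("SEAMCALL failed: CPU not in VMX operation\n");
		return -EINVAL;
	}

	return err ? -EIO : 0;
}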

>
> > > > Btw, I noticed there's another problem, that is currently tdx_cpu_enable()
> > > > actually requires IRQ being disabled. Again it was implemented based on
> > > > it would be invoked via both on_each_cpu() and kvm_online_cpu().
> > > >
> > > > It was also implemented with consideration that it could be called by
> > > > multiple in-kernel TDX users in parallel via both SMP call and in normal
> > > > context, so it was implemented to simply request the caller to make sure
> > > > it is called with IRQ disabled so it can be IRQ safe (it uses a percpu
> > > > variable to track whether TDH.SYS.LP.INIT has been done for local cpu
> > > > similar to the hardware_enabled percpu variable).
> > >
> > > Is this is an actual problem, or is it just something that would need to be
> > > updated in the TDX code to handle the change in direction?
> >
> > For now this isn't, because KVM is the solo user, and in KVM
> > hardware_enable_all() and kvm_online_cpu() uses kvm_lock mutex to make
> > hardware_enable_nolock() IPI safe.
> >
> > I am not sure how TDX/SEAMCALL will be used in TDX Connect.
> >
> > However I needed to consider KVM as a user, so I decided to just make it
> > must be called with IRQ disabled so I could know it is IRQ safe.
> >
> > Back to the current tdx_enable() and tdx_cpu_enable(), my personal
> > preference is, of course, to keep the existing way, that is:
> >
> > During module load:
> >
> > cpus_read_lock();
> > tdx_enable();
> > cpus_read_unlock();
> >
> > and in kvm_online_cpu():
> >
> > local_irq_save();
> > tdx_cpu_enable();
> > local_irq_restore();
> >
> > But given KVM is the solo user now, I am also fine to change if you
> > believe this is not acceptable.
>
> Looking more closely at the code, tdx_enable() needs to be called under
> cpu_hotplug_lock to prevent *unplug*, i.e. to prevent the last CPU on a package
> from being offlined. I.e. that part's not optional.

Yeah, we can say that. I almost forgot about this :-)

>
> And the root of the problem/confusion is that the APIs provided by the core kernel
> are weird, which is really just a polite way of saying they are awful :-)

Well, apologies for that :-)

>
> There is no reason to rely on the caller to take cpu_hotplug_lock, and definitely
> no reason to rely on the caller to invoke tdx_cpu_enable() separately from invoking
> tdx_enable(). I suspect they got that way because of KVM's unnecessarily complex
> code, e.g. if KVM is already doing on_each_cpu() to do VMXON, then it's easy enough
> to also do TDH_SYS_LP_INIT, so why do two IPIs?

The main reason is that we relaxed TDH.SYS.LP.INIT so that it can be called
_after_ TDX module initialization.

Previously, TDH.SYS.LP.INIT had to be done on *ALL* CPUs that the platform
has (i.e., cpu_present_mask) right after TDH.SYS.INIT and before any other
SEAMCALLs. This didn't work well with (kernel software) CPU hotplug, and it
had problems dealing with things like the SMT-disable mitigation:

https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#mf42fa2d68d6b98edcc2aae11dba3c2487caf3b8f

So the x86 maintainers asked for this to be changed. The original proposal
was to eliminate TDH.SYS.INIT and TDH.SYS.LP.INIT entirely:

https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#m78c0c48078f231e92ea1b87a69bac38564d46469

But that turned out not to be feasible, and the result was that the
requirement was relaxed to allow TDH.SYS.LP.INIT to be called after module
initialization.

So we need a separate tdx_cpu_enable() for that.
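
I.e., the per-CPU part has to be doable from the CPU hotplug path, roughly
like below (a simplified sketch of the kvm_online_cpu() usage mentioned
above; in the actual series the call sits deeper in KVM's hardware-enable
path):

/* Sketch only: per-CPU TDH.SYS.LP.INIT from the CPU hotplug path */
static int kvm_online_cpu(unsigned int cpu)
{
	unsigned long flags;
	int ret;

	local_irq_save(flags);
	/* tdx_cpu_enable() requires IRQs to be disabled */
	ret = tdx_cpu_enable();
	local_irq_restore(flags);

	return ret;
}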

>
> But just because KVM "owns" VMXON doesn't mean the core kernel code should punt
> TDX to KVM too. If KVM relies on the cpuhp code to ensure all online CPUs are
> post-VMXON, then the TDX shapes up nicely and provides a single API to enable
> TDX.  
>

We could make tdx_enable() invoke tdx_cpu_enable() internally, but as
mentioned above we still need to keep tdx_cpu_enable() as a separate API so
that the CPU hotplug path can call it.
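
Something like the below would work (a rough sketch only, ignoring the
locking and the "already initialized" tracking that the real code has;
init_tdx_module() is the existing global-initialization helper), with
tdx_cpu_enable() still exported for the hotplug path:

/* Sketch only: tdx_enable() doing the per-CPU init on all online CPUs */
static void tdx_cpu_enable_smp_call(void *failed)
{
	/* on_each_cpu() runs this with IRQs disabled, as tdx_cpu_enable() needs */
	if (tdx_cpu_enable())
		atomic_inc((atomic_t *)failed);
}

int tdx_enable(void)
{
	atomic_t failed = ATOMIC_INIT(0);
	int ret;

	cpus_read_lock();
	/* TDH.SYS.LP.INIT on all online CPUs, then the global part */
	on_each_cpu(tdx_cpu_enable_smp_call, &failed, true);
	if (atomic_read(&failed))
		ret = -EINVAL;
	else
		ret = init_tdx_module();
	cpus_read_unlock();

	return ret;
}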

> And then my CR4.VMXE=1 sanity check makes a _lot_ more sense.

As replied above, the current SEAMCALL assembly returns a unique error code
for that case:

#define TDX_SEAMCALL_UD		(TDX_SW_ERROR | X86_TRAP_UD)