Re: [RFC PATCH v5 092/104] KVM: TDX: Handle TDX PV HLT hypercall

From: Paolo Bonzini
Date: Thu Apr 07 2022 - 11:56:14 EST


On 4/7/22 17:02, Sean Christopherson wrote:
On Thu, Apr 07, 2022, Paolo Bonzini wrote:
On 3/4/22 20:49, isaku.yamahata@xxxxxxxxx wrote:
+ bool interrupt_disabled = tdvmcall_p1_read(vcpu);

Where is R12 documented for TDG.VP.VMCALL<Instruction.HLT>?

+ * Virtual interrupt can arrive after TDG.VM.VMCALL<HLT> during
+ * the TDX module executing. On the other hand, KVM doesn't
+ * know if vcpu was executing in the guest TD or the TDX module.

I don't understand this; why isn't it enough to check PI.ON or something
like that as part of HLT emulation?

Ooh, I think I remember what this is. This is for the case where the virtual
interrupt is recognized, i.e. set in vmcs.RVI, between the STI and "HLT". KVM
doesn't have access to RVI and the interrupt is no longer in the PID (because it
was "recognized"). It doesn't get delivered in the guest because the TDCALL
completes before interrupts are enabled.

I lobbied to get this fixed in the TDX module by immediately resuming the guest
in this case, but obviously that was unsuccessful.

So the TDX module sets RVI while in an STI interrupt shadow. So far so good. Then:

- it receives the HLT TDCALL from the guest. The interrupt shadow at this point is gone.

- it knows that there is an interrupt that can be delivered (RVI > PPR && EFLAGS.IF=1, the other conditions of 29.2.2 don't matter)

- it forwards the HLT TDCALL nevertheless, to a clueless hypervisor that has no way to glean either RVI or PPR?

It's absurd that this be treated as anything but a bug.


Until that is fixed, KVM needs to do something like:

- every time a bit is set in PID.PIR, set tdx->buggy_hlt_workaround = 1

- every time TDG.VP.VMCALL<HLT> is received, xchg(&tdx->buggy_hlt_workaround, 0) and return immediately to the guest if it is 1.

Basically an internal version of PID.ON.
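
A minimal userspace sketch of the two steps above, using C11 atomics in place of the kernel's xchg(). Only the flag name buggy_hlt_workaround comes from this mail; the struct and both function names are hypothetical and stand in for wherever KVM would actually hook PIR updates and the HLT TDVMCALL handler:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state; only the flag name is taken from the mail. */
struct vcpu_tdx_sketch {
	atomic_int buggy_hlt_workaround;
};

/* Step 1: call every time a bit is set in PID.PIR for this vCPU. */
static void sketch_note_pir_set(struct vcpu_tdx_sketch *tdx)
{
	atomic_store(&tdx->buggy_hlt_workaround, 1);
}

/*
 * Step 2: call when TDG.VP.VMCALL<HLT> is received.
 * Atomically reads and clears the flag; returns true if an interrupt was
 * posted since the last HLT, in which case the caller should return to the
 * guest immediately instead of blocking the vCPU.
 */
static bool sketch_hlt_should_resume(struct vcpu_tdx_sketch *tdx)
{
	return atomic_exchange(&tdx->buggy_hlt_workaround, 0) == 1;
}
```

The exchange makes test-and-clear a single atomic step, so a PIR update racing with the HLT is either seen by this HLT or left set for the next one. The cost is an occasional spurious wakeup, which HLT allows anyway.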

+ details.full = td_state_non_arch_read64(
+ to_tdx(vcpu), TD_VCPU_STATE_DETAILS_NON_ARCH);

TDX documentation says "the meaning of the field may change with Intel TDX
module version", where is this field documented? I cannot find any "other
guest state" fields in the TDX documentation.

IMO we should put a stake in the ground and refuse to accept code that consumes
"non-architectural" state. It's all software, having non-architectural APIs is
completely ridiculous.

Having them is fine, *using* them to work around undocumented bugs is the ridiculous part.

You didn't answer the other question, which is "Where is R12 documented for TDG.VP.VMCALL<Instruction.HLT>?" though... Should I be worried? :)


Paolo