Re: [PATCH v2 16/31] KVM: nVMX: hyper-v: Direct TLB flush

From: Sean Christopherson
Date: Thu Apr 07 2022 - 14:48:09 EST


On Thu, Apr 07, 2022, Vitaly Kuznetsov wrote:
> Enable Direct TLB flush feature on nVMX when:
> - Enlightened VMCS is in use.
> - Direct TLB flush flag is enabled in eVMCS.
> - Direct TLB flush is enabled in partition assist page.

Yeah, KVM definitely needs a different name for "Direct TLB flush". I don't have
any good ideas offhand, but honestly anything is better than "Direct".

> Perform synthetic vmexit to L1 after processing TLB flush call upon
> request (HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH).
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> ---

...

> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> index 8862692a4c5d..ab0949c22d2d 100644
> --- a/arch/x86/kvm/vmx/evmcs.h
> +++ b/arch/x86/kvm/vmx/evmcs.h
> @@ -65,6 +65,8 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
> #define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
> #define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)
>
> +#define HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH 0x10000031

LOL, I guess I have to appreciate the cleverness. Bit 28 is cleared for all
exits except when using an SMM-transfer monitor (STM), and then it's set only
if an MTF VM exit is pending.

  The remainder of the field (bits 31:28 and bits 26:16) is cleared to 0 (certain
  SMM VM exits may set some of these bits; see Section 31.15.2.3).

  If the SMM VM exit occurred in VMX non-root operation and an MTF VM exit was
  pending, bit 28 of the exit-reason field is set; otherwise, it is cleared.

So despite all appearances, Microsoft didn't actually steal a bit from Intel,
they're just abusing a bit that (a) will never be set so long as the VMM doesn't
use parallel SMM and (b) architecturally can't be set in conjunction with many
exit reasons (everything that's _not_ some form of SMI).

Can you add a comment to document this?

/*
 * Note, Hyper-V isn't actually stealing bit 28 from Intel, just abusing it by
 * pairing it with architecturally impossible exit reasons. Bit 28 is set only
 * on SMI exits to an SMM-transfer monitor (STM) and if and only if an MTF
 * VM-Exit is pending. I.e. it will never be set by hardware for non-SMI exits
 * (there are only three), nor will it ever be set unless the VMM is an STM.
 */
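
To make the encoding concrete, here's a minimal standalone sketch that splits
the synthetic exit reason into its basic exit reason (bits 15:0) and bit 28.
The mask and helper names are made up for illustration and are not existing
KVM definitions; only the 0x10000031 value comes from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the patch under review. */
#define HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH 0x10000031u

/* Hypothetical names, for illustration only (not KVM definitions). */
#define SKETCH_EXIT_REASON_BASIC_MASK 0xffffu      /* bits 15:0 */
#define SKETCH_EXIT_REASON_BIT28      (1u << 28)   /* "MTF pending", SMM exits only */

/* Extract the basic exit reason from the 32-bit exit-reason field. */
static inline uint16_t sketch_exit_reason_basic(uint32_t exit_reason)
{
	return (uint16_t)(exit_reason & SKETCH_EXIT_REASON_BASIC_MASK);
}

/* Report whether bit 28 is set in the exit-reason field. */
static inline bool sketch_exit_reason_bit28(uint32_t exit_reason)
{
	return (exit_reason & SKETCH_EXIT_REASON_BIT28) != 0;
}
```

For the value in the patch, sketch_exit_reason_basic() yields 0x31 and
sketch_exit_reason_bit28() yields true, i.e. bit 28 paired with a non-SMI basic
exit reason, which hardware would never generate.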

> struct evmcs_field {
> u16 offset;
> u16 clean_field;
> @@ -244,6 +246,7 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
> uint16_t *vmcs_version);
> void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
> int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
> +bool nested_evmcs_direct_flush_enabled(struct kvm_vcpu *vcpu);