Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition

From: Vitaly Kuznetsov
Date: Mon Aug 22 2022 - 12:22:02 EST


Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>>
>> > On Thu, Aug 18, 2022, Vitaly Kuznetsov wrote:
>> >> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>> >>
>> >> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> >> >> + * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
>> >> >> + * published TLFS version. When the bit is set, nested hypervisor can use
>> >> >> + * 'updated' eVMCSv1 specification (perf_global_ctrl, s_cet, ssp, lbr_ctl,
>> >> >> + * encls_exiting_bitmap, tsc_multiplier fields which were missing in 2016
>> >> >> + * specification).
>> >> >> + */
>> >> >> +#define HV_X64_NESTED_EVMCS1_2022_UPDATE BIT(0)
>> >> >
>> >> > This bit is now defined[*], but the docs say it's only for perf_global_ctrl. Are
>> >> > we expecting an update to the TLFS?
>> >> >
>> >> > Indicates support for the GuestPerfGlobalCtrl and HostPerfGlobalCtrl fields
>> >> > in the enlightened VMCS.
>> >> >
>> >> > [*] https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery#hypervisor-nested-virtualization-features---0x4000000a
>> >> >
>> >>
>> >> Oh well, better this than nothing. I'll ping the people who told me
>> >> about this bit that their description is incomplete.
>> >
>> >> > Not that it changes anything, but I'd rather have no documentation. I'd much rather
>> >> > KVM say "this is the undocumented behavior" than "the documented behavior is wrong".
>> >
>>
>> So I reached out to Microsoft and their answer was that for all these new
>> eVMCS fields (including *PerfGlobalCtrl), observing the architectural VMX
>> MSRs should be enough. The *PerfGlobalCtrl case is special because of a Win11
>> bug: if we expose the feature in the VMX feature MSRs but don't set
>> CPUID.0x4000000A.EBX BIT(0), it just doesn't boot.
>
> I.e. TSC_SCALING shouldn't be gated on the flag? If so, then the 2-D array approach
> is overkill since (a) the CPUID flag only controls PERF_GLOBAL_CTRL and (b) we aren't
> expecting any more flags in the future.
>

Unfortunately, we have to gate the presence of these new features on
something; otherwise the VMM has no way to specify which particular eVMCS
"revision" it wants (TL;DR: we will break migration).

My initial implementation invented an 'eVMCS revision' concept:
https://lore.kernel.org/kvm/20220629150625.238286-7-vkuznets@xxxxxxxxxx/

which is needed if we don't gate all these new fields on CPUID.0x4000000A.EBX BIT(0).

Going forward, we will likely still need some such gating mechanism whenever new fields show up.
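To make the migration argument concrete: once all of the 2022 fields are gated on CPUID.0x4000000A.EBX BIT(0), the set of eVMCS fields a guest may use becomes a function of a value the VMM chooses and carries across migration. Below is a simplified userspace model under that assumption only; the demo_* names and per-field flags are hypothetical, and the model ignores host hardware support (e.g. a host without TSC scaling), which real KVM would also have to take into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.0x4000000A.EBX BIT(0), named as in the quoted patch. */
#define HV_X64_NESTED_EVMCS1_2022_UPDATE (1u << 0)

/* Hypothetical flags for the fields missing from the 2016 eVMCSv1 spec. */
#define DEMO_F_PERF_GLOBAL_CTRL (1u << 0)
#define DEMO_F_S_CET            (1u << 1)
#define DEMO_F_SSP              (1u << 2)
#define DEMO_F_LBR_CTL          (1u << 3)
#define DEMO_F_ENCLS_EXITING    (1u << 4)
#define DEMO_F_TSC_MULTIPLIER   (1u << 5)

#define DEMO_ALL_2022_FIELDS                                   \
	(DEMO_F_PERF_GLOBAL_CTRL | DEMO_F_S_CET | DEMO_F_SSP | \
	 DEMO_F_LBR_CTL | DEMO_F_ENCLS_EXITING | DEMO_F_TSC_MULTIPLIER)

static_assert(DEMO_ALL_2022_FIELDS == 0x3f, "six 2022 fields expected");

/*
 * In this model all 2022 fields hang off the one CPUID bit the VMM sets
 * (or not), so the result does not depend on which KVM version runs.
 */
uint32_t demo_evmcs_2022_fields(uint32_t nested_ebx)
{
	if (nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
		return DEMO_ALL_2022_FIELDS;
	return 0; /* plain 2016 eVMCSv1 layout */
}

/* Source and destination agree iff they were given the same gating bit. */
bool demo_evmcs_layouts_match(uint32_t src_ebx, uint32_t dst_ebx)
{
	return demo_evmcs_2022_fields(src_ebx) == demo_evmcs_2022_fields(dst_ebx);
}
```

Without such a gate, a newer KVM would simply start exposing the new fields, and the VMM would have no way to request the older layout on a host it wants to stay compatible with.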

> What about this for an implementation?
>
> static bool evmcs_has_perf_global_ctrl(struct kvm_vcpu *vcpu)
> {
> 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>
> 	/*
> 	 * Filtering VMX controls for eVMCS compatibility should only be done
> 	 * for guest accesses, and all such accesses should be gated on Hyper-V
> 	 * being enabled and initialized.
> 	 */
> 	if (WARN_ON_ONCE(!hv_vcpu))
> 		return false;
>
> 	return hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_PERF_GLOBAL_CTRL;
> }
>
> static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu, u32 msr_index)
> {
> 	u32 unsupported_ctrls;
>
> 	switch (msr_index) {
> 	case MSR_IA32_VMX_EXIT_CTLS:
> 	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
> 		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
> 		if (!evmcs_has_perf_global_ctrl(vcpu))
> 			unsupported_ctrls |= VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
> 		return unsupported_ctrls;
> 	case MSR_IA32_VMX_ENTRY_CTLS:
> 	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
> 		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
> 		if (!evmcs_has_perf_global_ctrl(vcpu))
> 			unsupported_ctrls |= VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> 		return unsupported_ctrls;
> 	case MSR_IA32_VMX_PROCBASED_CTLS2:
> 		return EVMCS1_UNSUPPORTED_2NDEXEC;
> 	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
> 	case MSR_IA32_VMX_PINBASED_CTLS:
> 		return EVMCS1_UNSUPPORTED_PINCTRL;
> 	case MSR_IA32_VMX_VMFUNC:
> 		return EVMCS1_UNSUPPORTED_VMFUNC;
> 	default:
> 		KVM_BUG_ON(1, vcpu->kvm);
> 		return 0;
> 	}
> }
>
> void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> {
> 	u64 unsupported_ctrls = evmcs_get_unsupported_ctls(vcpu, msr_index);
>
> 	if (msr_index == MSR_IA32_VMX_VMFUNC)
> 		*pdata &= ~unsupported_ctrls;
> 	else
> 		*pdata &= ~(unsupported_ctrls << 32);
> }
>

It's smaller and I like it, but it would only work in conjunction with
KVM_CAP_HYPERV_ENLIGHTENED_VMCS2...
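For readers less familiar with the VMX capability MSRs: bits 31:0 of a control MSR report the allowed-0 settings and bits 63:32 the allowed-1 settings, which is why the proposal quoted above shifts the unsupported mask left by 32, while IA32_VMX_VMFUNC is a plain 64-bit bitmap and is masked directly. What follows is a hedged userspace model of that proposal, reduced to the exit/entry/VMFUNC cases. struct demo_vcpu is a hypothetical stand-in for the Hyper-V CPUID cache, and the DEMO_UNSUPPORTED_* masks are placeholders, not the real EVMCS1_UNSUPPORTED_* values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR indices and control bits as in the Intel SDM / kernel headers. */
#define MSR_IA32_VMX_EXIT_CTLS              0x00000483u
#define MSR_IA32_VMX_ENTRY_CTLS             0x00000484u
#define MSR_IA32_VMX_TRUE_EXIT_CTLS         0x0000048fu
#define MSR_IA32_VMX_TRUE_ENTRY_CTLS        0x00000490u
#define MSR_IA32_VMX_VMFUNC                 0x00000491u
#define VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL  0x00001000u
#define VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL 0x00002000u

/* Placeholder "always unsupported" masks, chosen for illustration only. */
#define DEMO_UNSUPPORTED_VMEXIT_CTRL  0x00400000u
#define DEMO_UNSUPPORTED_VMENTRY_CTRL 0x00000000u
#define DEMO_UNSUPPORTED_VMFUNC       0x00000001u

/* Hypothetical stand-in for hv_vcpu->cpuid_cache. */
struct demo_vcpu {
	uint32_t nested_ebx; /* CPUID.0x4000000A.EBX as set by the VMM */
};

static bool demo_has_perf_global_ctrl(const struct demo_vcpu *vcpu)
{
	return vcpu->nested_ebx & (1u << 0);
}

uint32_t demo_get_unsupported_ctls(const struct demo_vcpu *vcpu, uint32_t msr_index)
{
	uint32_t unsupported;

	switch (msr_index) {
	case MSR_IA32_VMX_EXIT_CTLS:
	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
		unsupported = DEMO_UNSUPPORTED_VMEXIT_CTRL;
		if (!demo_has_perf_global_ctrl(vcpu))
			unsupported |= VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
		return unsupported;
	case MSR_IA32_VMX_ENTRY_CTLS:
	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
		unsupported = DEMO_UNSUPPORTED_VMENTRY_CTRL;
		if (!demo_has_perf_global_ctrl(vcpu))
			unsupported |= VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
		return unsupported;
	case MSR_IA32_VMX_VMFUNC:
		return DEMO_UNSUPPORTED_VMFUNC;
	default:
		assert(!"unexpected MSR index"); /* userspace stand-in for KVM_BUG_ON() */
		return 0;
	}
}

void demo_filter_control_msr(const struct demo_vcpu *vcpu, uint32_t msr_index,
			     uint64_t *pdata)
{
	uint64_t unsupported = demo_get_unsupported_ctls(vcpu, msr_index);

	if (msr_index == MSR_IA32_VMX_VMFUNC)
		*pdata &= ~unsupported;         /* plain 64-bit feature bitmap */
	else
		*pdata &= ~(unsupported << 32); /* clear allowed-1 bits 63:32 */
}
```

With nested_ebx clear, the filter drops VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL from the allowed-1 half of MSR_IA32_VMX_TRUE_EXIT_CTLS; with BIT(0) set, that bit is left alone. The allowed-0 half is never touched.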

>
>> What I'm still concerned about is future-proofing KVM for new
>> features. When a feature is added to KVM for which no eVMCS
>> field is currently defined, both the Hyper-V-on-KVM and KVM-on-Hyper-V cases
>> need to be taken care of. It would probably be better to reverse our
>> filtering and explicitly list the features supported in eVMCS. The lists are
>> going to be fairly long, but at least we won't have to handle every
>> new architectural feature added to KVM.
>
> Having the filtering be opt-in crossed my mind as well. Reversing the filtering
> can be done after this series though, correct?
>

Yes, that's my plan: get this in to fix the immediate issue with the 2022
features, and probably reverse the filtering before Microsoft releases
something else :-)
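The reversed (opt-in) filtering discussed above could look roughly like the following. This is a sketch, not the eventual KVM code: the allowlist contents are hypothetical, and DEMO_UNKNOWN_NEW_CTRL is merely a stand-in for any control bit the eVMCS code has never heard of. One detail worth noting is that an allowlist must include the default1 (always-on) bits; otherwise a bit that the low half reports as must-be-1 would no longer be allowed to be 1, leaving an inconsistent capability MSR.

```c
#include <assert.h>
#include <stdint.h>

/* VM-exit control bits as defined in the kernel's asm/vmx.h. */
#define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR  0x00036dffu /* default1 bits */
#define VM_EXIT_HOST_ADDR_SPACE_SIZE       0x00000200u
#define VM_EXIT_ACK_INTR_ON_EXIT           0x00008000u
#define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER  0x00400000u

/* Hypothetical: stands for "a control the eVMCS code doesn't know about". */
#define DEMO_UNKNOWN_NEW_CTRL              (1u << 29)

/* Hypothetical allowlist: controls assumed to be backed by eVMCS fields. */
#define DEMO_EVMCS1_SUPPORTED_VMEXIT_CTRL                              \
	(VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_HOST_ADDR_SPACE_SIZE | \
	 VM_EXIT_ACK_INTR_ON_EXIT)

/* The allowlist must keep the default1 bits allowed-1. */
static_assert((DEMO_EVMCS1_SUPPORTED_VMEXIT_CTRL & VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR) ==
	      VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR, "default1 bits missing");

/* Current approach: clear known-bad allowed-1 bits, pass everything else. */
uint64_t demo_filter_denylist(uint64_t msr, uint32_t unsupported)
{
	return msr & ~((uint64_t)unsupported << 32);
}

/* Reversed approach: keep only the allowed-1 bits that are explicitly listed. */
uint64_t demo_filter_allowlist(uint64_t msr, uint32_t supported)
{
	/* Bits 31:0 (allowed-0 settings) are left untouched. */
	return msr & (((uint64_t)supported << 32) | 0xffffffffull);
}
```

With the deny-list, a newly added control passes straight through to a Hyper-V guest until someone remembers to add it to the unsupported mask; with the allowlist, it stays hidden until someone adds eVMCS support for it.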

--
Vitaly