Re: [PATCH 2/2] kvm/x86: don't expose MSR_IA32_UMWAIT_CONTROL unconditionally

From: Tao Xu
Date: Thu May 21 2020 - 02:44:31 EST




On 5/21/2020 2:37 PM, Xiaoyao Li wrote:
On 5/21/2020 1:28 PM, Tao Xu wrote:


On 5/21/2020 12:33 PM, Xiaoyao Li wrote:
On 5/21/2020 5:05 AM, Paolo Bonzini wrote:
On 20/05/20 18:07, Maxim Levitsky wrote:
This msr is only available when the host supports WAITPKG feature.

This breaks a nested guest, if the L1 hypervisor is set to ignore
unknown msrs, because the only other safety check that the
kernel does is that it attempts to read the msr and
rejects it if it gets an exception.

Fixes: 6e3ba4abce KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL

Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
---
 arch/x86/kvm/x86.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fe3a24fd6b263..9c507b32b1b77 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5314,6 +5314,10 @@ static void kvm_init_msr_list(void)
ÂÂÂÂÂÂÂÂÂÂÂÂÂ if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_EVENTSEL0 >=
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ continue;
+ÂÂÂÂÂÂÂÂÂÂÂ break;
+ÂÂÂÂÂÂÂ case MSR_IA32_UMWAIT_CONTROL:
+ÂÂÂÂÂÂÂÂÂÂÂ if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG))
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ continue;
ÂÂÂÂÂÂÂÂÂ default:
ÂÂÂÂÂÂÂÂÂÂÂÂÂ break;
ÂÂÂÂÂÂÂÂÂ }

The patch is correct, and matches what is done for the other entries of
msrs_to_save_all. However, while looking at it I noticed that
X86_FEATURE_WAITPKG is actually never added, and that is because it was
also not added to the supported CPUID in commit e69e72faa3a0 ("KVM: x86:
Add support for user wait instructions", 2019-09-24), which was before
the kvm_cpu_cap mechanism was added.

So while at it you should also fix that. The right way to do that is to
add a

ÂÂÂÂÂÂÂÂ if (vmx_waitpkg_supported())
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);

+ Tao

I remember there is certainly some reason why we don't expose WAITPKG to guest by default.

Tao, please help clarify it.

Thanks,
-Xiaoyao


Because in VM, umwait and tpause can put a (psysical) CPU into a power saving state. So from host view, this cpu will be 100% usage by VM. Although umwait and tpause just cause short wait(maybe 100 microseconds), we still want to unconditionally expose WAITPKG in VM.

I guess you typed "unconditionally" by mistake that you meant to say "conditionally" in fact?

I am sorry, I mean:
By default, we don't expose WAITPKG to guest. For QEMU, we can use "-overcommit cpu-pm=on" to use WAITPKG.