Re: [PATCH 1/2] KVM: X86: Move ignore_msrs handling upper the stack

From: Sean Christopherson
Date: Thu Jun 25 2020 - 13:45:41 EST


On Thu, Jun 25, 2020 at 09:25:40AM -0700, Sean Christopherson wrote:
> On Thu, Jun 25, 2020 at 10:09:13AM +0200, Paolo Bonzini wrote:
> > On 25/06/20 08:15, Sean Christopherson wrote:
> > > IMO, kvm_cpuid() is simply buggy. If KVM attempts to access a non-existent
> > > MSR then it darn well should warn.
> > >
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index 8a294f9747aa..7ef7283011d6 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -1013,7 +1013,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> > > *ebx = entry->ebx;
> > > *ecx = entry->ecx;
> > > *edx = entry->edx;
> > > - if (function == 7 && index == 0) {
> > > + if (function == 7 && index == 0 && (*ebx & (F(RTM) | F(HLE))) &&
> > > + (vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR)) {
> > > u64 data;
> > > if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> > > (data & TSX_CTRL_CPUID_CLEAR))
> > >
> >
> > That works too, but I disagree that warning is the correct behavior
> > here. It certainly should warn as long as kvm_get_msr blindly returns
> > zero. However, for a guest it's fine to access a potentially
> > non-existent MSR if you're ready to trap the #GP, and the point of this
> > series is to let cpuid.c or any other KVM code do the same.
>
> I get the "what" of the change, and even the "why" to some extent, but I
> dislike the idea of supporting/encouraging blind reads/writes to MSRs.
> Blind writes are just asking for problems, and suppressing warnings on reads
> is almost guaranteed to be suppressing a KVM bug.
>
> Case in point, looking at the TSX thing again, I actually think the fix
> should be:
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 5eb618dbf211..64322446e590 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1013,9 +1013,9 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> *ebx = entry->ebx;
> *ecx = entry->ecx;
> *edx = entry->edx;
> - if (function == 7 && index == 0) {
> + if (function == 7 && index == 0 && (*ebx & (F(RTM) | F(HLE)))) {
> u64 data;
> - if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> + if (!kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data) &&
> (data & TSX_CTRL_CPUID_CLEAR))
> *ebx &= ~(F(RTM) | F(HLE));
> }
>
>
> On VMX, MSR_IA32_TSX_CTRL will be added to the so-called shared MSR array
> regardless of whether or not it is being advertised to userspace (this is
> a bug in its own right). Using the host_initiated variant means KVM will
> incorrectly bypass VMX's ARCH_CAP_TSX_CTRL_MSR check, i.e. incorrectly
> clear the bits if userspace is being weird and stuffed MSR_IA32_TSX_CTRL
> without advertising it to the guest.
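
To make the host_initiated concern concrete, here is a toy userspace model
of the check described above. It is only a sketch, not KVM code: every
toy_* name is hypothetical, and the bit positions are the architectural
ones as I understand them from the SDM. It assumes the guest-visible read
fails when ARCH_CAP_TSX_CTRL_MSR is not advertised, while a host-initiated
read skips that check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions; all toy_* names are hypothetical. */
#define ARCH_CAP_TSX_CTRL_MSR	(1ULL << 7)	/* IA32_ARCH_CAPABILITIES */
#define TSX_CTRL_CPUID_CLEAR	(1ULL << 1)	/* IA32_TSX_CTRL */
#define CPUID_7_0_EBX_HLE	(1u << 4)
#define CPUID_7_0_EBX_RTM	(1u << 11)

struct toy_vcpu {
	uint64_t arch_capabilities;	/* what the guest was told */
	uint64_t tsx_ctrl;		/* value stuffed by userspace */
};

/* Returns 0 on success, 1 if the access would fault (#GP). */
static int toy_get_tsx_ctrl(const struct toy_vcpu *vcpu, uint64_t *data,
			    bool host_initiated)
{
	/* Assumed check: only guest-style reads are filtered. */
	if (!host_initiated &&
	    !(vcpu->arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
		return 1;

	*data = vcpu->tsx_ctrl;
	return 0;
}

/* Models the CPUID.7.0 EBX fixup, parameterized on the read flavor. */
uint32_t toy_cpuid_7_0_ebx(const struct toy_vcpu *vcpu, uint32_t ebx,
			   bool host_initiated)
{
	uint64_t data;

	if (!toy_get_tsx_ctrl(vcpu, &data, host_initiated) &&
	    (data & TSX_CTRL_CPUID_CLEAR))
		ebx &= ~(CPUID_7_0_EBX_RTM | CPUID_7_0_EBX_HLE);

	return ebx;
}

/* Userspace stuffed TSX_CTRL but did not advertise it to the guest. */
void toy_demo_host_initiated(void)
{
	struct toy_vcpu vcpu = {
		.arch_capabilities = 0,
		.tsx_ctrl = TSX_CTRL_CPUID_CLEAR,
	};
	uint32_t ebx = CPUID_7_0_EBX_RTM | CPUID_7_0_EBX_HLE;

	/* Host-initiated read bypasses the check: bits wrongly cleared. */
	assert(toy_cpuid_7_0_ebx(&vcpu, ebx, true) == 0);
	/* Guest-style read fails, so the bits are left alone. */
	assert(toy_cpuid_7_0_ebx(&vcpu, ebx, false) == ebx);
}
```

In this model the only difference between the two calls is the read
flavor, which is the argument for using the non-host_initiated variant in
kvm_cpuid().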

Argh, belatedly realized that MSR_IA32_TSX_CTRL needs to be swapped even
when ARCH_CAP_TSX_CTRL_MSR isn't exposed to the guest, but if and only if
TSX is disabled in the host _and_ enabled in the guest. So triggering
setup_msrs() on ARCH_CAP_TSX_CTRL_MSR is insufficient, but I believe we can
and should redo setup_msrs() during vmx_cpuid_update(). I'm pretty sure
that's needed for MSR_TSC_AUX+RDTSCP as well. I suspect RDTSCP is broken
on 32-bit guests, but no one has noticed because Linux only employs RDTSCP
on 64-bit kernels, and 32-bit guests aren't exactly common in the first
place.

I'll check the above to confirm and prep some patches if RDTSCP is indeed
busted.
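
Roughly, the condition I have in mind looks like the sketch below. This is
a standalone illustration, not a patch: toy_need_tsx_ctrl_swap() and its
parameters are hypothetical, and treating "TSX enabled in the guest" as
"guest CPUID.7.0 EBX has RTM or HLE" and "TSX disabled in the host" as
"host TSX_CTRL has RTM_DISABLE set" are my assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions; the toy_* helpers are hypothetical. */
#define ARCH_CAP_TSX_CTRL_MSR	(1ULL << 7)	/* IA32_ARCH_CAPABILITIES */
#define TSX_CTRL_RTM_DISABLE	(1ULL << 0)	/* IA32_TSX_CTRL */
#define CPUID_7_0_EBX_HLE	(1u << 4)
#define CPUID_7_0_EBX_RTM	(1u << 11)

/*
 * Decide whether MSR_IA32_TSX_CTRL must be switched between host and
 * guest values.  It is needed when the guest can see the MSR, and also
 * when the guest cannot see it but the host has TSX off while the
 * guest's CPUID says TSX is on.
 */
bool toy_need_tsx_ctrl_swap(uint64_t guest_arch_caps, uint64_t host_tsx_ctrl,
			    uint32_t guest_cpuid_7_0_ebx)
{
	bool guest_has_msr = (guest_arch_caps & ARCH_CAP_TSX_CTRL_MSR) != 0;
	bool host_tsx_off = (host_tsx_ctrl & TSX_CTRL_RTM_DISABLE) != 0;
	bool guest_tsx_on = (guest_cpuid_7_0_ebx &
			     (CPUID_7_0_EBX_RTM | CPUID_7_0_EBX_HLE)) != 0;

	return guest_has_msr || (host_tsx_off && guest_tsx_on);
}

void toy_demo_swap(void)
{
	/* MSR hidden from the guest, host TSX off, guest TSX on: swap. */
	assert(toy_need_tsx_ctrl_swap(0, TSX_CTRL_RTM_DISABLE,
				      CPUID_7_0_EBX_RTM));
	/* MSR hidden, host TSX on: nothing to swap. */
	assert(!toy_need_tsx_ctrl_swap(0, 0, CPUID_7_0_EBX_RTM));
}
```

Because the result depends on guest CPUID, it would have to be
re-evaluated whenever guest CPUID changes, hence redoing setup_msrs() from
vmx_cpuid_update().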

> In short, the whole MSR_IA32_TSX_CTRL implementation seems messy and this
> is just papering over that mess. The correct fix is to invoke setup_msrs()
> on writes to MSR_IA32_ARCH_CAPABILITIES, filtering MSR_IA32_TSX_CTRL out of
> shared MSRs when it's not advertised, and change kvm_cpuid() to use the
> unprivileged variant.
>
> TSX_CTRL aside, if we insist on pointing a gun at our foot at some point,
> this should be a dedicated flavor of MSR access, e.g. msr_data.kvm_initiated,
> so that it at least requires intentionally loading the gun.
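
To illustrate the "dedicated flavor" idea, a minimal userspace sketch
follows. None of this exists in KVM: struct toy_msr_data, the
kvm_initiated field, toy_get_msr() and the MSR index are hypothetical, and
a counter stands in for the warning. The assumption is that a blind read
still fails, and only the explicitly flagged flavor suppresses the
warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MSR_PRESENT	0x122u	/* the only MSR this toy model implements */

/* Hypothetical analogue of msr_data with an explicit opt-in flag. */
struct toy_msr_data {
	bool host_initiated;
	bool kvm_initiated;	/* hypothetical: caller accepts failure */
	uint32_t index;
	uint64_t data;
};

/* Stand-in for a WARN/printk: counts unexpected accesses. */
unsigned int toy_warnings;

/* Returns 0 on success, 1 if the MSR does not exist. */
int toy_get_msr(struct toy_msr_data *msr)
{
	if (msr->index == TOY_MSR_PRESENT) {
		msr->data = 0x2;
		return 0;
	}

	/* Blind probes must be requested explicitly to stay quiet. */
	if (!msr->kvm_initiated)
		toy_warnings++;

	return 1;
}

void toy_demo_kvm_initiated(void)
{
	struct toy_msr_data probe = { .kvm_initiated = true, .index = 0x999 };
	struct toy_msr_data plain = { .index = 0x999 };
	unsigned int before = toy_warnings;

	/* Flagged probe fails quietly. */
	assert(toy_get_msr(&probe) == 1 && toy_warnings == before);
	/* Unflagged access to a missing MSR fails and warns. */
	assert(toy_get_msr(&plain) == 1 && toy_warnings == before + 1);
}
```

The point of the separate flag is that suppressing the warning becomes a
visible, per-call-site decision rather than the default behavior.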