Re: [PATCH 4.19 167/211] KVM: x86: Manually calculate reserved bits when loading PDPTRS

From: Sean Christopherson
Date: Mon Nov 11 2019 - 12:38:00 EST


On Mon, Nov 11, 2019 at 10:32:05AM +0100, Thomas Lamprecht wrote:
> On 10/3/19 5:53 PM, Greg Kroah-Hartman wrote:
> > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> >
> > commit 16cfacc8085782dab8e365979356ce1ca87fd6cc upstream.
> >
> > Manually generate the PDPTR reserved bit mask when explicitly loading
> > PDPTRs. The reserved bits that are being tracked by the MMU reflect the
> It seems that a backport of this to stable and distro kernels tickled out
> some issue[0] for KVM Linux 64bit guests on older than about 8-10 year old
> Intel CPUs[1].

It manifests specifically when running with EPT disabled (no surprise
there). Actually, it probably would reproduce simply with unrestricted
guest disabled, but that's beside the point.

The issue is a flawed PAE-paging check in kvm_set_cr3(), which causes KVM
to incorrectly load PDPTRs in 64-bit mode and inject a #GP. It's a sneaky
little bugger because the "if (is_long_mode() ..." makes it appear to be
correct at first glance.

if (is_long_mode(vcpu) &&
(cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
return 1;
else if (is_pae(vcpu) && is_paging(vcpu) && <--- needs !is_long_mode()
!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
return 1;

With unrestricted guest, KVM doesn't intercept writes to CR3 and so doesn't
trigger the buggy code. This doesn't fail upstream because the offending
code was refactored to encapsulate the PAE checks in a single helper,
precisely to avoid this type of headache.

commit bf03d4f9334728bf7c8ffc7de787df48abd6340e
Author: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Date: Thu Jun 6 18:52:44 2019 +0200

KVM: x86: introduce is_pae_paging

Checking for 32-bit PAE is quite common around code that fiddles with
the PDPTRs. Add a function to compress all checks into a single
invocation.


Commit bf03d4f93347 ("KVM: x86: introduce is_pae_paging") doesn't apply
cleanly to 4.19 or earlier because of the VMX file movement in 4.20. But,
the revelant changes in x86.c do apply cleanly, and I've quadruple checked
that the PAE checks in vmx.c are correct, i.e. applying the patch and
ignoring the nested.c/vmx.c conflicts would be a viable lazy option.

> Basically, booting this kernel as host, then running an KVM guest distro
> or kernel fails it that guest kernel early in the boot phase without any
> error or other log to serial console, earlyprintk.

...

>
> [0]: https://bugzilla.kernel.org/show_bug.cgi?id=205441
> [1]: models tested as problematic are: intel core2duo E8500; Xeon E5420; so
> westmere, conroe and that stuff. AFAICT anything from about pre-2010 which
> has VMX support (i.e. is 64bit based)

Note, not Westmere, which has EPT and unrestricted guest. Xeon E5420 is
Harpertown, a.k.a. Penryn, the shrink of Conroe.