Re: [PATCH 4.19 167/211] KVM: x86: Manually calculate reserved bits when loading PDPTRS

From: Sean Christopherson
Date: Mon Nov 11 2019 - 12:57:43 EST


On Mon, Nov 11, 2019 at 06:48:59PM +0100, Greg Kroah-Hartman wrote:
> On Mon, Nov 11, 2019 at 09:37:57AM -0800, Sean Christopherson wrote:
> > On Mon, Nov 11, 2019 at 10:32:05AM +0100, Thomas Lamprecht wrote:
> > > On 10/3/19 5:53 PM, Greg Kroah-Hartman wrote:
> > > > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > > >
> > > > commit 16cfacc8085782dab8e365979356ce1ca87fd6cc upstream.
> > > >
> > > > Manually generate the PDPTR reserved bit mask when explicitly loading
> > > > PDPTRs. The reserved bits that are being tracked by the MMU reflect the
> > > It seems that a backport of this to stable and distro kernels shook out an
> > > issue[0] for 64-bit KVM Linux guests on Intel CPUs older than about 8-10
> > > years[1].
> >
> > It manifests specifically when running with EPT disabled (no surprise
> > there). Actually, it probably would reproduce simply with unrestricted
> > guest disabled, but that's beside the point.
> >
> > The issue is a flawed PAE-paging check in kvm_set_cr3(), which causes KVM
> > to incorrectly load PDPTRs when the guest is in 64-bit mode and to inject a
> > spurious #GP. It's a sneaky little bugger because the "if (is_long_mode() ..."
> > makes it appear correct at first glance.
> >
> > 	if (is_long_mode(vcpu) &&
> > 	    (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
> > 		return 1;
> > 	else if (is_pae(vcpu) && is_paging(vcpu) &&   <--- needs !is_long_mode()
> > 		 !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
> > 		return 1;
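> >
> > For clarity, a sketch of the fixed check, with the long mode exclusion
> > made explicit (this is what upstream's is_pae_paging() helper, quoted
> > below, folds into a single call):
> >
> > 	if (is_long_mode(vcpu) &&
> > 	    (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
> > 		return 1;
> > 	else if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
> > 		 !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
> > 		return 1;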
> >
> > With unrestricted guest, KVM doesn't intercept writes to CR3 and so doesn't
> > trigger the buggy code. This doesn't fail upstream because the offending
> > code was refactored to encapsulate the PAE checks in a single helper,
> > precisely to avoid this type of headache.
> >
> > commit bf03d4f9334728bf7c8ffc7de787df48abd6340e
> > Author: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > Date:   Thu Jun 6 18:52:44 2019 +0200
> >
> >     KVM: x86: introduce is_pae_paging
> >
> >     Checking for 32-bit PAE is quite common around code that fiddles with
> >     the PDPTRs. Add a function to compress all checks into a single
> >     invocation.
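> >
> > The helper added by that commit (in arch/x86/kvm/x86.h) is essentially:
> >
> > 	static inline bool is_pae_paging(struct kvm_vcpu *vcpu)
> > 	{
> > 		return !is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu);
> > 	}
> >
> > and kvm_set_cr3() then does "else if (is_pae_paging(vcpu) && ..." in
> > place of the flawed compound check above.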
> >
> >
> > Commit bf03d4f93347 ("KVM: x86: introduce is_pae_paging") doesn't apply
> > cleanly to 4.19 or earlier because of the VMX file movement in 4.20. But,
> > the relevant changes in x86.c do apply cleanly, and I've quadruple checked
> > that the PAE checks in vmx.c are correct, i.e. applying the patch and
> > ignoring the nested.c/vmx.c conflicts would be a viable lazy option.
> >
> > > Basically, booting this kernel as host and then running a KVM guest distro
> > > or kernel makes that guest kernel fail early in the boot phase without any
> > > error or other log to the serial console or earlyprintk.
> >
> > ...
> >
> > >
> > > [0]: https://bugzilla.kernel.org/show_bug.cgi?id=205441
> > > [1]: models tested as problematic are: Intel Core 2 Duo E8500; Xeon E5420;
> > > so Westmere, Conroe and that generation. AFAICT anything from about pre-2010
> > > which has VMX support (i.e. is 64-bit capable)
> >
> > Note, not Westmere, which has both EPT and unrestricted guest. Xeon E5420 is
> > Harpertown, a.k.a. Penryn, the die shrink of Conroe.
>
>
> Thanks for figuring this out, can you send us a patch that we can apply
> to fix this issue in the stable tree?

Can do. A custom backport will be needed for 4.20 and earlier, not just 4.19
and earlier; I misremembered when we did the VMX refactoring.

For 5.0, 5.1 and 5.2, commit bf03d4f93347 can be applied directly.