Re: [PATCH 4.19 167/211] KVM: x86: Manually calculate reserved bits when loading PDPTRS

From: Greg Kroah-Hartman
Date: Mon Nov 11 2019 - 12:49:26 EST


On Mon, Nov 11, 2019 at 09:37:57AM -0800, Sean Christopherson wrote:
> On Mon, Nov 11, 2019 at 10:32:05AM +0100, Thomas Lamprecht wrote:
> > On 10/3/19 5:53 PM, Greg Kroah-Hartman wrote:
> > > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > >
> > > commit 16cfacc8085782dab8e365979356ce1ca87fd6cc upstream.
> > >
> > > Manually generate the PDPTR reserved bit mask when explicitly loading
> > > PDPTRs. The reserved bits that are being tracked by the MMU reflect the
> > It seems that a backport of this to stable and distro kernels exposed
> > an issue[0] for 64-bit Linux KVM guests on Intel CPUs older than about
> > 8-10 years[1].
>
> It manifests specifically when running with EPT disabled (no surprise
> there). Actually, it probably would reproduce simply with unrestricted
> guest disabled, but that's beside the point.
>
> The issue is a flawed PAE-paging check in kvm_set_cr3(), which causes KVM
> to incorrectly load PDPTRs in 64-bit mode and inject a #GP. It's a sneaky
> little bugger because the "if (is_long_mode() ..." makes it appear to be
> correct at first glance.
>
>     if (is_long_mode(vcpu) &&
>         (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
>             return 1;
>     else if (is_pae(vcpu) && is_paging(vcpu) &&   <--- needs !is_long_mode()
>              !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
>             return 1;
>
> With unrestricted guest, KVM doesn't intercept writes to CR3 and so doesn't
> trigger the buggy code. This doesn't fail upstream because the offending
> code was refactored to encapsulate the PAE checks in a single helper,
> precisely to avoid this type of headache.
>
> commit bf03d4f9334728bf7c8ffc7de787df48abd6340e
> Author: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Date: Thu Jun 6 18:52:44 2019 +0200
>
> KVM: x86: introduce is_pae_paging
>
> Checking for 32-bit PAE is quite common around code that fiddles with
> the PDPTRs. Add a function to compress all checks into a single
> invocation.
>
>
> Commit bf03d4f93347 ("KVM: x86: introduce is_pae_paging") doesn't apply
> cleanly to 4.19 or earlier because of the VMX file movement in 4.20. But,
> the relevant changes in x86.c do apply cleanly, and I've quadruple checked
> that the PAE checks in vmx.c are correct, i.e. applying the patch and
> ignoring the nested.c/vmx.c conflicts would be a viable lazy option.
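
A minimal sketch of the difference between the two checks, written as a
standalone user-space model rather than kernel code. The struct and the
name loads_pdptrs_buggy() are hypothetical; is_pae_paging() is assumed to
be the conjunction of the three existing checks described above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the vCPU state the real helpers inspect. */
struct vcpu_model {
    bool long_mode;  /* guest is in 64-bit (long) mode */
    bool pae;        /* CR4.PAE is set */
    bool paging;     /* CR0.PG is set */
};

/*
 * The "else if" condition from the 4.19 snippet. It is reached in long
 * mode whenever CR3 has no reserved bits set, and nothing in it excludes
 * long mode, so a 64-bit guest ends up on the PDPTR-loading path.
 */
static bool loads_pdptrs_buggy(const struct vcpu_model *v)
{
    return v->pae && v->paging;
}

/* All three checks in one place: true only for 32-bit PAE paging. */
static bool is_pae_paging(const struct vcpu_model *v)
{
    return !v->long_mode && v->pae && v->paging;
}
```

A 64-bit guest always has PAE and paging enabled, so the first predicate
is true for it while the second is not.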
>
> > Basically, booting this kernel as host and then running a KVM guest
> > distro or kernel makes that guest kernel fail early in the boot phase,
> > without any error or other log on the serial console or earlyprintk.
>
> ...
>
> >
> > [0]: https://bugzilla.kernel.org/show_bug.cgi?id=205441
> > [1]: models tested as problematic are: intel core2duo E8500; Xeon E5420; so
> > westmere, conroe and that stuff. AFAICT anything from about pre-2010 which
> > has VMX support (i.e. is 64bit based)
>
> Note, not Westmere, which has EPT and unrestricted guest. Xeon E5420 is
> Harpertown, a.k.a. Penryn, the shrink of Conroe.


Thanks for figuring this out, can you send us a patch that we can apply
to fix this issue in the stable tree?

greg k-h