Re: [RFC PATCH 5/6] KVM: X86: Alloc pae_root shadow page

From: Lai Jiangshan
Date: Thu Jan 06 2022 - 23:37:08 EST




On 2022/1/7 03:41, Sean Christopherson wrote:
On Thu, Jan 06, 2022, Lai Jiangshan wrote:


On 2022/1/6 00:45, Sean Christopherson wrote:
On Wed, Jan 05, 2022, Lai Jiangshan wrote:
On Wed, Jan 5, 2022 at 5:54 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:


default_pae_pdpte is needed because the CPU expects PAE PDPTEs to be
present at VM-Enter.

That's incorrect. Neither Intel nor AMD requires PDPTEs to be present. Not present
is perfectly ok; present with reserved bits set is what's not allowed.

Intel SDM:
A VM entry that checks the validity of the PDPTEs uses the same checks that are
used when CR3 is loaded with MOV to CR3 when PAE paging is in use[7]. If MOV to CR3
would cause a general-protection exception due to the PDPTEs that would be loaded
(e.g., because a reserved bit is set), the VM entry fails.

7. This implies that (1) bits 11:9 in each PDPTE are ignored; and (2) if bit 0
(present) is clear in one of the PDPTEs, bits 63:1 of that PDPTE are ignored.
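
A minimal sketch of the rule in the SDM excerpt above, assuming the Intel
PAE PDPTE layout (bits 2:1, 8:5 and 63:MAXPHYADDR reserved, bits 11:9
ignored). The helper names and the maxphyaddr parameter are hypothetical
and chosen for illustration; this is not KVM code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PDPTE_PRESENT (1ULL << 0)

/*
 * Reserved bits of a PAE PDPTE under the assumed Intel layout:
 * bits 2:1, bits 8:5 and bits 63:MAXPHYADDR. Bits 11:9 are ignored,
 * so they are deliberately left out of the mask.
 */
static uint64_t pdpte_rsvd_mask(unsigned int maxphyaddr)
{
	assert(maxphyaddr >= 32 && maxphyaddr <= 52);

	uint64_t low = (0x3ULL << 1) | (0xfULL << 5);
	uint64_t high = ~0ULL << maxphyaddr;

	return low | high;
}

/* A not-present PDPTE is always legal; bits 63:1 are ignored. */
static bool pdpte_is_legal(uint64_t pdpte, unsigned int maxphyaddr)
{
	if (!(pdpte & PDPTE_PRESENT))
		return true;

	return !(pdpte & pdpte_rsvd_mask(maxphyaddr));
}

/* All four PDPTEs must pass for the (hypothetical) check to succeed. */
static bool pdptes_are_legal(const uint64_t pdptes[4], unsigned int maxphyaddr)
{
	for (int i = 0; i < 4; i++) {
		if (!pdpte_is_legal(pdptes[i], maxphyaddr))
			return false;
	}
	return true;
}
```

Under these assumptions a PDPTE of 0 passes, and only an entry that is
present and has a reserved bit set is rejected.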

But in practice, the VM entry fails if the present bit is not set in the
PDPTE for the linear address being accessed (at least when EPT is enabled).
The host KVM complains and dumps the VMCS state.

That doesn't make any sense. If EPT is enabled, KVM should never use a pae_root.
The vmcs.GUEST_PDPTRn fields are in play, but those shouldn't derive from KVM's
shadow page tables.

Oh, I again wrote the opposite of what I meant. It happened while I was
reworking the sentence several times to add emphasis.

I meant "EPT not enabled" on VMX.

Heh, that makes a lot more sense.

The VM entry fails at a very early stage of guest boot, when the guest
might still be in real mode.

VMEXIT: intr_info=00000000 errorcode=0000000 ilen=00000000
reason=80000021 qualification=0000000000000002

Yep, that's the signature for an illegal PDPTE at VM-Enter. But as noted above,
a not-present PDPTE is perfectly legal; VM-Enter should fail if and only if a
PDPTE is present and has reserved bits set.
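
For reference, a small sketch of how the two values in the quoted dump
decode. It assumes the VMX encoding in which bit 31 of the exit reason flags
a VM-entry failure, basic exit reason 33 (0x21) is "invalid guest state",
and exit qualification 2 means the PDPTEs could not be loaded. The constants
are defined locally, with names chosen to resemble the kernel's, and the
helper functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local definitions for illustration only. */
#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000u
#define EXIT_REASON_BASIC_MASK          0x0000ffffu
#define EXIT_REASON_INVALID_STATE       33u /* 0x21 */
#define ENTRY_FAIL_PDPTE                2u

/* Strip the flag bits and keep the basic exit reason. */
static uint32_t basic_exit_reason(uint32_t reason)
{
	return reason & EXIT_REASON_BASIC_MASK;
}

/* True if the reason/qualification pair reports a PDPTE load failure. */
static bool is_pdpte_entry_failure(uint32_t reason, uint64_t qualification)
{
	return (reason & VMX_EXIT_REASONS_FAILED_VMENTRY) &&
	       basic_exit_reason(reason) == EXIT_REASON_INVALID_STATE &&
	       qualification == ENTRY_FAIL_PDPTE;
}

/* Sanity check against the values in the quoted dump. */
static void check_quoted_dump(void)
{
	assert(is_pdpte_entry_failure(0x80000021u, 0x2));
}
```

With reason=80000021 and qualification=2 the helper returns true, which
matches the "illegal PDPTE at VM-Enter" reading of the dump.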

IDTVectoring: info=00000000 errorcode=00000000


And I doubt there is a VMX ucode bug at play, as KVM currently uses '0' in its
shadow page tables for not-present PDPTEs.

If you can post/provide the patches that lead to VM-Fail, I'd be happy to help
debug.

If you can try this patchset, you can just set the default_pae_pdpte to 0 to test
it.

I can't reproduce the failure with this on top of your series + kvm/queue (commit
cc0e35f9c2d4 ("KVM: SVM: Nullify vcpu_(un)blocking() hooks if AVIC is disabled")).



I can't reproduce the failure with this code base either. And I can't reproduce
the failure when I switch back to the code base I had when I developed it.

After reviewing all the logs I saved at that time, I think it was fixed once
make_pae_pdpte() was added. I should have added make_pae_pdpte() before
adding default_pae_pdpte. (The code was still a mess, and the guest couldn't
fully function even after make_pae_pdpte() was added at that time.)

Removing default_pae_pdpte will simplify the code. Thank you.

Thanks
Lai.