Re: [RFC PATCH V3 2/4] KVM: X86: Introduce role.glevel for level expanded pagetable

From: Sean Christopherson
Date: Tue Apr 12 2022 - 19:16:05 EST


On Wed, Mar 30, 2022, Lai Jiangshan wrote:
> + role.glevel:
> + The level in guest pagetable if the sp is indirect. Is 0 if the sp
> + is direct without corresponding guest pagetable, like TDP or !CR0.PG.
> + When role.level > guest paging level, indirect sp is created on the
> + top with role.glevel = guest paging level and acks as passthrough sp

s/acks/acts

> + and its contents are specially installed rather than the translations
> + of the corresponding guest pagetable.
> gfn:
> Either the guest page table containing the translations shadowed by this
> page, or the base page frame for linear translations. See role.direct.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9694dd5e6ccc..67e1bccaf472 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -314,7 +314,7 @@ struct kvm_kernel_irq_routing_entry;
> * cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
> *
> * Therefore, the maximum number of possible upper-level shadow pages for a
> - * single gfn is a bit less than 2^13.
> + * single gfn is a bit less than 2^15.
> */
> union kvm_mmu_page_role {
> u32 word;
> @@ -331,7 +331,8 @@ union kvm_mmu_page_role {
> unsigned smap_andnot_wp:1;
> unsigned ad_disabled:1;
> unsigned guest_mode:1;
> - unsigned :6;
> + unsigned glevel:4;

We don't need 4 bits for this. Crossing our fingers that we never have to shadow
a 2-level guest with a 6-level host, we can do:

unsigned passthrough_delta:2;

Where the field is ignored if direct=1, is '0' for non-passthrough, and is 1-3 to
encode shadow_root_level - guest_root_level. Basically the same idea as Paolo's
smushing of direct+passthrough into mapping_level, just dressed up differently.
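
To make the encoding concrete, here is a rough, standalone sketch of how the
root role could be filled in. The struct and function names are made up for
illustration and are not the real kvm_mmu_page_role layout; only the 2-bit
passthrough_delta field comes from the suggestion above. How non-root
passthrough pages would be encoded is left open here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for union kvm_mmu_page_role. */
struct sketch_role {
	unsigned level:4;
	unsigned direct:1;
	unsigned passthrough_delta:2;	/* 0 = not passthrough, 1-3 = level gap */
};

/* Build the role of a root shadow page (illustrative helper, not a KVM API). */
static struct sketch_role sketch_root_role(unsigned shadow_root_level,
					   unsigned guest_root_level,
					   bool direct)
{
	struct sketch_role role = {
		.level = shadow_root_level,
		.direct = direct,
	};

	if (!direct) {
		/* A 2-bit field can only describe a gap of at most 3 levels. */
		assert(shadow_root_level >= guest_root_level);
		assert(shadow_root_level - guest_root_level <= 3);
		role.passthrough_delta = shadow_root_level - guest_root_level;
	}

	/* For direct pages the field is ignored and simply left as 0. */
	return role;
}
```

With this layout a 5-level host shadowing a 2-level guest gets a delta of 3,
which is the largest gap two bits can hold; a 6-level host with a 2-level
guest would need a delta of 4 and would not fit.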

Side topic, we should steal a bit back from "level", or at least document that we
can steal a bit if necessary.

> + unsigned :2;
>
> /*
> * This is left at the top of the word so that
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 02eae110cbe1..d53037df8177 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -737,8 +737,12 @@ static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
>
> static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
> {
> - if (!sp->role.direct)
> + if (!sp->role.direct) {
> + if (unlikely(sp->role.glevel < sp->role.level))

Regardless of whatever magic we end up using, there should be an is_passthrough_sp()
helper to wrap the magic.
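
Something along these lines, as a standalone sketch only. The structs are
simplified stand-ins for kvm_mmu_page and its role, and the check uses the
glevel comparison from this version of the patch; it would change if we go
with a different encoding.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of union kvm_mmu_page_role. */
struct sketch_page_role {
	unsigned level:4;
	unsigned direct:1;
	unsigned glevel:4;
};

/* Hypothetical stand-in for struct kvm_mmu_page. */
struct sketch_mmu_page {
	struct sketch_page_role role;
};

/*
 * An indirect shadow page is passthrough when it sits above the guest's
 * own root level, i.e. there is no guest page table backing it.
 */
static inline bool is_passthrough_sp(const struct sketch_mmu_page *sp)
{
	return !sp->role.direct && sp->role.glevel < sp->role.level;
}
```

Callers such as kvm_mmu_page_get_gfn() would then test is_passthrough_sp(sp)
instead of open coding the comparison.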

> + return sp->gfn;
> +
> return sp->gfns[index];
> + }
>
> return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS));
> }