Re: [PATCH v7 5/8] KVM: MMU: Add helper function to get pkr bits

From: Sean Christopherson
Date: Tue May 24 2022 - 19:22:07 EST


On Sun, Apr 24, 2022, Lei Wang wrote:
> Extra the PKR stuff to a separate, non-inline helper, which is a

s/Extra/Extract

> preparation to introduce pks support.

Please provide more justification. The change is justified, but random readers of
this patch/commit will be clueless.

Extract getting the effective PKR bits to a helper that lives in mmu.c
in order to keep the is_cr4_*() helpers contained to mmu.c. Support for
PKRS (versus just PKRU) will require querying MMU state to see if the
relevant CR4 bit is enabled, because pkr_mask will be non-zero if _either_
bit is enabled.

PKR{U,S} are exposed to the guest if and only if TDP is enabled, and
while permission_fault() is performance critical for ia32 shadow paging,
it's a rarely used path when TDP is enabled. I.e. moving the PKR code
out-of-line is not a performance concern.

> Signed-off-by: Lei Wang <lei4.wang@xxxxxxxxx>
> ---
> arch/x86/kvm/mmu.h | 20 +++++---------------
> arch/x86/kvm/mmu/mmu.c | 21 +++++++++++++++++++++
> 2 files changed, 26 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index cb3f07e63778..cea03053a153 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -204,6 +204,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> 	return vcpu->arch.mmu->page_fault(vcpu, &fault);
> }
>
> +u32 kvm_mmu_pkr_bits(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,

kvm_mmu_get_pkr_bits() so that there's a verb in there.

> +		     unsigned pte_access, unsigned pte_pkey, unsigned int pfec);
> +
> /*
> * Check if a given access (described through the I/D, W/R and U/S bits of a
> * page fault error code pfec) causes a permission fault with the given PTE
> @@ -240,21 +243,8 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>
> 	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
> 	if (unlikely(mmu->pkr_mask)) {
> -		u32 pkr_bits, offset;
> -
> -		/*
> -		 * PKRU defines 32 bits, there are 16 domains and 2
> -		 * attribute bits per domain in pkru. pte_pkey is the
> -		 * index of the protection domain, so pte_pkey * 2 is
> -		 * is the index of the first bit for the domain.
> -		 */
> -		pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;
> -
> -		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
> -		offset = (pfec & ~1) +
> -			 ((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT));
> -
> -		pkr_bits &= mmu->pkr_mask >> offset;
> +		u32 pkr_bits =
> +			kvm_mmu_pkr_bits(vcpu, mmu, pte_access, pte_pkey, pfec);

Nit, I prefer wrapping in the params, that way the first line shows the most
important information, e.g. what variable is being set and how (by a function call).
And then there won't be overflow with the longer helper name:

		u32 pkr_bits = kvm_mmu_get_pkr_bits(vcpu, mmu, pte_access,
						    pte_pkey, pfec);

> 		errcode |= -pkr_bits & PFERR_PK_MASK;
> 		fault |= (pkr_bits != 0);
> 	}
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index de665361548d..6d3276986102 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6477,3 +6477,24 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
> 	if (kvm->arch.nx_lpage_recovery_thread)
> 		kthread_stop(kvm->arch.nx_lpage_recovery_thread);
> }
> +
> +u32 kvm_mmu_pkr_bits(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +		     unsigned pte_access, unsigned pte_pkey, unsigned int pfec)
> +{
> +	u32 pkr_bits, offset;
> +
> +	/*
> +	 * PKRU defines 32 bits, there are 16 domains and 2

Comment needs to be aligned, and it can be adjusted to wrap at 80 chars (its
indentation has changed).


	/*
	 * PKRU and PKRS both define 32 bits. There are 16 domains and 2
	 * attribute bits per domain in them. pte_pkey is the index of the
	 * protection domain, so pte_pkey * 2 is the index of the first bit
	 * for the domain. The use of PKRU versus PKRS is selected by the
	 * address type, as determined by the U/S bit in the paging-structure
	 * entries.
	 */

> +	 * attribute bits per domain in pkru. pte_pkey is the
> +	 * index of the protection domain, so pte_pkey * 2 is
> +	 * is the index of the first bit for the domain.
> +	 */
> +	pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;
> +
> +	/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
> +	offset = (pfec & ~1) + ((pte_access & PT_USER_MASK)
> +		<< (PFERR_RSVD_BIT - PT_USER_SHIFT));
> +
> +	pkr_bits &= mmu->pkr_mask >> offset;
> +	return pkr_bits;
> +}
> --
> 2.25.1
>