Re: [PATCH v3 2/3] arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG

From: Ryan Roberts
Date: Thu May 02 2024 - 04:20:16 EST


On 02/05/2024 07:12, Anshuman Khandual wrote:
>
>
> On 5/1/24 20:24, Ryan Roberts wrote:
>> PTE_PRESENT_INVALID was previously occupying bit 59, which when a PTE is
>> valid can either be IGNORED, PBHA[0] or AttrIndex[3], depending on the
>> HW configuration. In practice this is currently not a problem because
>> PTE_PRESENT_INVALID can only be 1 when PTE_VALID=0 and upstream Linux
>> always requires the bit set to 0 for a valid pte.
>>
>> However, if in future Linux wants to use the field (e.g. AttrIndex[3])
>> then we could end up with confusion when PTE_PRESENT_INVALID comes along
>> and corrupts the field - we would ideally want to preserve it even for
>> an invalid (but present) pte.
>>
>> The other problem with bit 59 is that it prevents the offset field of a
>> swap entry within a swap pte from growing beyond 51 bits. By moving
>> PTE_PRESENT_INVALID to a low bit we can lay the swap pte out so that the
>> offset field could grow to 52 bits in future.
>>
>> So let's move PTE_PRESENT_INVALID to overlay PTE_NG (bit 11).
>>
>> There is no need to persist NG for a present-invalid entry; it is always
>> set for user mappings and is not used by SW to derive any state from the
>
> But the idea of present and invalid state is that all the HW used information
> should be fetched successfully even though the the entry is not valid and not
> being walked by the MMU. Setting and clearing PTE SW bits in such state, does
> not change that, but tampering with HW bits would break the assumption around
> present and invalid entry ?

But we are not changing anything; NG is *always* set in valid-present user
entries. And it continues to be *always* set in invalid-present user entries,
because PTE_PRESENT_INVALID is the same bit.

Additionally there are no helpers (that I could find) that read or write the NG
bit so core-mm can't possibly be accidentally corrupting anything or reading
back incorrect state.

>
>> pte. PTE_NS was considered instead of PTE_NG, but it is RES0 for
>> non-secure SW, so there is a chance that future architecture may
>> allocate the bit and we may therefore need to persist that bit for
>> present-invalid ptes.
>
> If we are being careful around PTE_NS and even for AttrIndex[3] as mentioned
> earlier to be useful during an invalid state, how can PTE_NG be used without
> any such consideration.

Are you suggesting that in future we may want to make global mappings available
to user space? I think that's unlikely?

The point about NG is that it is already architecturally defined in normal world
and we are using it - kernel mappings are (usually) global, user mappings are
not. So a new meaning for that bit can't come along unless we are willing to say
that *all* mappings (inc kernel mappings) will become implicitly non-global.

So my assertion that this bit can be reused like this (now and in future) is
predicated on 1) all user mappings are and will continue to be non-global, 2)
the bit will never be repurposed by the architecture (or if it is, Linux will
never opt-in because we need per-pte (non-)global control).

>
>>
>> These are both marginal benefits, but make things a bit tidier in my
>> opinion.
>
> Apart from swap offset field expansion does this change achieve anything else ?

It frees up bit 59 to be used for its architectural purpose in future.

>
>>
>> Reviewed-by: Catalin Marinas <catalin.marinas@xxxxxxx>
>> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
>> ---
>> arch/arm64/include/asm/pgtable-prot.h | 2 +-
>> arch/arm64/include/asm/pgtable.h | 12 ++++++------
>> 2 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
>> index cd8c06f5fb02..3047d10987fd 100644
>> --- a/arch/arm64/include/asm/pgtable-prot.h
>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>> @@ -18,7 +18,7 @@
>> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
>> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
>> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
>> -#define PTE_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
>> +#define PTE_PRESENT_INVALID (PTE_NG) /* only when !PTE_VALID */
>>
>> #define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>> #define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index c0f4471423db..7f1ff59c43ed 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -1254,15 +1254,15 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>> * Encode and decode a swap entry:
>> * bits 0-1: present (must be zero)
>> * bits 2: remember PG_anon_exclusive
>> - * bits 3-7: swap type
>> - * bits 8-57: swap offset
>> - * bit 59: PTE_PRESENT_INVALID (must be zero)
>> + * bits 6-10: swap type
>> + * bit 11: PTE_PRESENT_INVALID (must be zero)
>> + * bits 12-61: swap offset
>> */
>> -#define __SWP_TYPE_SHIFT 3
>> +#define __SWP_TYPE_SHIFT 6
>> #define __SWP_TYPE_BITS 5
>> -#define __SWP_OFFSET_BITS 50
>> #define __SWP_TYPE_MASK ((1 << __SWP_TYPE_BITS) - 1)
>> -#define __SWP_OFFSET_SHIFT (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
>> +#define __SWP_OFFSET_SHIFT 12
>> +#define __SWP_OFFSET_BITS 50
>> #define __SWP_OFFSET_MASK ((1UL << __SWP_OFFSET_BITS) - 1)
>>
>> #define __swp_type(x) (((x).val >> __SWP_TYPE_SHIFT) & __SWP_TYPE_MASK)