Re: [PATCH v4] arm64: mm: fix linear mem mapping access performance degradation

From: guanghui.fgh
Date: Mon Jul 04 2022 - 10:34:25 EST


Thanks.

On 2022/7/4 22:23, Will Deacon wrote:
On Mon, Jul 04, 2022 at 10:11:27PM +0800, guanghui.fgh wrote:
On 2022/7/4 21:15, Will Deacon wrote:
On Mon, Jul 04, 2022 at 08:05:59PM +0800, guanghui.fgh wrote:
1. Quoted comment from arch/arm64/mm/init.c:

"Memory reservation for crash kernel either done early or deferred
depending on DMA memory zones configs (ZONE_DMA) --

In absence of ZONE_DMA configs arm64_dma_phys_limit initialized
here instead of max_zone_phys(). This lets early reservation of
crash kernel memory which has a dependency on arm64_dma_phys_limit.
Reserving memory early for crash kernel allows linear creation of block
mappings (greater than page-granularity) for all the memory bank ranges.
In this scheme a comparatively quicker boot is observed.

If ZONE_DMA configs are defined, crash kernel memory reservation
is delayed until DMA zone memory range size initialization performed in
zone_sizes_init(). The defer is necessary to steer clear of DMA zone
memory range to avoid overlap allocation.

[[[
So crash kernel memory boundaries are not known when mapping all bank memory
ranges, which otherwise means not possible to exclude crash kernel range
from creating block mappings so page-granularity mappings are created for
the entire memory range.
]]]"

Namely, the init order is: memblock init ---> linear mem mapping (4k mappings for the crashkernel range, which requires page granularity) ---> ZONE_DMA limit ---> reserve crashkernel.
So when ZONE_DMA is enabled and a crashkernel is used, the memory is mapped with 4k pages.
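For reference, the ordering above corresponds roughly to the following call sequence (abridged and simplified from arch/arm64/kernel/setup.c and arch/arm64/mm/init.c as I read them; not verbatim kernel code):

    setup_arch()
        paging_init()             /* map_mem(): linear mapping is created here        */
        bootmem_init()
            zone_sizes_init()     /* ZONE_DMA range / arm64_dma_phys_limit known here */
            reserve_crashkernel() /* crashkernel boundaries only known from here on   */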

Yes, I understand that is how things work today but I'm saying that we may
as well leave the crashkernel mapped (at block granularity) if
!can_set_direct_map() and then I think your patch becomes a lot simpler.

But page-granularity mappings are necessary for the crash kernel memory range so that its size can be shrunk via the /sys/kernel/kexec_crash_size interface (quoted from arch/arm64/mm/init.c).
So this patch splits the block/section mappings into 4k page-granularity mappings for the crashkernel memory only.
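For context, the shrink path that relies on this is roughly the following (abridged from kernel/kexec_core.c; the exact helpers differ between kernel versions, so treat this as a sketch):

    /* write to /sys/kernel/kexec_crash_size */
    crash_shrink_memory(new_size)
        crash_free_reserved_phys_range(end, crashk_res.end)
            /* default __weak implementation: return the tail page by page */
            for (addr = begin; addr < end; addr += PAGE_SIZE)
                free_reserved_page(pfn_to_page(addr >> PAGE_SHIFT));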

Why? I don't see why the mapping granularity is relevant at all if we
always leave the whole thing mapped.

There is another reason.

When crashkernel loading finishes, do_kexec_load() uses arch_kexec_protect_crashkres() to invalidate all the page-table entries for the crashkernel memory (to protect the crashkernel memory from stray accesses).

arch_kexec_protect_crashkres--->set_memory_valid--->...--->apply_to_pmd_range

In apply_to_pmd_range() there is a check: BUG_ON(pud_huge(*pud)). So if the crashkernel range uses block/section mappings, this will blow up.

Namely, non-block/section mappings are needed for the crashkernel memory before shrinking it.
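Spelling the chain out a bit more (abridged from arch/arm64/kernel/machine_kexec.c, arch/arm64/mm/pageattr.c and mm/memory.c; simplified, not verbatim):

    arch_kexec_protect_crashkres()
        set_memory_valid(addr, nr_pages, 0)       /* clear PTE_VALID on the range        */
            __change_memory_common()
                apply_to_page_range(&init_mm, ...)
                    apply_to_pmd_range()
                        BUG_ON(pud_huge(*pud));   /* trips if the range is block-mapped  */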

Well, yes, but we can change arch_kexec_[un]protect_crashkres() not to do
that if we're leaving the thing mapped, no?

Will

I think we should keep using arch_kexec_[un]protect_crashkres for the crashkernel memory.

Because when the crashkernel page tables are invalidated, there is no chance of reading/writing the crashkernel memory by mistake.

If we don't use arch_kexec_[un]protect_crashkres to invalidate the crashkernel page tables, there may be stray writes to that memory which could cause the crashkernel to fail to boot or vmcore saving to fail.

Can we change arch_kexec_[un]protect_crashkres to support block/section mappings? (But then we would also need to remap when shrinking.)
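For illustration only, I read your suggestion as something along these lines (a hypothetical sketch against arch/arm64/kernel/machine_kexec.c, not part of this patch):

    void arch_kexec_protect_crashkres(void)
    {
    	int i;

    	/* Hypothetical: if the region was left block-mapped (i.e. we cannot
    	 * split the direct map anyway), skip the invalidation entirely. */
    	if (!can_set_direct_map())
    		return;

    	for (i = 0; i < kexec_crash_image->nr_segments; i++)
    		set_memory_valid(
    			__phys_to_virt(kexec_crash_image->segment[i].mem),
    			kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 0);
    }

With that, though, the crashkernel memory would stay readable/writable in the block-mapped case, which is exactly the stray-write concern above.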

Thanks.