Re: [PATCH v4] arm64: mm: fix linear mem mapping access performance degradation

From: guanghui.fgh
Date: Tue Jul 05 2022 - 09:49:06 EST




On 2022/7/5 20:56, Mike Rapoport wrote:
On Tue, Jul 05, 2022 at 01:11:16PM +0100, Will Deacon wrote:
On Tue, Jul 05, 2022 at 08:07:07PM +0800, guanghui.fgh wrote:

1. rodata=full harms performance and has been disabled in-house.

2. When crashkernel is used without rodata=full, the kernel still ends up using non-block/section mappings, which causes a high d-TLB miss rate and degrades performance significantly.
This patch fixes that by using block/section mappings as far as possible (a rough sketch of the idea follows the snippet below).

bool can_set_direct_map(void)
{
	return rodata_full || debug_pagealloc_enabled();
}

map_mem():
	if (can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE))
		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
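
For reference, one way to keep block/section mappings for the bulk of memory is to carve only the crashkernel range out of the block-mapped pass of map_mem() and then map just that range with base pages, reusing helpers that already exist in arch/arm64/mm/mmu.c. This is only a sketch of the idea (not the actual patch), and it assumes the crashkernel region has already been reserved by the time map_mem() runs:

map_mem():
	/* keep the crashkernel range out of the block-mapped pass */
	if (crashk_res.end)
		memblock_mark_nomap(crashk_res.start, resource_size(&crashk_res));

	/* ... map the remaining memblock regions with block/section mappings ... */

	/* map the crashkernel range with base pages only, so it can later be
	   protected/unprotected at page granularity */
	if (crashk_res.end) {
		__map_memblock(pgdp, crashk_res.start, crashk_res.end + 1,
			       PAGE_KERNEL, NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
		memblock_clear_nomap(crashk_res.start, resource_size(&crashk_res));
	}

The nomap marking only keeps the generic loop from mapping the region twice; it is cleared again once the page-granular mapping is in place.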

3. Even when rodata=full is disabled, the crashkernel region still needs to be protected (i.e. arch_kexec_[un]protect_crashkres() should keep working).
I don't think crashkernel should depend on rodata=full (other architectures may not support rodata=full at all).
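
For context on why the protection needs page-granular mappings: arch_kexec_protect_crashkres() on arm64 flips the valid bit of every page in the crash image through set_memory_valid(), which can only walk page mappings, not block/section mappings. Roughly (quoting from memory, so treat the details as approximate):

void arch_kexec_protect_crashkres(void)
{
	int i;

	/* invalidate the linear-map PTEs covering the crash image so the
	   running kernel cannot scribble over it */
	for (i = 0; i < kexec_crash_image->nr_segments; i++)
		set_memory_valid(
			__phys_to_virt(kexec_crash_image->segment[i].mem),
			kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 0);
}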

I think this is going round in circles :/

As a first step, can we please leave the crashkernel mapped unless
rodata=full? It should be a much simpler patch to write, review and maintain
and it gives you the performance you want when crashkernel is being used.
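
For concreteness, one way to read "leave the crashkernel mapped unless rodata=full" is a change roughly like this in arch_kexec_protect_crashkres() / arch_kexec_unprotect_crashkres() (sketch only, on top of the function shown above):

	/* if the linear map still uses block/section mappings we cannot
	   flip individual pages, so simply leave the crash kernel memory
	   mapped */
	if (!can_set_direct_map())
		return;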

Since we are talking about large systems, what do you think about letting
them set CRASH_ALIGN to PUD_SIZE, then

	unmap(crashkernel);
	__create_pgd_mapping(crashkernel, NO_BLOCK_MAPPINGS);

should be enough to make crash kernel mapped with base pages.
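
Spelled out with the helper that exists in arch/arm64/mm/mmu.c, I read that as roughly the sketch below. The unmap step is hand-waved, pgtable_alloc is a placeholder for whichever allocator is appropriate at that point (e.g. early_pgtable_alloc() during boot), and it only works if both crash_base and crash_size are PUD aligned, so that tearing the region down never has to split a live block mapping:

	/* ... unmap [crash_base, crash_base + crash_size) from the linear map ... */

	/* re-create the mapping for the crash kernel region with base pages */
	__create_pgd_mapping(swapper_pg_dir, crash_base,
			     __phys_to_virt(crash_base), crash_size,
			     PAGE_KERNEL, pgtable_alloc,
			     NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);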

Thanks.

1. When the kernel boots with crashkernel, the parameter is typically given in the range form
0M-2G:0M,2G-256G:256M,256G-1024G:320M,1024G-:384M, which self-adapts to systems of different memory sizes.

2. As mentioned above, the crashkernel memory size may be smaller than PUD_SIZE (or at least not a multiple of PUD_SIZE), so some non-block/section mappings may still be needed for that region.
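
To put numbers on that (my own illustration, not from the patch): with 4K pages PUD_SIZE is 1GiB, so on e.g. a 512GiB machine the string above reserves 320MiB, and no complete 1GiB block fits inside that region even if CRASH_ALIGN is raised to PUD_SIZE. A generic split of a reservation would look roughly like the sketch below, where map_pages()/map_blocks() are hypothetical placeholders for __create_pgd_mapping() calls with and without NO_BLOCK_MAPPINGS:

	phys_addr_t start = crashk_res.start;
	phys_addr_t end   = crashk_res.end + 1;
	phys_addr_t mid_s = ALIGN(start, PUD_SIZE);
	phys_addr_t mid_e = ALIGN_DOWN(end, PUD_SIZE);

	if (mid_s < mid_e) {
		map_pages(start, mid_s);	/* unaligned head, if any */
		map_blocks(mid_s, mid_e);	/* whole PUDs can use 1G blocks */
		map_pages(mid_e, end);		/* unaligned tail, if any */
	} else {
		map_pages(start, end);		/* e.g. 320M: no whole PUD fits */
	}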