Re: [PATCH 0/1] mm: Optimizing hugepage zeroing in arm64

From: Robin Murphy
Date: Fri Jan 22 2021 - 07:47:33 EST


On 2021-01-22 12:13, Catalin Marinas wrote:
> On Thu, Jan 21, 2021 at 06:59:37PM +0000, Robin Murphy wrote:
>> On 2021-01-21 17:46, Will Deacon wrote:
>>> On Thu, Jan 21, 2021 at 10:21:50PM +0530, Prathu Baronia wrote:
>>>> This patch removes the unnecessary kmap calls in the hugepage zeroing path and
>>>> improves the timing by 62%.
>>>>
>>>> I had proposed a similar change in Apr-May'20 timeframe in memory.c where I
>>>> proposed to clear out a hugepage by directly calling a memset over the whole
>>>> hugepage but got the opposition that the change was not architecturally neutral.
>>>>
>>>> Upon revisiting this now I see significant improvement by removing around 2k
>>>> barrier calls from the zeroing path. So hereby I propose an arm64 specific
>>>> definition of clear_user_highpage().
>>>
>>> Given that barrier() is purely a thing for the compiler, wouldn't the same
>>> change yield a benefit on any other architecture without HIGHMEM? In which
>>> case, I think this sort of change belongs in the core code if it's actually
>>> worthwhile.
>>
>> I would have thought it's more the constant manipulation of the preempt and
>> pagefault counts, rather than the compiler barriers between them, that has
>> the impact. Either way, if arm64 doesn't need to be atomic WRT preemption
>> when clearing parts of hugepages then I also can't imagine that anyone else
>> (at least for !HIGHMEM) would either.
>
> I thought the kmap_local stuff was supposed to fix this unnecessary
> preemption disabling on 64-bit architectures:
>
> https://lwn.net/Articles/836144/
>
> I guess it's not there yet.

No, it's there alright - when I pulled up the code to double-check my memory of this area, I did notice the kerneldoc and start wondering if this should simply be using kmap_local_page() for everyone.

Robin.
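
The point about the preempt and pagefault counts can be put into rough
numbers with a small userspace model. This is only a sketch, not kernel
code: the model_* names are hypothetical stand-ins, the 4 KiB base page and
2 MiB hugepage sizes are assumptions (the thread does not state them), and
the behaviour attributed to kmap_atomic() and kmap_local_page() on !HIGHMEM
is a simplification of how those helpers are understood to work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed sizes: 4 KiB base pages and a 2 MiB hugepage. */
#define MODEL_PAGE_SIZE   4096UL
#define MODEL_HPAGE_SIZE  (2UL * 1024 * 1024)
#define MODEL_NR_SUBPAGES (MODEL_HPAGE_SIZE / MODEL_PAGE_SIZE)

/* Hypothetical stand-ins for the per-task preempt and pagefault counts. */
struct model_counts {
	int preempt;
	int pagefault;
	unsigned long updates;	/* total increments plus decrements */
};

/* Models a !HIGHMEM kmap_atomic(): bump both counts, hand back the address. */
static void *model_kmap_atomic(struct model_counts *c, void *page)
{
	c->preempt++;
	c->pagefault++;
	c->updates += 2;
	return page;
}

/* Models the matching kunmap_atomic(): drop both counts again. */
static void model_kunmap_atomic(struct model_counts *c)
{
	c->pagefault--;
	c->preempt--;
	c->updates += 2;
}

/* Models a !HIGHMEM kmap_local_page(): address only, counts untouched. */
static void *model_kmap_local_page(void *page)
{
	return page;
}

/*
 * Zero a hugepage one subpage at a time and return how many count
 * updates the chosen mapping style performed along the way.
 */
static unsigned long model_clear_huge_page(unsigned char *hpage, int use_atomic)
{
	struct model_counts c = { 0, 0, 0 };
	size_t i;

	for (i = 0; i < MODEL_NR_SUBPAGES; i++) {
		unsigned char *sub = hpage + i * MODEL_PAGE_SIZE;

		if (use_atomic) {
			void *addr = model_kmap_atomic(&c, sub);

			memset(addr, 0, MODEL_PAGE_SIZE);
			model_kunmap_atomic(&c);
		} else {
			memset(model_kmap_local_page(sub), 0, MODEL_PAGE_SIZE);
		}
	}

	/* Every map was paired with an unmap, so both counts are balanced. */
	assert(c.preempt == 0 && c.pagefault == 0);
	return c.updates;
}
```

Under those assumptions, model_clear_huge_page(buf, 1) performs
512 * 4 = 2048 count updates for one hugepage, which would line up with the
"around 2k" figure in the original posting if each update is what was being
counted there, while model_clear_huge_page(buf, 0) performs none.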