Re: [PATCH v3 1/4] x86/clear_page: extend clear_page*() for multi-page clearing
From: Ankur Arora
Date: Tue Apr 15 2025 - 02:15:15 EST
Mateusz Guzik <mjguzik@xxxxxxxxx> writes:
> On Mon, Apr 14, 2025 at 01:02:59PM +0200, Peter Zijlstra wrote:
>> This symbol is written as a C function with C calling convention, even
>> though it is only meant to be called from that clear_page() alternative.
>>
>> If we want to go change all this, then we should go do the same we do
>> for __clear_user() and write it thusly:
>>
>> asm volatile(ALTERNATIVE("rep stosb",
>> "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRS)
>> : "+c" (size), "+D" (addr), ASM_CALL_CONSTRAINT
>> : "a" (0))
>>
>> And forget about all those clear_page_*() thingies.
>>
>
> I have to disagree.
>
> Next to nobody has FSRS, so for now one would have to expect everyone
> would be punting to the routine. Did you mean ERMS as sizes are in fact
> not short?
>
> rep_stos_alternative() as implemented right now sucks in its own right
> ("small" areas sorted out with an 8 byte and 1 byte loops, bigger ones
> unrolled 64 byte loop at a time, no rep stos{b,q} in sight). Someone(tm)
> should fix it and for the sake of argument suppose it happened. That's
> still some code executed to figure out how to zero and to align the buf.
>
> Instead, I think one can start with just retiring clear_page_orig().
>
> With that sucker out of the way, an optional quest is to figure out if
> rep stosq vs rep stosb makes any difference for pages -- for all I know
> rep stosq is the way. This would require testing on quite a few uarchs
> and I'm not going to blame anyone for not being interested.
IIRC some recent AMD models (Rome?) did expose REP_GOOD but not ERMS.
> Let's say nobody bothered OR rep stosb provides a win. In that case this
> can trivially ALTERNATIVE between rep stosb and rep stosq based on ERMS,
> no func calls necessary.
We shouldn't need any function calls for ERMS and REP_GOOD.
I think something like this untested code should work:
asm volatile(
ALTERNATIVE_2("call clear_pages_orig",
"rep stosb", X86_FEATURE_REP_GOOD,
"shrl $3,%ecx; rep stosq", X86_FEATURE_ERMS,
: "+c" (size), "+D" (addr), ASM_CALL_CONSTRAINT
: "a" (0)))
--
ankur