Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset

From: Vegard Nossum
Date: Thu Jul 03 2025 - 09:35:51 EST



On 03/07/2025 14:15, David Laight wrote:
On Thu, 3 Jul 2025 13:39:57 +0300
"Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:
On Tue, 1 Jul 2025 12:58:31 +0300
"Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index a508e4a8c66a..47b613690f84 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
EXPORT_SYMBOL_GPL(clear_page_erms)
/*
- * Default clear user-space.
+ * Default memset.
* Input:
* rdi destination
+ * rsi scratch
* rcx count
- * rax is zero
+ * al is value
*
* Output:
* rcx: uncleared bytes or 0 if successful.
+ * rdx: clobbered
*/
SYM_FUNC_START(rep_stos_alternative)
ANNOTATE_NOENDBR
+
+ movzbq %al, %rsi
+ movabs $0x0101010101010101, %rax
+
+ /* RDX:RAX = RAX * RSI */
+ mulq %rsi

NAK - you can't do that here.
Neither %rsi nor %rdx can be trashed.
The function has a very explicit calling convention.

That's why we have the clobbers... see below

What calling convention? We change the only caller to confirm to this.

The one that is implicit in:

+ asm volatile("1:\n\t"
+ ALT_64("rep stosb",
+ "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
+ "2:\n\t"
+ _ASM_EXTABLE_UA(1b, 2b)
+ : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
+ : "a" ((uint8_t)v)

The called function is only allowed to change the registers that
'rep stosb' uses - except it can access (but not change)
all of %rax - not just %al.

See: https://godbolt.org/z/3fnrT3x9r
In particular note that 'do_mset' must not change %rax.

This is very specific and is done so that the compiler can use
all the registers.

I feel like you trimmed off the clobbers from the asm() in the context
above. For reference, it is:

+ : "memory", _ASM_SI, _ASM_DX);

I'm not saying this can't be optimized, but that doesn't seem to be your
complaint -- you say "the called function is only allowed to change
...", but this is not true when we have the clobbers, right?

This is exactly what I fixed with my v7 fixlet to this patch:

https://lore.kernel.org/all/1b96b0ca-5c14-4271-86c1-c305bf052b16@xxxxxxxxxx/


Vegard