Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

From: Uros Bizjak
Date: Fri Oct 13 2023 - 07:53:48 EST


On Fri, Oct 13, 2023 at 11:38 AM Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>
> On Thu, Oct 12, 2023 at 8:01 PM Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
> >
> > On Thu, Oct 12, 2023 at 7:47 PM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, 12 Oct 2023 at 10:10, Linus Torvalds
> > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > The fix seems to be a simple one-liner, ie just
> > > >
> > > > - asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]") \
> > > > + asm(__pcpu_op2_##size(op, __percpu_arg(a[var]), "%[val]") \
> > >
> > > Nope. That doesn't work at all.
> > >
> > > It turns out that we're not the only ones that didn't know about the
> > > 'a' modifier.
> > >
> > > clang has also never heard of it in this context, and the above
> > > one-liner results in an endless sea of errors, with
> > >
> > > error: invalid operand in inline asm: 'movq %gs:${1:a}, $0'
> > >
> > > Looking around, I think it's X86AsmPrinter::PrintAsmOperand() that is
> > > supposed to handle these things, and while it does have some handling
> > > for 'a', the comment around it says
> > >
> > > case 'a': // This is an address. Currently only 'i' and 'r' are expected.
> > >
> > > and I think our use ends up just confusing the heck out of clang. Of
> > > course, clang also does this:
> > >
> > > case 'P': // This is the operand of a call, treat specially.
> > > PrintPCRelImm(MI, OpNo, O);
> > > return false;
> > >
> > > so clang *already* generates those 'current' accesses as PCrelative, and I see
> > >
> > > movq %gs:pcpu_hot(%rip), %r13
> > >
> > > in the generated code.
> > >
> > > End result: clang actually generates what we want just using 'P', and
> > > the whole "P vs a" is only a gcc thing.

Maybe we should go with what Clang expects. %a with "i" constraint is
also what GCC handles, because

‘i’: An immediate integer operand (one with constant value) is
allowed. This includes symbolic constants whose values will be known
only at assembly time or later.

Attached patch patches both cases: the generated code for
mem_encrypt_identity.c does not change while the change in
percpu.h brings expected 4kB code size reduction. I think this is the
correct solution that will work for both compilers.

Uros.
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 60ea7755c0fe..12f37b951fa7 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -175,9 +175,9 @@ do { \
#define percpu_stable_op(size, op, _var) \
({ \
__pcpu_type_##size pfo_val__; \
- asm(__pcpu_op2_##size(op, __force_percpu_arg(P[var]), "%[val]") \
+ asm(__pcpu_op2_##size(op, __force_percpu_arg(a[var]), "%[val]") \
: [val] __pcpu_reg_##size("=", pfo_val__) \
- : [var] "p" (&(_var))); \
+ : [var] "i" (&(_var))); \
(typeof(_var))(unsigned long) pfo_val__; \
})

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index d73aeb16417f..2b1e0e781f9d 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -346,9 +346,9 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
* We're running identity mapped, so we must obtain the address to the
* SME encryption workarea using rip-relative addressing.
*/
- asm ("lea sme_workarea(%%rip), %0"
+ asm ("lea %a1, %0"
: "=r" (workarea_start)
- : "p" (sme_workarea));
+ : "i" (sme_workarea));

/*
* Calculate required number of workarea bytes needed:
@@ -582,15 +582,15 @@ void __init sme_enable(struct boot_params *bp)
* identity mapped, so we must obtain the address to the SME command
* line argument data using rip-relative addressing.
*/
- asm ("lea sme_cmdline_arg(%%rip), %0"
+ asm ("lea %a1, %0"
: "=r" (cmdline_arg)
- : "p" (sme_cmdline_arg));
- asm ("lea sme_cmdline_on(%%rip), %0"
+ : "i" (sme_cmdline_arg));
+ asm ("lea %a1, %0"
: "=r" (cmdline_on)
- : "p" (sme_cmdline_on));
- asm ("lea sme_cmdline_off(%%rip), %0"
+ : "i" (sme_cmdline_on));
+ asm ("lea %a1, %0"
: "=r" (cmdline_off)
- : "p" (sme_cmdline_off));
+ : "i" (sme_cmdline_off));

if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
active_by_default = true;