Re: [BUG?] clang miscompilation of inline ASM with overlapping input/output registers

From: Nathan Chancellor
Date: Mon Jun 02 2025 - 15:37:31 EST


Hi Thomas,

On Mon, Jun 02, 2025 at 10:29:30AM +0200, Thomas Weißschuh wrote:
> I observed a surprising behavior of clang around inline assembly and register
> variables, differing from GCC.
>
> Consider the following snippet:
>
> $ cat repro.c
> int main(void)
> {
> 	register long in asm("eax");
> 	register long out asm("eax");
>
> 	in = 0;
> 	asm volatile("nop" : "+r" (out) : "r" (in));
>
> 	return out;
> }
>
> The relevant part is that the inline ASM has input and output register
> variables both using the same register and the input one is assigned to.
>
>
> Compile with clang (19.1.7, tested on godbolt.org with trunk):
>
> $ clang -O2 repro.c
> $ llvm-objdump --disassemble-symbols=main a.out
> 0000000000001120 <main>:
> 1120: 90 nop
> 1121: c3 retq
>
> The store of the variable "in" has been optimized away.
>
>
> Compile with gcc (15.1.1, also tested on godbolt.org with trunk):
>
> $ gcc -O2 repro.c
> $ llvm-objdump --disassemble-symbols=main a.out
> 0000000000001020 <main>:
> 1020: 31 c0 xorl %eax, %eax
> 1022: 90 nop
> 1023: c3 retq
> 1024: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
> 102e: 66 90 nop
>
> The store to "eax" is preserved.
>
>
> As far as I can see gcc is correct here. As the variable is used as an input to
> ASM the compiler can not optimize away.
> On other architectures the same effect can be observed.
>
>
> The real kernel example for this issue is in the loongarch vDSO code from
> arch/loongarch/include/asm/vdso/gettimeofday.h:
>
> static __always_inline long clock_gettime_fallback(
> 					clockid_t _clkid,
> 					struct __kernel_timespec *_ts)
> {
> 	register clockid_t clkid asm("a0") = _clkid;
> 	register struct __kernel_timespec *ts asm("a1") = _ts;
> 	register long nr asm("a7") = __NR_clock_gettime;
> 	register long ret asm("a0");
>
> 	asm volatile(
> 	" syscall 0\n"
> 	: "+r" (ret)
> 	: "r" (nr), "r" (clkid), "r" (ts)
> 	: "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
> 	  "$t8", "memory");
>
> 	return ret;
> }
>
> Here both "clkid" and "ret" are stored in "a0". I can't point to the concrete
> disassembly here because it is inlined into a much larger block of code
> and removing the inlining hides the bug.
> Also in my tests the bug only manifests for "_clkid" in the interval [16, 23].
> Other values work by chance.
> Removing the aliasing by dropping "ret" and using "clkid" for both input and
> output produces correct results.
>
> Is this a clang bug, is the code broken or am I missing something?

For the record, inline assembly semantics are a little out of my
wheelhouse. Bill can probably comment more on what might be happening
internally within clang/LLVM here but it does seem like there could be a
clang code generation bug. Looking at the example you provided and GCC's
assembly and local register documentation, which has a very similar
example, it looks like the issue disappears when using "=r" for the
output constraint instead of "+r".

https://godbolt.org/z/jo3T8o3hj
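
For reference, a minimal sketch of the "=r" variant of your reproducer
on x86-64. The function name fixed_asm(), the value 2 and the non-x86
fallback branch are my own additions for illustration only, and I use
the __asm__ spelling so that it also builds with strict -std= modes:

```c
/*
 * Sketch only: "=r" variant of the reproducer. fixed_asm() is a
 * hypothetical name, not something from the kernel or the godbolt link.
 */
long fixed_asm(void)
{
#if defined(__x86_64__)
	register long in __asm__("eax");
	register long out __asm__("eax");

	in = 2;
	/* "=r": out is write-only, so only "in" is an input bound to eax. */
	__asm__ volatile("nop" : "=r" (out) : "r" (in));

	return out;
#else
	/* Assumed fallback so the sketch still builds on other targets. */
	return 2;
#endif
}
```

Since the "nop" leaves eax alone, I would expect the caller to get back
the value that was stored to "in".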

Looking at the constraint string in both the unoptimized and optimized
IR, it looks like eax appears as an input twice in the list for broken(),
likely because "+r" was internally expanded to "=r" for the output and
"r" for the input. In the optimized IR, we can see that the first eax
will be the 2 that was assigned but the second eax is "undef"
(undefined), which follows from the unoptimized IR. What I am guessing
happens based on my investigation with '-mllvm -opt-bisect-limit=' on
x86 is the second eax "wins" over the first one that has the actual
value. Using an undef value is UB so the backend removes the initial
write to eax altogether.
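
To illustrate the point that "+r" also makes the operand an input, here
is a small sketch of the single-variable approach you mentioned, where
the one read-write operand is initialized before the asm so that no
uninitialized value is read. single_var_asm() is a hypothetical name
and the non-x86 branch is an assumed fallback, neither is taken from
the kernel:

```c
/*
 * Sketch only: one register variable used as both input and output.
 * single_var_asm() is a hypothetical name used for illustration.
 */
long single_var_asm(void)
{
#if defined(__x86_64__)
	register long val __asm__("eax") = 2;

	/* "+r": val is read and written, and it has a defined value here. */
	__asm__ volatile("nop" : "+r" (val));

	return val;
#else
	/* Assumed fallback so the sketch still builds on other targets. */
	return 2;
#endif
}
```

There is only one variable bound to eax here, so there is no second,
uninitialized input that could take precedence over the assigned value.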

It definitely seems like this could be handled better on the clang side
but I do think that switching the constraints to "=r" would be a proper
fix, as "+r" is really an overspecification (the output variable is
never initialized before the asm) and "=r" matches an almost identical
example in the GCC local register documentation:

https://gcc.gnu.org/onlinedocs/gcc-15.1.0/gcc/Local-Register-Variables.html

Cheers,
Nathan