Re: [BUG?] clang miscompilation of inline ASM with overlapping input/output registers
From: Bill Wendling
Date: Mon Jun 02 2025 - 17:14:54 EST
On Mon, Jun 2, 2025 at 12:37 PM Nathan Chancellor <nathan@xxxxxxxxxx> wrote:
>
> Hi Thomas,
>
> On Mon, Jun 02, 2025 at 10:29:30AM +0200, Thomas Weißschuh wrote:
> > I observed a surprising behavior of clang around inline assembly and register
> > variables, differing from GCC.
> >
> > Consider the following snippet:
> >
> > $ cat repro.c
> > int main(void)
> > {
> > register long in asm("eax");
> > register long out asm("eax");
> >
> > in = 0;
> > asm volatile("nop" : "+r" (out) : "r" (in));
> >
> > return out;
> > }
> >
> > The relevant part is that the inline ASM has input and output register
> > variables both using the same register and the input one is assigned to.
> >
> >
> > Compile with clang (19.1.7, tested on godbolt.org with trunk):
> >
> > $ clang -O2 repro.c
> > $ llvm-objdump --disassemble-symbols=main a.out
> > 0000000000001120 <main>:
> > 1120: 90 nop
> > 1121: c3 retq
> >
> > The store of the variable "in" has been optimized away.
> >
> >
> > Compile with gcc (15.1.1, also tested on godbolt.org with trunk):
> >
> > $ gcc -O2 repro.c
> > $ llvm-objdump --disassemble-symbols=main a.out
> > 0000000000001020 <main>:
> > 1020: 31 c0 xorl %eax, %eax
> > 1022: 90 nop
> > 1023: c3 retq
> > 1024: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
> > 102e: 66 90 nop
> >
> > The store to "eax" is preserved.
> >
> >
> > As far as I can see gcc is correct here. As the variable is used as an input to
> > the ASM, the compiler cannot optimize the store away.
> > On other architectures the same effect can be observed.
> >
> >
> > The real kernel example for this issue is in the loongarch vDSO code from
> > arch/loongarch/include/asm/vdso/gettimeofday.h:
> >
> > static __always_inline long clock_gettime_fallback(
> > clockid_t _clkid,
> > struct __kernel_timespec *_ts)
> > {
> > register clockid_t clkid asm("a0") = _clkid;
> > register struct __kernel_timespec *ts asm("a1") = _ts;
> > register long nr asm("a7") = __NR_clock_gettime;
> > register long ret asm("a0");
> >
> > asm volatile(
> > " syscall 0\n"
> > : "+r" (ret)
> > : "r" (nr), "r" (clkid), "r" (ts)
> > : "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
> > "$t8", "memory");
> >
> > return ret;
> > }
> >
> > Here both "clkid" and "ret" are stored in "a0". I can't point to the concrete
> > disassembly here because it is inlined into a much larger block of code
> > and removing the inlining hides the bug.
Hi Thomas,
To help locate a specific inline assembly block in the generated code,
place a comment inside the asm template. Something like:
asm volatile(
"# HEY! I'M RIGHT HERE\n\t"
"syscall 0\n"
...
You can then search the assembly for that to see what's generated.
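For illustration, a fuller sketch of the marker trick. It assumes an x86
target, where "#" starts an assembler comment; the function names and the
marker text are made up, and "nop" stands in for the real instruction.

```c
#include <assert.h>

/* Hypothetical helper: name and marker text are illustrative only. */
long marked_nop(long value)
{
#if defined(__x86_64__) || defined(__i386__)
	/*
	 * The "#" line is an assembler comment on x86. It is copied into
	 * the generated assembly, so it can be searched for later.
	 * __asm__ is the spelling that also works with strict -std=c11.
	 */
	__asm__ volatile(
		"# MARKER: right here\n\t"
		"nop\n"
		: "+r" (value));
#endif
	return value;
}

/* The asm does not modify the operand, so the value passes through. */
void check_marked_nop(void)
{
	assert(marked_nop(5) == 5);
}
```

Compiling with -S (for example "gcc -O2 -S file.c") and searching the
resulting .s file for "MARKER" should then land on the block, even after
inlining.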
> > Also in my tests the bug only manifests for "_clkid" in the interval [16, 23].
> > Other values work by chance.
> > Removing the aliasing by dropping "ret" and using "clkid" for both input and
> > output produces correct results.
> >
> > Is this a clang bug, is the code broken or am I missing something?
>
> For the record, inline assembly semantics are a little out of my
> wheelhouse. Bill can probably comment more on what might be happening
> internally within clang/LLVM here but it does seem like there could be a
> clang code generation bug. Looking at the example you provided and GCC's
> assembly and local register documentation, which has a very similar
> example, it looks like the issue disappears when using "=r" for the
> output constraint instead of "+r".
>
> https://godbolt.org/z/jo3T8o3hj
>
> Looking at the constraint string in both the unoptimized and optimized
> IR, it looks like eax appears as an input twice in the list for broken(),
> likely because "+r" was internally expanded to "=r" for the output and
> "r" for the input. In the optimized IR, we can see that the first eax
> will be the 2 that was assigned but the second eax is "undef"
> (undefined), which follows from the unoptimized IR. What I am guessing
> happens based on my investigation with '-mllvm -opt-bisect-limit=' on
> x86 is the second eax "wins" over the first one that has the actual
> value. Using an undef value is UB so the backend removes the initial
> write to eax altogether.
>
> It definitely seems like this could be handled better on the clang side
> but I do think that switching the constraints to "=r" would be a proper
> fix, as "+r" is really an overspecification and that matches an almost
> identical example in the GCC local register documentation:
>
> https://gcc.gnu.org/onlinedocs/gcc-15.1.0/gcc/Local-Register-Variables.html
>
[+Ian because he also knows inline assembly]
This might be a Clang bug, as it's well known that Clang's support of
GCC's extended asm is lacking in key areas, especially with regard
to local register variables.
I'm not confident I completely understand the documentation Nathan
pointed out. It states that the only supported use is for input and
output to extended asm, but then goes on to show an example where they
initialize a variable. (??)
Looking at this a bit closer, the LLVM IR initially generated by the
front end is this for the "+r" version (it's verbose, but not to
worry):
1. %in = alloca i64, align 8
2. %out = alloca i64, align 8
3. store i32 0, ptr %retval, align 4
4. store i64 0, ptr %in, align 8, !dbg !19
5. %0 = load i64, ptr %out, align 8, !dbg !26
6. %1 = load i64, ptr %in, align 8, !dbg !27
7. %2 = call i64 asm sideeffect "nop",
"={eax},{eax},{eax},~{dirflag},~{fpsr},~{flags}"(i64 %1, i64 %0) #2,
!dbg !26
8. store i64 %2, ptr %out, align 8, !dbg !26
Notice instructions (1), (2), (5), and (6). Instructions (1) and (2)
are simply a way for LLVM to indicate that these are variables and are
64 bits in size. The "%in" variable is assigned the value "0" (zero)
with the "store" in (4), but notice that "%out" never has a value
assigned to it. The assembly block indicated that it *should* have a
value, because of the '+' modifier. This means that once all of the
stores and loads are reduced by later passes the "%out" variable will
have an undefined value.
One could argue that it *does* have a value, because it's in the same
register as "%in", but LLVM's middle end doesn't work like that. It
only sees two variables, not that they're in the same register. So
that's what adds the 'undef' that Nathan pointed out. In Clang terms,
"undef" can mean "undefined behavior" (as it does here), and the compiler
may elide code that exhibits it.
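To make the "two variables, one register" point concrete, here is a
minimal sketch of the variant Thomas reported as working: a single
register variable carries the value in and the result out, so there is no
second, never-initialized variable for the middle end to see. It assumes
x86-64 and "rax"; the IN_RAX macro and the function names are made up for
this example.

```c
#include <assert.h>

/* Pin to a register only where the name is valid (assumption: x86-64). */
#if defined(__x86_64__)
#define IN_RAX __asm__("rax")
#else
#define IN_RAX
#endif

/* Hypothetical helper: one variable serves as both input and output. */
long roundtrip_one_variable(long value)
{
	register long io IN_RAX = value;

	/* "+r" reads and writes io; io is initialized, so nothing is undef. */
	__asm__ volatile("nop" : "+r" (io));

	return io;
}

/* "nop" leaves the register untouched, so the value passes through. */
void check_roundtrip_one_variable(void)
{
	assert(roundtrip_one_variable(17) == 17);
}
```

With only one variable, the read implied by '+' sees the stored value
instead of an uninitialized "out".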
One way you could resolve this is to say that the "out" variable has
an early clobber:
: "+&r"(out)
This tells the compiler that the operand is written before the
instruction has finished using its input operands.
Therefore, the compiler won't assume that it has no value.
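For comparison, the "=r" change Nathan suggested above could look roughly
like this on the reproducer. This is a sketch, not a tested kernel patch:
it assumes x86-64 and "rax", the function names are made up, and it
relies on "nop" leaving the register untouched.

```c
#include <assert.h>

/* Hypothetical helper: "out" is write-only, so it is never read as undef. */
long roundtrip_write_only_output(long value)
{
#if defined(__x86_64__)
	register long in __asm__("rax") = value;
	register long out __asm__("rax");

	/* "=r" instead of "+r": no implied read of the uninitialized "out". */
	__asm__ volatile("nop" : "=r" (out) : "r" (in));

	return out;
#else
	/* No register pinning assumed on other targets in this sketch. */
	return value;
#endif
}

/* Both variables share rax and "nop" does not touch it. */
void check_roundtrip_write_only_output(void)
{
	assert(roundtrip_write_only_output(7) == 7);
}
```

In the real syscall wrapper the instruction itself writes the result to
a0, so the output would not depend on the input value the way it does
with "nop" here.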
-bw