Re: [tip: locking/core] locking/atomic/x86: Introduce arch_try_cmpxchg64_local()

From: Uros Bizjak
Date: Wed Apr 17 2024 - 15:21:50 EST


On Wed, Apr 17, 2024 at 8:41 PM Borislav Petkov <bp@xxxxxxxxx> wrote:

> > If the line count is the problem, I can easily parametrize new and
> > existing big macro descriptions in a follow-up patch. However, I was
> > advised to not mix everything together in one patch, but rest assured,
> > the creation and testing of the follow-up patch would take me less
> > time than writing the message you are reading.
>
> I'm simply making sure we're not going off the rails with
> micro-optimizing for no apparent reason.
>
> Saving a
>
> test %rax,%rax
>
> doesn't need fixing in my book. Because I don't think you'll be able to
> even measure it.

The above is perhaps a little unfortunate example taken from

if (cmpxchg64(...))

where the check is against zero. The compiler can optimize the check
to a TEST insn in this particular case, but otherwise CMP will be
emitted for different usages. Not a big difference, but a register has
to be kept live across cmpxchg8b.

> > It brings no future maintenance burden, but it perhaps improves
> > someone's life a tiny bit.
>
> This is where you and I disagree: touching that alternative in
> __arch_try_cmpxchg64_emu_local() does as we tend to change them from
> time to time, especially in recent times.
>
> And I wouldn't mind touching it but if it is there to save 10 insns on
> 32-bit - which doesn't matter - then why bother?
>
> Or do you have a relevant 32-bit workload which brings any improvement
> by this change?

There is one important issue. When a register (or two for double-word
values) has to be kept live for a compare, the register pressure on
32bit targets around cmpxchg8b goes through the roof, and when using
the frame pointer (and maybe some fixed register, e.g. PIC), the
register allocator runs out of available registers. The number of
spills around cmpxchg8b signals the troubles register allocator goes
through to "fix" everything, so from the compiler PoV any relief is
more than welcome here. Even in GCC internal libraries, we have had to
take a special approach with this insn to avoid internal compiler
errors. The kernel was quite lucky here ;)

Thanks and best regards,
Uros.