Re: [PATCH v2 8/9] atomic,x86: Alternative atomic_*_overflow() scheme

From: Peter Zijlstra
Date: Mon Dec 13 2021 - 11:43:47 EST


On Fri, Dec 10, 2021 at 05:16:26PM +0100, Peter Zijlstra wrote:
> Shift the overflow range from [0,INT_MIN] to [-1,INT_MIN], this allows
> optimizing atomic_inc_overflow() to use "jle" to detect increment
> from free-or-negative (with -1 being the new free and its increment
> being 0 which sets ZF).
>
> This then obviously changes atomic_dec*_overflow() since it must now
> detect the 0->-1 transition rather than the 1->0. Luckily this is
> reflected in the carry flag (since we need to borrow to decrement 0).
> However this means decrement must now use the SUB instruction with a
> literal, since DEC doesn't set CF.
>
> This then gives the following primitives:
>
>   [-1, INT_MIN]                       [0, INT_MIN]
>
>   inc()                               inc()
>     lock inc %[var]                     mov  $-1, %[reg]
>     jle  error-free-or-negative         lock xadd %[reg], %[var]
>                                         test %[reg], %[reg]
>                                         jle  error-zero-or-negative
>
>   dec()                               dec()
>     lock sub $1, %[var]                 lock dec %[var]
>     jc   error-to-free                  jle  error-zero-or-negative
>     jl   error-from-negative
>
>   dec_and_test()                      dec_and_test()
>     lock sub $1, %[var]                 lock dec %[var]
>     jc   do-free                        jl   error-from-negative
>     jl   error-from-negative            je   do-free
>
> Make sure to set ATOMIC_OVERFLOW_OFFSET to 1 such that other code
> interacting with these primitives can re-center 0.

So Marco was expressing doubt about this exact interface for the
atomic_*_overflow() functions, since it's extremely easy to get the
whole ATOMIC_OVERFLOW_OFFSET thing wrong.

Since the current ops are strictly those that require inline asm, the
interface is fairly incomplete, which forces anybody who's going to use
these to provide whatever is missing, e.g. atomic_inc_not_zero_overflow().
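
To make the hazard concrete, here is a minimal userspace sketch of the
kind of helper a user would have to write themselves. It models the
[-1, INT_MIN] scheme with C11 atomics rather than the kernel's atomic_t.
All model_* names, the result enum, and the choice to refuse an increment
at INT_MAX are assumptions for illustration, not part of the patch.

```c
#include <limits.h>
#include <stdatomic.h>

/* Assumption: stands in for ATOMIC_OVERFLOW_OFFSET == 1 from the patch. */
#define MODEL_OVERFLOW_OFFSET	1
/* Raw value that represents a logical count of zero ("free"). */
#define MODEL_FREE		(0 - MODEL_OVERFLOW_OFFSET)

enum model_inc_result {
	MODEL_INC_OK,
	MODEL_INC_WAS_ZERO,
	MODEL_INC_OVERFLOW,
};

/*
 * Hypothetical inc_not_zero on top of the shifted range. The easy
 * mistake is comparing against 0 instead of MODEL_FREE: with an
 * offset of 1 the raw value 0 is a perfectly live object.
 */
static enum model_inc_result model_inc_not_zero_overflow(atomic_int *v)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);

	do {
		if (old == MODEL_FREE)
			return MODEL_INC_WAS_ZERO;
		/* Below "free" is the error range; INT_MAX would wrap into it. */
		if (old < MODEL_FREE || old == INT_MAX)
			return MODEL_INC_OVERFLOW;
	} while (!atomic_compare_exchange_weak_explicit(v, &old, old + 1,
							memory_order_relaxed,
							memory_order_relaxed));

	return MODEL_INC_OK;
}
```

Every such open-coded helper has to repeat the MODEL_FREE comparison
correctly, which is exactly the part that is easy to get wrong.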

Another proposal had the user supply the offset as a compile-time
constant to the function itself, raising a build-bug for any unsupported
offset. This would ensure the caller is at least aware of any non-zero
offset... still not really going to be dummy-proof either.
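
Roughly what that could look like, again as a userspace model with C11
atomics. The macro names are made up, and the _Static_assert-in-sizeof
trick merely stands in for the kernel's BUILD_BUG_ON(); nothing here is
taken from an actual patch.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: the one offset this "architecture" implements. */
#define MODEL_ARCH_OVERFLOW_OFFSET	1

/* Fails the build unless the caller names the supported offset. */
#define MODEL_OFFSET_CHECK(offset)					\
	((void)sizeof(struct {						\
		_Static_assert((offset) == MODEL_ARCH_OVERFLOW_OFFSET,	\
			       "unsupported overflow offset");		\
		int unused;						\
	}))

/*
 * Models "lock inc; jle error": returns true when the incremented
 * value is <= 0, i.e. the old value was free (-1), negative, or
 * INT_MAX (which wraps to INT_MIN). Atomic signed arithmetic wraps
 * silently in C11, so the fetch_add itself is well defined.
 */
static inline bool model_inc_overflow_raw(atomic_int *v)
{
	int old = atomic_fetch_add_explicit(v, 1, memory_order_relaxed);

	return old < 0 || old == INT_MAX;
}

/* The caller must spell out the offset it believes is in effect. */
#define model_inc_overflow(v, offset) \
	(MODEL_OFFSET_CHECK(offset), model_inc_overflow_raw(v))
```

A call such as model_inc_overflow(&v, 0) then fails to compile, while
model_inc_overflow(&v, 1) builds. It forces the caller to write the
offset down, but does nothing to check that the surrounding code
actually honours it.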

Alternatively we could provide a more complete set of ops and/or a whole
new type, but... I'm not sure about that either.

I suppose I can try and do something like refcount_overflow_t and
implement the whole current refcount API in terms of that. Basically
everywhere we currently do refcount_warn_saturate() would become goto
label.

And then refcount_t could be a thin wrapper on top of that. But urgh...
lots of work, very little gain.
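
For the sake of argument, a rough userspace sketch of that shape: a
label-taking primitive and a thin wrapper whose error path is a goto
target instead of a refcount_warn_saturate() call. The model_* names,
the struct layout and the memory orderings are assumptions; the
saturation value of INT_MIN / 2 follows what refcount_t uses today.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for refcount_overflow_t; raw -1 means "free". */
typedef struct {
	atomic_int raw;
} model_refcount_overflow_t;

#define MODEL_REFCOUNT_SATURATED	(INT_MIN / 2)

/*
 * Models "lock sub $1; jc do-free; jl error-from-negative":
 * old == 0 is the 0 -> -1 transition (borrow), old < 0 is an error.
 */
#define model_dec_and_test_overflow(v, free_label, error_label)	\
	do {								\
		int old_ = atomic_fetch_sub_explicit((v), 1,		\
						     memory_order_release); \
		if (old_ == 0)						\
			goto free_label;				\
		if (old_ < 0)						\
			goto error_label;				\
	} while (0)

/* Thin wrapper: returns true when the caller should free the object. */
static bool model_refcount_dec_and_test(model_refcount_overflow_t *r)
{
	model_dec_and_test_overflow(&r->raw, do_free, error);
	return false;

do_free:
	atomic_thread_fence(memory_order_acquire);
	return true;

error:
	/* Where refcount_warn_saturate() would be called today. */
	atomic_store_explicit(&r->raw, MODEL_REFCOUNT_SATURATED,
			      memory_order_relaxed);
	return false;
}
```

The offset stays hidden inside the type this way, at the price of
having to re-implement every refcount operation on top of it.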

So what do we do? Keep things as is and think about it again once we've
got the first bug in hand, preemptively add a few ops, or go completely
overboard?

Obviously I'm all for keeping things as is (less work for this lazy
bastard etc.)