Re: [PATCH] x86/math64: handle #DE in mul_u64_u64_div_u64()

From: David Laight
Date: Wed Jul 23 2025 - 17:48:54 EST


On Wed, 23 Jul 2025 11:38:25 +0200
Oleg Nesterov <oleg@xxxxxxxxxx> wrote:

> On 07/22, David Laight wrote:
> >
> > On Tue, 22 Jul 2025 15:21:48 +0200
> > Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> >
> > > static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> > > {
> > > 	char ok = 0;
> > > 	u64 q;
> > >
> > > 	asm ("mulq %3; 1: divq %4; movb $1,%1; 2:\n"
> > > 	     _ASM_EXTABLE(1b, 2b)
> > > 	     : "=a" (q), "+q" (ok)
> > > 	     : "a" (a), "rm" (mul), "rm" (div)
> > > 	     : "rdx");
> > >
> > > 	if (ok)
> > > 		return q;
> > > 	BUG_ON(!div);
> > > 	WARN_ONCE(1, "muldiv overflow.\n");
> >
> > I wonder what WARN_ON_ONCE("muldiv overflow") outputs?
>
> Well, it outputs "muldiv overflow." ;) So I am not sure it is better
> than just WARN_ON_ONCE(1).
>
> > Actually, without the BUG or WARN you want:
> > u64 fail = ~(u64)0;
> > then
> > incq %1 ... "+r" (fail)
> > and finally
> > return q | fail;
> > to remove the conditional branches from the normal path
> > (apart from one the caller might do)
>
> I was thinking about
>
> static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> {
> 	u64 q;
>
> 	asm ("mulq %2; 1: divq %3; jmp 3f; 2: movq $-1,%0; 3:\n"
> 	     _ASM_EXTABLE(1b, 2b)
> 	     : "=a" (q)
> 	     : "a" (a), "rm" (mul), "rm" (div)
> 	     : "rdx");
>
> 	return q;
> }
>
> to remove the conditional branch and additional variable. Your version
> is probably better... But this is without WARN/BUG.

I wish there were a way of doing a WARN_ONCE from asm with a single instruction.
Then you could put one after your 2: label.
Otherwise it is a conditional and a load of inlined code.

> So, which version do you prefer?

I wish I knew :-)

Yours is a few bytes shorter, uses one less register, but has that unconditional jmp.
I suspect we don't worry about the cpu not predicting a jump - especially with
the divq.
It's not as though some real-time code relies on this code being as fast
as absolutely possible.
Not using a register is probably the main win.

So maybe I lose (this time).

Further work could add an 'int *' parameter that is set non-zero (from %rax)
if the divide traps; optimised out if NULL.
The easy way is two copies of the asm statement.
But I've already got two copies in the version that does (a * b + c)/d
and four copies is getting silly.
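
Something like this, perhaps - a rough sketch of the two-copy idea (the name,
the saturate-to-~0 behaviour and the separate flag register are my guesses,
not anything from the patch):

static inline u64 mul_u64_u64_div_u64_ovf(u64 a, u64 mul, u64 div, int *ovf)
{
	u64 q, fail = 0;

	if (!ovf) {
		/* First copy: no flag, a trapped divide just saturates. */
		asm ("mulq %2; 1: divq %3; jmp 3f; 2: movq $-1,%0; 3:\n"
		     _ASM_EXTABLE(1b, 2b)
		     : "=a" (q)
		     : "a" (a), "rm" (mul), "rm" (div)
		     : "rdx");
		return q;
	}

	/* Second copy: the fixup path also bumps the flag register. */
	asm ("mulq %3; 1: divq %4; jmp 3f; 2: movq $-1,%0; incq %1; 3:\n"
	     _ASM_EXTABLE(1b, 2b)
	     : "=a" (q), "+r" (fail)
	     : "a" (a), "rm" (mul), "rm" (div)
	     : "rdx");
	*ovf = fail != 0;
	return q;
}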

Actually, going via __int128 for the multiply and add seems ok - at least as a real function:

u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 div)
{
	unsigned __int128 v = (__int128)a * b + c;

	asm ("1: divq %1; jmp 3f; 2: movq $-1,%%rax; 3:\n"
	     _ASM_EXTABLE(1b, 2b)
	     : "+A" (v)
	     : "r" (div));
	return v;
}
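
(A caller wanting a rounded scaling would then do something like
	ns = mul_u64_add_u64_div_u64(ticks, NSEC_PER_SEC, freq / 2, freq);
- the names there are made up, it just shows what the extra 'c' term is for.)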

But (as I found with the 32-bit version) gcc can decide to do a full 128x128 multiply.
It does do a full 128-bit add - with an extra register for the zero.
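
If that ever matters, the multiply and add could be pulled into the asm as well,
so the compiler never sees a 128-bit expression. Untested sketch only (and the
"rm" constraints have the clang caveat below):

u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 div)
{
	u64 q;

	asm ("mulq %2\n\t"		/* rdx:rax = a * b */
	     "addq %3, %%rax\n\t"	/* 128-bit add of c... */
	     "adcq $0, %%rdx\n\t"	/* ...carry into the high half */
	     "1: divq %4\n\t"		/* quotient in rax, remainder in rdx */
	     "jmp 3f\n"
	     "2: movq $-1, %%rax\n"	/* overflow or div == 0: saturate */
	     "3:\n"
	     _ASM_EXTABLE(1b, 2b)
	     : "=a" (q)
	     : "a" (a), "rm" (b), "rm" (c), "rm" (div)
	     : "rdx");
	return q;
}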

Note that you should never pass "rm" to clang; it needs to be "r".
There is a #define for it.

David


>
> Oleg.
>