Re: [PATCH 01/16] math128: Introduce various 128bit primitives

From: Linus Torvalds
Date: Fri Oct 26 2012 - 11:17:33 EST

Next message: Dave Hansen: "Re: [PATCH] page_alloc: fix the incorrect adjustment to zone->present_pages"
Previous message: Dave Jones: "Re: lots of suspicious RCU traces"
In reply to: Harald Gustafsson: "Re: [PATCH 01/16] math128: Introduce various 128bit primitives"
Next in thread: Geert Uytterhoeven: "Re: [PATCH 01/16] math128: Introduce various 128bit primitives"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Oct 26, 2012 at 1:49 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> No, it does a compare on two u128

Actually, it apparently compares two multiplications.

That might be optimizable in itself.

> The point is (as mentioned in the comments below) overflowing an actual
> u64 is rare, however since some of this (specifically the
> dl_{runtime,deadline} parameters) is user specified, we have to assume
> we will overflow.

Any chance we could just limit them?

> + u128 left, right;
> +
> + /*
> + * left and right are the two sides of the equation above,
> + * after a bit of shuffling to use multiplications instead
> + * of divisions.
> + *
> + * Note that none of the time values involved in the two
> + * multiplications are absolute: dl_deadline and dl_runtime
> + * are the relative deadline and the maximum runtime of each
> + * instance, runtime is the runtime left for the last instance
> + * and (deadline - t), since t is rq->clock, is the time left
> + * to the (absolute) deadline. Therefore, overflowing the u64
> + * type is very unlikely to occur in both cases.
> + */
> + left = mul_u64_u64(dl_se->dl_deadline, dl_se->runtime);
> + right = mul_u64_u64((dl_se->deadline - t), dl_se->dl_runtime);
> +
> + if (cmp_u128(left, right) > 0)
> + return true;
> +
> + return false;

So how often could we do this without doing the multiplication at all?

It's trivial to see that 'right > left' if the individual
multiplicands are both bigger, for example. Maybe that is common?

And even if it overflows in 64-bit does it overflow in 92? For 32-bit
machines, the difference there is quite noticeable.

So the above might actually be better written as a
"compare_64bit_multiply(a,b,c,d)". At the same time, are we
*seriously* ever talking about multi-second runtimes or deadlines?
Because even in nanoseconds, I assume that the common case *by*far* in
scheduling would be about values smaller than four seconds, in which
case all of the above values are 32-bit, making the compares *much*
cheaper.

So on a 32-bit machine (say, x86-32), you might just have:

- or all the high words together, jump to slow case if the result is non-zero
- otherwise, do just two 32x32 multiplies and check which of the two is bigger.

That's a *huge* reduction in expensive multiplications.

And *THAT* is why generic 128-bit math is stupid. Don't do it.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Dave Hansen: "Re: [PATCH] page_alloc: fix the incorrect adjustment to zone->present_pages"
Previous message: Dave Jones: "Re: lots of suspicious RCU traces"
In reply to: Harald Gustafsson: "Re: [PATCH 01/16] math128: Introduce various 128bit primitives"
Next in thread: Geert Uytterhoeven: "Re: [PATCH 01/16] math128: Introduce various 128bit primitives"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]