Re: [PATCH] x86/tsc: improve arithmetic division

From: hpa
Date: Sat Feb 01 2020 - 17:31:42 EST


On January 30, 2020 5:08:38 AM PST, Wen Yang <wenyang@xxxxxxxxxxxxxxxxx> wrote:
>do_div() does a 64-by-32 division. Use div64_u64() or div64_ul()
>instead of it if the divisor is 'u64' or 'unsigned long', to avoid
>truncation to the lower 32 bits.
>As a nice side effect, this also cleans up the function a bit.
>
>Signed-off-by: Wen Yang <wenyang@xxxxxxxxxxxxxxxxx>
>Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>Cc: Borislav Petkov <bp@xxxxxxxxx>
>Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
>Cc: x86@xxxxxxxxxx
>Cc: linux-kernel@xxxxxxxxxxxxxxx
>---
> arch/x86/kernel/tsc.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
>diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
>index 7e322e2daaf5..4c0320e68699 100644
>--- a/arch/x86/kernel/tsc.c
>+++ b/arch/x86/kernel/tsc.c
>@@ -357,9 +357,7 @@ static unsigned long calc_pmtimer_ref(u64 deltatsc, u64 pm1, u64 pm2)
> pm2 -= pm1;
> tmp = pm2 * 1000000000LL;
> do_div(tmp, PMTMR_TICKS_PER_SEC);
>- do_div(deltatsc, tmp);
>-
>- return (unsigned long) deltatsc;
>+ return (unsigned long) div64_u64(deltatsc, tmp);
> }
>
> #define CAL_MS 10
>@@ -778,8 +776,7 @@ static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
> tsc_ref_min = min(tsc_ref_min, (unsigned long) tsc2);
>
> /* Check the reference deviation */
>- delta = ((u64) tsc_pit_min) * 100;
>- do_div(delta, tsc_ref_min);
>+ delta = div64_ul(((u64) tsc_pit_min) * 100, tsc_ref_min);
>
> /*
> * If both calibration results are inside a 10% window

This is a *lot* more expensive on 32 bits (something like 10x), and since the output is truncated to unsigned long anyway, it is also unnecessary.
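For reference, the truncation the patch guards against is in the divisor: do_div() takes a 32-bit base, so a wider divisor is narrowed before the division. The minimal userspace model below uses hypothetical helper names (do_div_model(), div64_u64_model()) and only mimics the semantics, not the kernel macros themselves. On 32-bit kernels unsigned long is already 32 bits wide, so a divisor such as tsc_ref_min cannot be truncated there.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of do_div() semantics: the divisor is narrowed to
 * 32 bits, the quotient is written back to *n and the remainder is
 * returned. A divisor above UINT32_MAX is silently truncated.
 */
static uint32_t do_div_model(uint64_t *n, uint64_t base)
{
	uint32_t b = (uint32_t)base;	/* the truncation happens here */
	uint32_t rem;

	assert(b != 0);			/* e.g. base == 1ULL << 32 narrows to 0 */
	rem = (uint32_t)(*n % b);
	*n /= b;
	return rem;
}

/* Hypothetical full-width division in the spirit of div64_u64(). */
static uint64_t div64_u64_model(uint64_t n, uint64_t d)
{
	assert(d != 0);
	return n / d;
}
```

With n = 1ULL << 40 and a divisor of (1ULL << 32) + 2, do_div_model() actually divides by 2 and leaves n == 1ULL << 39, whereas div64_u64_model() returns 255.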

We don't use the remainder, so using do_div() is not merely unnecessary but almost certainly generates worse code: we are multiplying and then dividing by a constant, and most of the time gcc can optimize that into a single multiply/shift operation; otherwise we can do that optimization for it (see timeconst.bc).
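As an illustration of folding a multiply and a constant divide into one multiply/shift, here is a hypothetical fixed-point version of the pm2 * 1000000000 / PMTMR_TICKS_PER_SEC scaling in calc_pmtimer_ref(). It assumes the 3.579545 MHz ACPI PM timer rate; PM_SHIFT, PM_MULT and the helper names are made up for this sketch, and the constants are not produced by timeconst.bc. Because the reciprocal is rounded, the result is approximate: for deltas below 2^24 ticks (about 4.7 s) it is never smaller than the exact quotient and exceeds it by at most 2 ns.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: the ACPI PM timer runs at 3.579545 MHz. */
#define PM_TICKS_PER_SEC	3579545ULL
#define NS_PER_SEC		1000000000ULL

/*
 * Fixed-point scale factor NS_PER_SEC / PM_TICKS_PER_SEC in units of
 * 2^-PM_SHIFT, rounded to nearest. It is an integer constant expression,
 * so the division is folded at compile time.
 */
#define PM_SHIFT	22
#define PM_MULT \
	(((NS_PER_SEC << PM_SHIFT) + PM_TICKS_PER_SEC / 2) / PM_TICKS_PER_SEC)

/* Reference: multiply, then divide by a constant. */
static uint64_t pm_ticks_to_ns_div(uint64_t ticks)
{
	return ticks * NS_PER_SEC / PM_TICKS_PER_SEC;
}

/* Single multiply plus shift; may exceed the reference by up to 2 ns. */
static uint64_t pm_ticks_to_ns_mulshift(uint64_t ticks)
{
	assert(ticks < (1ULL << 24));	/* bounds both the product and the error */
	return (ticks * PM_MULT) >> PM_SHIFT;
}
```

For one second's worth of ticks (3579545), both helpers return exactly 1000000000.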

The one thing that gcc can't necessarily do automatically is to know when a 64/32 → 32 division is safe; C semantics are truncation, but the CPU will trap. If it can turn the division into a multiply, then that problem obviously goes away.
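The hazard can be made concrete. A 32-bit x86 DIV divides EDX:EAX by a 32-bit register and raises a divide error when the quotient does not fit in 32 bits, which happens exactly when the high word of the dividend is not smaller than the divisor. The sketch below, with hypothetical helper names, checks that condition and shows the usual two-step workaround of dividing the high word first. It models the arithmetic in portable C and is not meant to reproduce any particular kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A 64/32 -> 32 divide only succeeds when the quotient fits in 32 bits,
 * i.e. when the high word of the dividend is smaller than the divisor.
 * C would simply truncate (uint32_t)(n / d); the hardware traps instead.
 */
static bool div_64_32_fits(uint64_t n, uint32_t d)
{
	return (uint32_t)(n >> 32) < d;
}

/*
 * Two-step 64/32 division that never overflows: divide the high word
 * first, then divide (remainder:low word), whose quotient is guaranteed
 * to fit in 32 bits. Each step stands in for one 32-bit hardware divide.
 */
static uint64_t div_64_32_twostep(uint64_t n, uint32_t d, uint32_t *rem)
{
	uint32_t hi = (uint32_t)(n >> 32);
	uint32_t lo = (uint32_t)n;
	uint32_t q_hi, q_lo;
	uint64_t step2;

	assert(d != 0);
	q_hi = hi / d;
	step2 = ((uint64_t)(hi % d) << 32) | lo;	/* high word is now < d */
	assert(div_64_32_fits(step2, d));
	q_lo = (uint32_t)(step2 / d);
	*rem = (uint32_t)(step2 % d);
	return ((uint64_t)q_hi << 32) | q_lo;
}
```

For example, 1ULL << 40 divided by 2 would trap as a single 64/32 → 32 divide (the quotient is 2^39), but the two-step version handles it without overflow.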

So first I would test with regular / operators and see what code comes out.
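One way to run that test is to pull the two expressions into standalone functions written with plain / and compare the compiler output for 32-bit and 64-bit targets. The function names are hypothetical, the PM timer rate is assumed to be 3579545 Hz, and only the lines visible in the diff are reproduced; the rest of each kernel function is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of PMTMR_TICKS_PER_SEC (3.579545 MHz ACPI PM timer). */
#define PM_TICKS_PER_SEC	3579545ULL

/* Hypothetical plain-'/' version of the calc_pmtimer_ref() arithmetic. */
static unsigned long pmtimer_ref_plain(uint64_t deltatsc, uint64_t pm1,
				       uint64_t pm2)
{
	uint64_t tmp;

	pm2 -= pm1;
	tmp = pm2 * 1000000000ULL / PM_TICKS_PER_SEC;
	assert(tmp != 0);		/* avoid dividing by zero below */
	return (unsigned long)(deltatsc / tmp);
}

/* Hypothetical plain-'/' version of the reference deviation check. */
static uint64_t ref_deviation_plain(unsigned long tsc_pit_min,
				    unsigned long tsc_ref_min)
{
	assert(tsc_ref_min != 0);
	return (uint64_t)tsc_pit_min * 100 / tsc_ref_min;
}
```

Compiling with, for example, gcc -O2 -m32 -DNDEBUG -S and gcc -O2 -DNDEBUG -S keeps the assert() checks out of the listings. On 32-bit x86, GCC typically lowers a 64-by-64 division by a non-constant divisor to a call to the libgcc helper __udivdi3, so the listings show directly which expressions end up paying for a full 64-bit divide.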

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.