Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

From: George Spelvin
Date: Mon Sep 24 2012 - 08:16:07 EST


> You are using a 64-bit multiply in a path that is designed for 32-bit
> processors, which makes me feel that it will be slower.

Slower than the divide it's replacing?

The following 32-bit processors have 32x32->64-bit multiply:

x86
ARM (as of ARMv4 = ARM7TDMI, the lowest version in common use)
SPARCv7, SPARCv8
MIPS32
MC68020
PA-RISC 1.1 (XMPYU)
avr32
PowerPC (MULHWU)
VAX (EMUL)

I could keep going through the full list of architectures in arch/,
but it's slow going and I haven't hit one *without* a widening
multiply yet. (And on a processor with no hardware divide, I expect
the multiply is still faster.)
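
For concreteness, here is a minimal sketch of the general technique being
discussed: dividing a 32-bit value by 10000 with one 32x32->64-bit multiply
and a shift. It is not the code from the patch. The function name div10000
and the choice of constant are assumptions made for illustration. The
constant is ceil(2^45 / 10000) = 0xD1B71759, which yields the exact quotient
for every 32-bit input.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the patch): x / 10000 computed
 * with a widening multiply instead of a hardware divide.
 *
 * 0xD1B71759 == ceil(2^45 / 10000).  The rounding error of the constant
 * (0xD1B71759 * 10000 - 2^45 == 1168) is below 2^13, which is small
 * enough that the truncated result is exact for all 32-bit x.
 */
static uint32_t div10000(uint32_t x)
{
	/* 32x32->64 multiply, then keep the bits above position 45 */
	return (uint32_t)(((uint64_t)x * 0xD1B71759u) >> 45);
}

/* Remainder derived from the quotient with a 32x32->32 multiply */
static uint32_t mod10000(uint32_t x)
{
	return x - div10000(x) * 10000u;
}
```

On a processor with a widening multiply, the cast to uint64_t should compile
to a single multiply instruction followed by a shift of the high word.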

Ah! Found one! ColdFire MCF5272 has 32/32-bit divide, but only 32x32->32
multiply. However, DIVU takes 20 or 35 cycles, which is pretty close to the
time to synthesize the multiply out of 4 16x16->32 pieces (4 cycles each).
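
The four-piece synthesis mentioned above can be sketched as follows. This is
an illustration only, assuming a target whose multiplier produces at most a
32-bit result. The function name mul32x32_64 is hypothetical, and real code
for such a processor would probably be written in assembly.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build a 32x32->64 multiply from four
 * 16x16->32 partial products.
 *
 *   a * b = hh * 2^32 + (lh + hl) * 2^16 + ll
 */
static uint64_t mul32x32_64(uint32_t a, uint32_t b)
{
	uint32_t al = a & 0xffff, ah = a >> 16;
	uint32_t bl = b & 0xffff, bh = b >> 16;

	/* Each product of two 16-bit values fits in 32 bits */
	uint32_t ll = al * bl;
	uint32_t lh = al * bh;
	uint32_t hl = ah * bl;
	uint32_t hh = ah * bh;

	/*
	 * Sum the three terms that land in bits 16..31.  The sum is at
	 * most 3 * 0xffff, so it cannot overflow; its upper bits are
	 * the carry into the high word.
	 */
	uint32_t mid = (ll >> 16) + (lh & 0xffff) + (hl & 0xffff);

	uint32_t lo = (ll & 0xffff) | (mid << 16);
	uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

	return ((uint64_t)hi << 32) | lo;
}
```

When only the high bits of the product are needed, as in a reciprocal
multiply, the low word can be dropped after its carry has been computed.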

I could do some Kconfig hacking and make the code path architecture-dependent.
Do you think it's worth it?