Re: [RFC] div64_64 support

From: Sami Farin
Date: Tue Mar 06 2007 - 16:54:12 EST


On Tue, Mar 06, 2007 at 10:29:41 -0800, Stephen Hemminger wrote:
> Don't count the existing Newton-Raphson out. It turns out that to get enough
> precision for 32 bits, only 4 iterations are needed. By unrolling those, it
> gets much better timing.
>
> Slightly gross test program (with original cubic wraparound bug fixed).
...
> {~0, 2097151},
^^^^^^^
this should be 2642245.

Without serializing instruction before rdtsc and with one loop
I do not get very accurate results (104 for ncubic, > 1000 for others).

#define rdtscll_serialize(val) \
__asm__ __volatile__("movl $0, %%eax\n\tcpuid\n\trdtsc\n" : "=A" (val) : : "ebx", "ecx")

Here Pentium D timings for 1000 loops.

~0, 2097151

Function clocks mean(us) max(us) std(us) total error
ocubic 912 0.306 20.317 0.730 545101
ncubic 777 0.261 14.799 0.486 576263
acbrt 1168 0.392 21.681 0.547 547562
hcbrt 827 0.278 15.244 0.387 2410

~0, 2642245

Function clocks mean(us) max(us) std(us) total error
ocubic 908 0.305 20.210 0.656 7
ncubic 775 0.260 14.792 0.550 31169
acbrt 1176 0.395 22.017 0.970 2468
hcbrt 826 0.278 15.326 0.670 547504

And I found bug in gcc-4.1.2, it gave 0 for ncubic results
when doing 1000 loops test... gcc-4.0.3 works.

--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/