Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

From: Rasmus Villemoes
Date: Tue Mar 10 2015 - 06:47:57 EST


On Thu, Mar 05 2015, Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx> wrote:

> On Thu, Mar 05 2015, Tejun Heo <tj@xxxxxxxxxx> wrote:
>
>> I'd like to see how this actually affects larger operations - sth
>> along the line of top consumes D% less CPU cycles w/ N processes - if
>> for nothing else, just to get the sense of scale,
>
> That makes sense. I'll see if I can get some reproducible numbers, but
> I'm afraid the effect drowns in all the syscall overhead. Which would be
> a valid argument against touching the code.

I wasn't able to come up with a way to measure the absolute %cpu
reliably enough (neither from top's own output nor using something like
watch -n1 ps -p $toppid -o %cpu) - it fluctuates too much to see any
difference. But using perf I was able to get somewhat stable numbers,
which suggest an improvement in the 0.5-1.0% range [1]. Measured with
10000 [2] sleeping processes in an idle virtual machine (on a mostly
idle host), with the patch applied on top of 3.19.0. Extracting the
functions involved in the decimal conversion, I get:

new1.txt: 2.35% top [kernel.kallsyms] [k] num_to_str
new2.txt: 2.70% top [kernel.kallsyms] [k] num_to_str
old1.txt: 2.25% top [kernel.kallsyms] [k] num_to_str
old2.txt: 2.18% top [kernel.kallsyms] [k] num_to_str

new1.txt: 0.63% top [kernel.kallsyms] [k] put_dec
new2.txt: 0.71% top [kernel.kallsyms] [k] put_dec
old1.txt: 0.67% top [kernel.kallsyms] [k] put_dec
old2.txt: 0.59% top [kernel.kallsyms] [k] put_dec

new1.txt: 0.53% top [kernel.kallsyms] [k] put_dec_full8
new2.txt: 0.55% top [kernel.kallsyms] [k] put_dec_full8
old1.txt: 1.09% top [kernel.kallsyms] [k] put_dec_full9
old2.txt: 1.15% top [kernel.kallsyms] [k] put_dec_full9

new1.txt: 1.12% top [kernel.kallsyms] [k] put_dec_trunc8
new2.txt: 1.22% top [kernel.kallsyms] [k] put_dec_trunc8
old1.txt: 1.64% top [kernel.kallsyms] [k] put_dec_trunc8
old2.txt: 1.65% top [kernel.kallsyms] [k] put_dec_trunc8

I can't explain why num_to_str apparently becomes slightly slower (the
patch essentially didn't touch it), but the put_dec_ helpers in any case
make up for that.
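
For a sense of scale, the four per-symbol shares listed above can be
added up per run. The following is only a minimal sketch of that
arithmetic: the array names and the total_bp() helper are made up for
illustration, and the values are the percentages above written in
hundredths of a percent so that the sums stay exact integers.

```c
#include <stddef.h>

/*
 * Shares from the perf output above, in hundredths of a percent, in the
 * order: num_to_str, put_dec, put_dec_full8 (new) or put_dec_full9 (old),
 * put_dec_trunc8. The names below are hypothetical.
 */
static const int new1[] = { 235, 63,  53, 112 };
static const int new2[] = { 270, 71,  55, 122 };
static const int old1[] = { 225, 67, 109, 164 };
static const int old2[] = { 218, 59, 115, 165 };

/* Sum n entries of v; the result is in hundredths of a percent. */
static int total_bp(const int *v, size_t n)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum;
}
```

With these inputs the totals are 4.63% and 5.18% for the two new runs
against 5.65% and 5.57% for the two old runs, i.e. a difference of
about 0.7 percentage points between the averages, which is roughly
consistent with the 0.5-1.0% range mentioned above.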

If someone has a suggestion for a better way of measuring this I'm all
ears.

Thanks,
Rasmus

[1] in terms of #cycles

[2] numbers for 2000 and 5000 processes are quite similar.