Re: [PATCH 1/3] lib: vsprintf: optimised put_dec_trunc() and put_dec_full()

From: Denys Vlasenko
Date: Thu Aug 05 2010 - 23:59:11 EST

Next message: Greg Thelen: "Re: [PATCH 1/4 -mm][memcg] quick ID lookup in memcg"
Previous message: David John: "Re: 2.6.35-rc6+: i915: Bisected regression"
In reply to: Michał Nazarewicz: "Re: [PATCH 2/3] lib: vsprintf: optimised put_dec() for 32-bit machines"
Next in thread: Michal Nazarewicz: "Re: [PATCH 1/3] lib: vsprintf: optimised put_dec_trunc() and put_dec_full()"

On Friday 06 August 2010 00:38, Michal Nazarewicz wrote:
> The put_dec_trunc() and put_dec_full() functions were based on
> a code optimised for processors with 8-bit ALU but even then
> they failed to satisfy the same constraints

"Failed"? Interesting wording. Yes, the code won't map easily
onto an 8-bit ALU, for the simple reason that the Linux kernel
does not support any 8-bit CPUs, and by going to wider registers
I was able to process 5 decimal digits at once, not 4.
It was done deliberately. It is not a "failure".

Your code isn't 8-bit ALU optimized either.

Do you think smearing the previous code a bit
would help yours get accepted?

> and in fact
> required at least 16-bit ALU (because at least one number they
> operate in can take 9 bits).

Yes, as explained above.

> This version of those functions proposed by this patch goes
> further and uses the full capacity of a 32-bit ALU and instead
> of splitting the number into nibbles and operating on them it
> performs the obvious algorithm for base conversion expect it
> uses optimised code for dividing by ten (ie. no division is
> actually performed).

(1) "expect" is a typo
(2) No, _this_ patch does not eliminate division. The next one does.
Move this part of the changelog to the next patch, where it belongs.
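
For reference, the "obvious algorithm for base conversion" that the
changelog mentions can be sketched roughly as below. This is only an
illustration, not code from the patch: the name put_dec_sketch and the
least-significant-digit-first output order are assumptions, and the
divide step is written as a plain "/ 10".

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not the kernel's put_dec_trunc(): writes the
 * decimal digits of num into buf, least significant digit first, and
 * returns a pointer just past the last digit written. No NUL
 * terminator is added; buf must have room for every digit.
 */
static char *put_dec_sketch(char *buf, unsigned int num)
{
	do {
		/* The divide an optimised version would replace. */
		unsigned int q = num / 10;

		/* Remainder derived from q, so no second division. */
		*buf++ = (char)('0' + (num - q * 10));
		num = q;
	} while (num);

	return buf;
}
```

With this sketch, 12345 would be stored as the characters "54321",
so a caller would have to reverse the digits before printing.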

> + * Decimal conversion is by far the most typical, and is used for
> + * /proc and /sys data. This directly impacts e.g. top performance
> + * with many processes running.
> + *
> + * We optimize it for speed using ideas described at
> + * <http://www.cs.uiowa.edu/~jones/bcd/divide.html>.

Do you have the author's permission to do it?
Document it in the comment please.

> + * '(num * 0xcccd) >> 19' is an approximation of 'num / 10' that gives
> + * correct results for num < 81920. Because of this, we check at the
> + * beginning if we are dealing with a number that may cause trouble
> + * and if so, we make it smaller.

This comment needs to be moved to the code line where the operation
is performed.
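
The bound quoted above can be checked exhaustively. A minimal sketch,
assuming 32-bit unsigned arithmetic (the names div10_approx and
first_div10_mismatch are made up for this illustration): the product
num * 0xcccd no longer fits in 32 bits once num reaches 81920, and
that is where the result stops matching num / 10.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: approximate num / 10 with a multiply and a shift. */
static uint32_t div10_approx(uint32_t num)
{
	/* The product wraps modulo 2^32, as it would in a 32-bit register. */
	return (uint32_t)(num * UINT32_C(0xcccd)) >> 19;
}

/*
 * Returns the first num in [0, limit) for which the approximation
 * differs from num / 10, or limit if there is no such num.
 */
static uint32_t first_div10_mismatch(uint32_t limit)
{
	for (uint32_t num = 0; num < limit; num++)
		if (div10_approx(num) != num / 10)
			return num;

	return limit;
}
```

Calling first_div10_mismatch(81920) finds no mismatch, while any
larger limit reports 81920 as the first value that goes wrong.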

> + * (As a minor note, all operands are always 16 bit so this function
> + * should work well on hardware that cannot multiply 32 bit numbers).
> + *
> + * (Previous a code based on

English is a bit broken in the line above.

--
vda