Re: [PATCH 2/2] string: improve strlen performance

From: Andy Shevchenko
Date: Thu May 02 2024 - 11:00:50 EST


On Thu, May 2, 2024 at 5:14 PM Hsin-Yu.Chen <harry021633@xxxxxxxxx> wrote:
>
> Port `strlen` in gcc,

This is the Linux kernel project. What does this mean?
Also note we refer to the function as strlen() (and no [back]quotes).

> which enhance performance over 10 times

>
> Please refer to these following articles
> 1. [Determine if a word has a byte less than n]
> (https://graphics.stanford.edu/~seander/bithacks.html#HasLessInWord)
> 2. [Determine if a word has a zero byte]
> (https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord)

Make these proper Link: tags

Link: URL#1 [1]
Link: URL#2 [2]

..

> - const char *sc;
> + const char *char_ptr;

No need to change the var name for this.

> + const unsigned long *longword_ptr;
> + unsigned long longword, himagic, lomagic;

Keep it in reversed xmas tree order.

..

> + /* Handle the first few characters by reading one character at a time.
> + * Do this until CHAR_PTR is aligned on a longword boundary.
> + */

/*
* This is the wrong comment style. You may
* use this example.
*/

..

> + for (char_ptr = s; ((unsigned long) char_ptr
> + & (sizeof(longword) - 1)) != 0;
> + ++char_ptr)

This is too verbose (too many unneeded symbols) and why pre-increment?
What is special about it?

..

> + /* All these elucidatory comments refer to 4-byte longwords,
> + * but the theory applies equally well to 8-byte longwords.
> + */

Use proper style.

> + longword_ptr = (unsigned long *) char_ptr;

No space after casting and why do you need it?

..

> + /* Bits 31, 24, 16, and 8 of this number are zero.
> + * Call these bits the "holes."
> + * Note that there is a hole just to the left of
> + * each byte, with an extra at the end:
> + * bits: 01111110 11111110 11111110 11111111
> + * bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
> + * The 1-bits make sure that carries propagate to the next 0-bit.
> + * The 0-bits provide holes for carries to fall into.
> + */

Use proper style.

..

> + /* 64-bit version of the magic. */
> + /* Do the shift in two steps to avoid a warning if long has 32 bits.
> + */

Ditto.

..

> + if (sizeof(longword) > 8)
> + abort();

Huh?!

..

> + /* Instead of the traditional loop which tests each character,
> + * we will test a longword at a time. The tricky part is testing
> + * if *any of the four* bytes in the longword in question are zero.
> + */

Proper style, please.

..

> + for (;;) {
> + longword = *longword_ptr++;
> + if (((longword - lomagic) & ~longword & himagic) != 0) {
> +
> + /* Which of the bytes was the zero?
> + * If none of them were, it was a misfire; continue the search.
> + */
> + const char *cp = (const char *) (longword_ptr - 1);

> + if (cp[0] == 0)
> + return cp - s;
> + else if (cp[1] == 0)
> + return cp - s + 1;
> + else if (cp[2] == 0)
> + return cp - s + 2;
> + else if (cp[3] == 0)
> + return cp - s + 3;

> + if (sizeof(longword) > 4) {

if (... <= 4)
continue;

> + if (cp[4] == 0)
> + return cp - s + 4;
> + else if (cp[5] == 0)
> + return cp - s + 5;
> + else if (cp[6] == 0)
> + return cp - s + 6;
> + else if (cp[7] == 0)
> + return cp - s + 7;

A lot of redundant 'else':s.

> + }
> + }
> + }

--
With Best Regards,
Andy Shevchenko