Re: Adding __popcountsi2 and __popcountdi2

From: Nathan Chancellor
Date: Thu Apr 24 2025 - 22:11:49 EST

On Thu, Apr 24, 2025 at 06:36:33PM -0700, Linus Torvalds wrote:
> On Thu, 24 Apr 2025 at 17:33, Nathan Chancellor <nathan@xxxxxxxxxx> wrote:
> >
> > I figured adding
> > these may not be as bad as the wcslen() case because most architectures
> > generally have an optimized popcount implementation and I am not sure
> > compiler builtins are banned entirely from the kernel but I can
> > understand if it is still contentious.
>
> Why does the compiler even bother to do this if the architecture
> doesn't have the popcount instruction? The function call is quite
> possibly more expensive than just doing it the stupid way.

I am not entirely sure what the motivation is on the compiler side, but I
cannot imagine that they would be doing this if it were not more
efficient in some way.
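For reference, "the stupid way" Linus mentions is a short run of shifts, masks, and adds. Below is a minimal sketch of the classic SWAR (SIMD-within-a-register) population count. The names sw_popcount32() and sw_popcount64() are hypothetical, and the code is not claimed to be a copy of lib/hweight.c.

```c
#include <stdint.h>

/* Hypothetical helper: classic SWAR popcount for 32-bit values. */
static unsigned int sw_popcount32(uint32_t w)
{
	w -= (w >> 1) & 0x55555555u;                      /* per-2-bit counts */
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u); /* per-4-bit counts */
	w = (w + (w >> 4)) & 0x0f0f0f0fu;                 /* per-byte counts */
	return (w * 0x01010101u) >> 24;                   /* sum of bytes lands in top byte */
}

/* Hypothetical helper: same technique widened to 64 bits. */
static unsigned int sw_popcount64(uint64_t w)
{
	w -= (w >> 1) & 0x5555555555555555ull;
	w = (w & 0x3333333333333333ull) + ((w >> 2) & 0x3333333333333333ull);
	w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0full;
	return (unsigned int)((w * 0x0101010101010101ull) >> 56);
}
```

Each step sums adjacent bit fields in parallel (1-bit into 2-bit, 2-bit into 4-bit, 4-bit into 8-bit). The final multiply then adds every byte count into the top byte. On a target without a popcount instruction this costs about a dozen ALU operations, which is what an out-of-line library call is competing against.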

> But if you want to do this, put the damn thing as an alias on the code
> that actually *does* the SW fallback in lib/hweight.c.
>
> Because the way your patch does it now, it takes "I'm doing stupid
> things" to the next level by turning that function call into *two*
> function calls - first calling __popcountsi2, which then calls
> __sw_hweight32.
>
> Let's not do stupid things, ok?

I will test declaring __popcount{s,d}i2() as aliases of
__sw_hweight{32,64}() to see what effect that has. However, I figured that
calling the __arch_hweight variants was more correct because some
architectures (at least RISC-V and x86 when I looked) use alternatives
in that path to use hardware instructions and avoid the software path
altogether. While there would still be the overhead of the function
call, I figured that avoiding the software fallback would at least soften
that blow.
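The difference between the wrapper approach and Linus's alias suggestion can be sketched in userspace C. This sketch assumes GCC or Clang on an ELF target such as Linux, where __attribute__((alias)) is supported. All demo_* names are hypothetical stand-ins, not the kernel's actual symbols.

```c
/* Hypothetical stand-in for the software fallback (__sw_hweight32 in the thread). */
unsigned int demo_sw_hweight32(unsigned int w)
{
	w -= (w >> 1) & 0x55555555u;
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
	w = (w + (w >> 4)) & 0x0f0f0f0fu;
	return (w * 0x01010101u) >> 24;
}

/*
 * Wrapper approach: a compiler-emitted call lands here and then makes a
 * second call (or, when optimized, at best a tail jump) to the real code.
 */
unsigned int demo_popcountsi2_wrapper(unsigned int a)
{
	return demo_sw_hweight32(a);
}

/*
 * Alias approach: a second symbol name for the same machine code, so the
 * compiler-emitted call goes straight to the implementation.
 * Assumption: ELF target; alias attributes are not available on Mach-O.
 */
unsigned int demo_popcountsi2_alias(unsigned int a)
	__attribute__((alias("demo_sw_hweight32")));
```

With the wrapper, a call to the library-style name reaches a function that then calls the real implementation. With the alias, both names resolve to the same address, so only one call is made. This sketch does not model the __arch_hweight path described above, where alternatives can patch in a hardware instruction.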

Cheers,
Nathan