Re: [PATCH] speed up on find_first_bit for i386 (let compiler dothe work)

From: Linus Torvalds
Date: Thu Jul 28 2005 - 12:43:41 EST

On Thu, 28 Jul 2005, Steven Rostedt wrote:
> I can change the find_first_bit to use __builtin_ffs, but how would you
> implement the ffz?

The thing is, there are basically _zero_ upsides to using the __builtin_xx
functions on x86.

There may be more upsides on other architectures (*cough*ia64*cough*) that
have strange scheduling issues and other complexities, but on x86 in
particular, the __builtin_xxx() functions tend to be a lot more pain than
they are worth. Not only do they have strange limitations (on selection of
opcodes but also for compiler versions), but they aren't well documented,
and semantics aren't clear.

For example, if you use the "bsfl" inline assembly instruction, you know
what the semantics are and what the generated code is like: Intel
documents it, and you know what code you generated. So the special cases
like "what happens if the input is zero" are well-defined.

In contrast, the gcc builtins probably match some standard that is not
only harder to find, but also has some _other_ definition for what happens
for the zero case, so the builtins automatically end up having problems
due to semantic mis-match between the CPU and the standard.

Basic rule: inline assembly is _better_ than random compiler extensions.
It's better to have _one_ well-documented extension that is very generic
than it is to have a thousand specialized extensions.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at