Re: [PATCH] speed up on find_first_bit for i386 (let compiler dothe work)

From: Linus Torvalds
Date: Fri Jul 29 2005 - 11:35:42 EST

On Fri, 29 Jul 2005, David Woodhouse wrote:
> On Thu, 2005-07-28 at 10:25 -0700, Linus Torvalds wrote:
> > Basic rule: inline assembly is _better_ than random compiler extensions.
> > It's better to have _one_ well-documented extension that is very generic
> > than it is to have a thousand specialized extensions.
> Counterexample: FR-V and its __builtin_read8() et al.

There are arguably always counter-examples, but your arguments really are
pretty theoretical.

Very seldom does compiler extensions end up being (a) timely enough and
(b) semantically close enough to be really useful.

> Builtins can also allow the compiler more visibility into what's going
> on and more opportunity to optimise.

Absolutely. In theory. In practice, not so much. All the opportunity to
optimize often ends up being lost in semantic clashes, or just because
people can't use the extension because it hasn't been there since day one.

The fact is, inline asms are pretty rare even when we are talking about
every single possible assembly combination. They are even less common when
we're talking about just _one_ specific case of them (like something like

What does this mean? It has two results: (a) instruction-level scheduling
and register allocation just isn't _that_ important, and the generic "asm"
register scheduling is really plenty good enough. The fact that in theory
you might get better results if the compiler knew exactly what was going
on is just not relevant: in practice it's simply not _true_. The other
result is: (b) the compiler people don't end up seeing something like the
esoteric builtins as a primary thing, so it's not like they'd be tweaking
and regression-testing everything _anyway_.

So I argue very strongly that __builtin_xxx() is _wrong_, unless you have
very very strong reasons for it:

- truly generic and _very_ important stuff: __builtin_memcpy() is
actually very much worth it, since it's all over, and it's so generic
that the compiler has a lot of choice in how to do it.

- stuff where the architecture (or the compiler) -really- sucks with
inline asms, and has serious problems, and the thing is really
important. Your FR-V example _might_ fall into this category (or it
might not), and ia64 has the problem with instruction packing and
scheduling and so __builtin's have a bigger advantage.

Basically, on most normal architectures, there's seldom any reason at
_all_ to use builtins except for things like memcpy. On x86, I think the
counter-example might be if you want to schedule MMX code from C - which
is a special case because it doesn't follow my "rule (a)" above. But we
don't do that in the kernel, really, or we just schedule it out-of-line.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at