Re: [PATCH] speed up on find_first_bit for i386 (let compiler dothe work)

From: Linus Torvalds
Date: Thu Jul 28 2005 - 10:39:24 EST




On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
> configurable" I discovered that a C version of find_first_bit is faster
> than the asm version now when compiled against gcc 3.3.6 and gcc 4.0.1
> (both from versions of Debian unstable). I wrote a benchmark (attached)
> that runs the code 1,000,000 times.

I suspect the old "rep scas" has always been slower than
compiler-generated code, at least under your test conditions. Many of the
old asm's are actually _very_ old, and some of them come from pre-0.01
days and are more about me learning the i386 (and gcc inline asm).

That said, I don't much like your benchmarking methodology. I suspect that
quite often, the code in question runs from L2 cache, not in a tight loop,
and so that "run a million times" approach is not necessarily the best
one.

I'll apply this one as obvious: I doubt the compiler generates bigger code
or has any real downsides, but I just wanted to say that in general I just
wish people didn't always time the hot-cache case ;)

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/