Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

From: Steven Rostedt
Date: Thu Jul 28 2005 - 11:16:53 EST

On Thu, 2005-07-28 at 08:30 -0700, Linus Torvalds wrote:
> I suspect the old "rep scas" has always been slower than
> compiler-generated code, at least under your test conditions. Many of the
> old asm's are actually _very_ old, and some of them come from pre-0.01
> days and are more about me learning the i386 (and gcc inline asm).

I've been playing with different approaches (all still hot cache,
though) and inspecting the generated code. It's not that the
gcc-generated code is always better in the normal case. But since the
compiler sees more, and nothing is hidden inside asm, it can optimise
based on what is being used and how it's used.
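As a rough illustration of "letting the compiler do the work", here is a minimal user-space sketch of a plain-C find_first_bit. It is not the patch under discussion: the name sketch_find_first_bit and the BITS_PER_LONG definition here are assumptions made for the example, and __builtin_ctzl is the GCC/Clang count-trailing-zeros builtin. Because nothing is hidden in an asm block, the compiler is free to unroll or constant-fold the loop when the size is known at the call site.

```c
#include <limits.h>
#include <stddef.h>

/* Assumed definition for this sketch; the kernel has its own. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: return the index of the first set bit in the
 * bitmap at addr, or size if no bit below size is set.
 */
static inline size_t sketch_find_first_bit(const unsigned long *addr,
                                           size_t size)
{
    /* Number of words needed to cover size bits, rounded up. */
    size_t nwords = (size + BITS_PER_LONG - 1) / BITS_PER_LONG;

    for (size_t i = 0; i < nwords; i++) {
        if (addr[i]) {
            /* __builtin_ctzl is only defined for non-zero input. */
            size_t bit = i * BITS_PER_LONG +
                         (size_t)__builtin_ctzl(addr[i]);
            /* A set bit past the end of the bitmap does not count. */
            return bit < size ? bit : size;
        }
    }
    return size;
}
```

With a constant size of one word, a compiler that inlines this can typically reduce it to a test and a single bit-scan instruction, which is the kind of specialisation an opaque "rep scas" block prevents.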

> That said, I don't much like your benchmarking methodology. I suspect that
> quite often, the code in question runs from L2 cache, not in a tight loop,
> and so that "run a million times" approach is not necessarily the best
> one.

Well, I never said I was a benchmark writer :-). If you know of a
better way to benchmark these, then let me know. I also thought that
having everything in a hot cache would help show the differences. But
I guess I need to test this in other ways too.
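One possible way to get away from the pure hot-cache loop, sketched in user space under stated assumptions: walk a large buffer before every timed call so the bitmap is likely to have left L1, and time only the call itself. Everything here is hypothetical scaffolding (bench, evict, naive_find_first_bit, the 8 MB buffer and the 64-byte line size are guesses, not measured values), and timespec_get is the C11 wall-clock call, so the numbers are rough at best.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define EVICT_BYTES ((size_t)8 * 1024 * 1024) /* assumption: bigger than L1 */
#define LINE_BYTES  64                        /* assumption: cache line size */

typedef size_t (*find_fn)(const unsigned long *map, size_t bits);

/* Hypothetical stand-in for the routine under test: bit-at-a-time scan. */
static size_t naive_find_first_bit(const unsigned long *map, size_t bits)
{
    const size_t per_long = sizeof(unsigned long) * 8;

    for (size_t i = 0; i < bits; i++)
        if (map[i / per_long] & (1UL << (i % per_long)))
            return i;
    return bits;
}

static uint64_t now_ns(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC); /* C11 wall clock; coarse but portable */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Touch one byte per assumed cache line to displace the bitmap. */
static void evict(volatile unsigned char *buf)
{
    for (size_t i = 0; i < EVICT_BYTES; i += LINE_BYTES)
        buf[i]++;
}

/*
 * Time iters calls of fn. When cold is non-zero, walk the eviction
 * buffer before every call; only the call itself is timed.
 * Returns total nanoseconds; *result receives fn's last return value.
 */
static uint64_t bench(find_fn fn, const unsigned long *map, size_t bits,
                      int iters, int cold, size_t *result)
{
    unsigned char *buf = cold ? calloc(EVICT_BYTES, 1) : NULL;
    find_fn volatile call = fn; /* volatile: keep the call inside the loop */
    uint64_t total = 0;

    if (cold && !buf)
        cold = 0; /* allocation failed: fall back to the hot-cache loop */

    for (int i = 0; i < iters; i++) {
        if (cold)
            evict(buf);
        uint64_t t0 = now_ns();
        *result = call(map, bits);
        total += now_ns() - t0;
    }
    free(buf);
    return total;
}
```

Calling bench() once with cold set to 0 and once with cold set to 1 on the same bitmap gives a hot and a not-so-hot total to compare. The timer overhead is included in every sample, so only the difference between two routines measured the same way means much.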

> I'll apply this one as obvious: I doubt the compiler generates bigger code
> or has any real downsides, but I just wanted to say that in general I just
> wish people didn't always time the hot-cache case ;)

I've just finished a version of find_first_zero_bit too. The
comparisons come out the same way as for find_first_bit, but not as
drastically. Do you want this too? If so, should it be a separate patch
on top of the first one, or against (that's the kernel I'm working with
right now), or do you want me to submit a new patch with both changes?

-- Steve
