Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

From: Linus Torvalds
Date: Thu Jul 28 2005 - 14:00:59 EST

On Thu, 28 Jul 2005, Steven Rostedt wrote:
> OK, I guess when I get some time, I'll start testing all the i386 bitop
> functions, comparing the asm with the gcc versions. Now could someone
> explain to me what's wrong with testing hot cache code. Can one
> instruction retrieve from memory better than others?

There's a few issues:

- trivially: code/data size. Being smaller automatically means faster if
you're cold-cache. If you do cycle tweaking of something that is
possibly commonly in the L2 cache or further away, you might as well
consider one byte of code-space to be equivalent to one cycle (a L1 I$
miss can easily take 50+ cycles - the L1 fill cost may be just a small
part of that, but the pipeline problem it causes can be deadly).

- branch prediction: cold-cache is _different_ from hot-cache. hot-cache
predicts the stuff dynamically, cold-cache has different rules (and it
is _usually_ "forward predicts not-taken, backwards predicts taken",
although you can add static hints if you want to on most architectures).

So hot-cache may look very different indeed - the "normal" case might
be that you mispredict all the time because the static prediction is
wrong, but then a hot-cache benchmark will predict perfectly.

- access patterns. This only matters if you look at algorithmic changes.
Hashes have atrocious locality, but on the other hand, if you know that
the access pattern is cold, a hash will often have a minimum number of

but no, you don't have "some instructions are better at reading from
memory" for regular integer code (FP often has other issues, like reading
directly from L2 without polluting L1, and then there are obviously
prefetch hints).
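
As an illustration of the static hints mentioned in the branch prediction
point: with gcc the usual way to express one is __builtin_expect, which the
kernel wraps as likely()/unlikely(). What follows is only a sketch. The
helper first_nonzero_word() is made up for this example and is not the
kernel's find_first_bit, and the assumption that set words are rare is
purely illustrative. The hint mostly affects how the compiler lays out the
code (which path is the fall-through), and that is what static prediction
sees when the code is cold.

```c
#include <stddef.h>

/* Same shape as the kernel's likely()/unlikely(), built on gcc's
 * __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: index of the first non-zero word, or nwords if
 * every word is zero. */
static size_t first_nonzero_word(const unsigned long *words, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++) {
        /* Illustrative assumption: non-zero words are rare, so the
         * compiler should keep "continue scanning" as the
         * straight-line path. */
        if (unlikely(words[i] != 0))
            return i;
    }
    return nwords;
}
```

Whether the hint helps depends on the compiler version and the CPU, so it
is worth comparing the generated asm with and without it.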

Now, in the case of your "rep scas" conversion, the reason I applied it
was that it was obviously a clear win (rep scas is known bad, and has
register allocation issues too), so I'm _not_ claiming that the above
issues were true in that case. I just wanted to say that in general it's
nice (but often quite hard) if you can give cold-cache numbers too (for
example, using the cycle counter and being clever can actually give that).
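
One possible shape for such a measurement, as a rough user-space sketch
rather than anything from the patch under discussion: read the cycle
counter around a single call, and flush the data from the cache first to
get the cold number. The names flush_range(), sum_words() and time_once()
are invented for this example, the 64-byte line size is an assumption, and
the intrinsics come from gcc's <x86intrin.h> (x86 only; 32-bit builds need
-msse2). Note that this only makes the _data_ cold. The code itself and the
branch predictor state stay warm, so it covers just part of the cold-cache
picture described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, _mm_clflush, _mm_mfence, _mm_lfence */

#define CACHE_LINE 64    /* assumed cache line size */

/* Evict every cache line covering buf[0..len) so the next access misses. */
static void flush_range(const void *buf, size_t len)
{
    const char *p = (const char *)buf;
    size_t off;

    if (len == 0)
        return;
    for (off = 0; off < len; off += CACHE_LINE)
        _mm_clflush(p + off);
    _mm_clflush(p + len - 1);   /* last line, in case buf is unaligned */
    _mm_mfence();               /* wait for the flushes to complete */
}

/* Hypothetical function under test: add up n words. */
static unsigned long sum_words(const unsigned long *w, size_t n)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += w[i];
    return sum;
}

/* Time one call in TSC ticks; if cold is non-zero, flush the data first.
 * The result is stored through sink so the call is not optimized away. */
static uint64_t time_once(const unsigned long *w, size_t n, int cold,
                          unsigned long *sink)
{
    uint64_t t0, t1;

    if (cold)
        flush_range(w, n * sizeof(*w));
    _mm_lfence();               /* keep earlier work out of the window */
    t0 = __rdtsc();
    _mm_lfence();
    *sink = sum_words(w, n);
    _mm_lfence();               /* let the measured work finish first */
    t1 = __rdtsc();
    return t1 - t0;
}
```

A single sample is noisy, so in practice one would call time_once() many
times for each of cold=1 and cold=0 and compare the minimum or median of
the two sets.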
