Re: [PATCH 1/3] X86: Optimise fls(), ffs() and fls64()

From: Linus Torvalds
Date: Fri Mar 26 2010 - 13:28:39 EST




On Fri, 26 Mar 2010, David Howells wrote:
>
> fls(N), ffs(N) and fls64(N) can be optimised on x86/x86_64. Currently they
> perform checks against N being 0 before invoking the BSR/BSF instruction, or
> use a CMOV instruction afterwards. Either the check involves a conditional
> jump which we'd like to avoid, or a CMOV, which we'd also quite like to avoid.
>
> Instead, we can make use of the fact that BSR/BSF doesn't modify its output
> register if its input is 0. By preloading the output with -1 and incrementing
> the result, we achieve the desired result without the need for a conditional
> check.

This is totally incorrect.

Where did you find that "doesn't modify its output" thing? It's not true.
The truth is that the destination is undefined. Just read the dang Intel
documentation, it's very clearly stated right there.

If you can show otherwise, feel free. But I'm pretty sure there are
actually x86 chips out there that _do_ modify the destination. I have a
pretty strong memory of us trying this at some point, and it not working.
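
For illustration only, here is a minimal C sketch of the conservative approach: check for zero explicitly and never depend on what the bit-scan leaves in its destination. It assumes the GCC/Clang builtins __builtin_clz() and __builtin_ctz(), whose results are likewise undefined for a zero argument. The names fls_checked() and ffs_checked() are hypothetical and are not the kernel's implementation.

```c
#include <limits.h>

/*
 * Hypothetical helpers, not the kernel's fls()/ffs().
 * Both return a 1-based bit position, or 0 when no bit is set.
 */
static inline int fls_checked(unsigned int x)
{
	/* The builtin is undefined for 0, so handle that case first. */
	if (x == 0)
		return 0;
	/* Width of the type minus the leading zero count gives the
	 * 1-based index of the most significant set bit. */
	return (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x);
}

static inline int ffs_checked(unsigned int x)
{
	/* Same reasoning: never hand 0 to the builtin. */
	if (x == 0)
		return 0;
	/* Trailing zero count plus one gives the 1-based index of the
	 * least significant set bit. */
	return __builtin_ctz(x) + 1;
}
```

This keeps the conditional that the patch wants to remove, but the result does not depend on behaviour the documentation leaves undefined.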

Linus