Re: [PATCH] dcache: better name hash function

From: Linus Torvalds
Date: Tue Oct 27 2009 - 19:42:29 EST




On Tue, 27 Oct 2009, Stephen Hemminger wrote:
>
> Going back to basics. Run tests across different input sets.
> Dropping off the slow ones like crc, md5, ...
> Not using jhash because it doesn't have good character at a time
> interface.
>
> Also, the folding algorithm used matters. Since the kernel
> already uses hash_long() to fold back to N bits, all the
> tests were rerun with that.

Yeah, the 'hash_long()' folding matters for anything that doesn't multiply
big some big number to spread the bits out, because otherwise the bits
from the last character hashed will always be in the low bits.

That explains why our current hash looked so bad with your previous code.

>From your numbers, I think we can dismiss 'kr_hash()' as having horrible
behavior with names like pppXXX (and that isn't just a special case: it's
also noticeably worse for your /home directory case, which means that the
bad behavior shows up in practice too, not just in some special cases).

'elf' and 'pjw' don't have quite the same bad case, but the stddev for the
pppXXX cases are still clearly worse than the other alternatives. They
also seem to always be slower than what we already have.

The 'fnv32' algorithm gets fairly good behavior, but seems bad on Itanium.
Looks like it depends on a fast multiplication unit, and all even your
"slow" ULV chip seems to be a Core2 one, so all your x86 targets have
that. And our current name hash still actually seems to do better in all
cases (maybe I missed some case) even if fnv32 is slightly faster on x86.

>From your list 'string10' seems to get consistently good results and is at
or near the top of performance too. It seems to be the one that
consistently beats 'full_name_hash()' both in performance and in behavior
(string_hash17/31 come close, but aren't as clearly better performing).

But I haven't actually seen the hashes. Maybe there's something that makes
string10 bad?

Regardless, one thing your new numbers do say is that our current hash
really isn't that bad.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/