Andries,
> > int hash_fn (char * p)
> > {
> > int hash = 0;
> > while (*p) {
> > hash = hash + *p;
> > // Rotate a 31 bit field 7 bits:
> > hash = ((hash << 7) | (hash >> 24)) & 0x7fffffff;
> > }
> > return hash;
> > }
>
> Hmm. This one goes in the "catastrophic" category.
>
> For actual names:
>
> N=557398, m=51 sz=2048, min 82, max 4002, av 272.17, stddev 45122.99
>
> For generated names:
>
> N=557398, m=51 sz=2048, min 0, max 44800, av 272.17, stddev 10208445.83
>
Instead of masking the hash value down to 11 bits you could try:
index = (hash ^ (hash >> 11) ^ (hash >> 22)) & 0x7ff;
I ran a quick test which gave fairly good results with that: 12871
identifiers
from a source tree) gave a mean square bucket size of 45.65, expected
value for a random function is 45.78.
That change might improve some of your other hashes as well - there
doesn't
seem to be much point in computing a 32 bit value only to throw 20 bits
away -
stirring in the extra bits makes much more sense to me.
> > The rotate is equivalent to a multiplication by x**7 in Z_2[P=0],
> > where P is the polynomial x**31 - 1 (over Z_2).
> > Presumably the "best" P would be irreducible - but that would have more
> > bits set in the polynomial, making reduction harder. A compromise is to
> > choose P in the form x**N - 1 but with relatively few factors.
> > X**31 - 1 is such a P.
>
> It has seven irreducible factors. Hardly "almost irreducible".
I didn't say it was. "almost irreducible" polynomials with Hamming
weight two are pretty rare... Relative to say, x**32 - 1 or x**24 - 1,
having 7 factors is good.
Ralph.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Wed Feb 28 2001 - 21:00:07 EST