Re: [rfc] Near-constant time directory index for Ext2

From: Ralph Loader (suckfish@ihug.co.nz)
Date: Fri Feb 23 2001 - 16:43:16 EST


Hi,

I while ago I did some experimentation with simple bit-op based string
hash
functions. I.e., no multiplications / divides in the hash loop.

The best I found was:

int hash_fn (char * p)
{
  int hash = 0;
  while (*p) {
     hash = hash + *p;
     // Rotate a 31 bit field 7 bits:
     hash = ((hash << 7) | (hash >> 24)) & 0x7fffffff;
  }
  return hash;
}

[I haven't kept my test program / data set - if anyone compares the
above
 to the others functions mentioned on the list, let me know.]

The 31 and 7 were determined experimentally. But the 31 has a
theoretical
explanation (which might even be valid):

The rotate is equivalent to a multiplication by x**7 in Z_2[P=0],
where P is the polynomial x**31 - 1 (over Z_2).
Presumably the "best" P would be irreducible - but that would have more
bits set in the polynomial, making reduction harder. A compromise is to
choose P in the form x**N - 1 but with relatively few factors.
X**31 - 1 is such a P.

Also, a 32 bit rotate (modulo X**32 - 1, which is equal
to (X - 1) ** 32 over Z_2), came out pretty badly.

One thing that shouldn't be forgotten about hashing for hash tables
is that you have to reduce the hash value to the required range - doing
that well greatly reduces the difference between various hash functions.

Ralph.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Feb 23 2001 - 21:00:31 EST