Re: [PATCH 3/4] add ksm kernel shared memory driver

From: Valdis . Kletnieks
Date: Tue Nov 11 2008 - 17:40:43 EST


On Tue, 11 Nov 2008 15:03:45 MST, Jonathan Corbet said:

> > +#define PAGECMP_OFFSET 128
> > +#define PAGEHASH_SIZE (PAGECMP_OFFSET ? PAGECMP_OFFSET : PAGE_SIZE)
> > +/* hash the page */
> > +static void page_hash(struct page *page, unsigned char *digest)
>
> So is this really saying that you only hash the first 128 bytes, relying on
> full compares for the rest? I assume there's a perfectly good reason for
> doing it that way, but it's not clear to me from reading the code. Do you
> gain performance which is not subsequently lost in the (presumably) higher
> number of hash collisions?

Seems reasonably sane to me - only doing the first 128 bytes rather than
a full 4K page is some 32 times faster. Yes, you'll have the *occasional*
case where two pages were identical for 128 bytes but then differed, which is
why there's buckets. But the vast majority of the time, at least one bit
will be different in the first part.

In fact, I'd not be surprised if only going for 64 bytes works well...

Attachment: pgp00000.pgp
Description: PGP signature