Re: [PATCH] make radix tree gang lookup faster by using a bitmapsearch

From: Andrew Morton
Date: Sat Aug 27 2005 - 12:56:08 EST


James Bottomley <James.Bottomley@xxxxxxxxxxxx> wrote:
>
> The current gang lookup is rather naive and slow.

I'd say the main naivety in gang lookup is the awkward top-level iteration
algorithm. The way it bales out all the way to the top level of the tree
once __lookup() hits the end of the slots[] array, even though results[]
isn't full yet. It's surely possible to go back up the tree just a
sufficient distance to resume the iteration, rather than all the way to the
top. But it's hard, and it's all in CPU cache anyway there.

It would be much simpler if it was using recursion, of course.

> This patch replaces
> the integer count with an unsigned long representing the bitmap of
> occupied elements. We then use that bitmap to find the first occupied
> entry instead of looping over all the entries from the beginning of the
> radix node.

But only in __lookup(). I think most gang lookups use
radix_tree_gang_lookup_tag() -> __lookup_tag().

And __lookup_tag() could use find_next_bit() on the tags anyway, as the
comment says. I spent a bit of time doing that, had some bug, shelved it,
never got on to fixing it up.

There's a userspace test/devel setup at
http://www.zip.com.au/~akpm/linux/patches/stuff/rtth.tar.gz, btw.

> The penalty of doing this is that on 32 bit machines, the size of the
> radix tree array is reduced from 64 to 32 (so an unsigned long can
> represent the bitmap).

If we did the bitmap lookup in __lookup_tag() we wouldn't have this
restriction.

Maybe we can

a) fix radix_tree_gang_lookup() to use find_next_bit()

b) remove radix_tree_node.count

c) Add a new tag field which simply means "present"

d) remove radix_tree_gang_lookup() and __lookup() altogether

e) Implement radix_tree_gang_lookup() via radix_tree_gang_lookup_tag()

That would involve setting and clearing bit in the "present" tag field when
adding and removing items.

> I also exported radix_tree_preload() so modules can make use of radix
> trees.

uh, OK. Note that radix_tree_preload() uses prempt_disable() protection.
So it has the limitation that the guarantee which it provides will become
unreliable, kernel-wide, if anyone anywhere tries to do a
radix_tree_insert() from interrupt context.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/