Re: [PATCH 2/2] page table iterators

From: Nick Piggin
Date: Mon Feb 21 2005 - 03:11:58 EST


Benjamin Herrenschmidt wrote:

All of them are implemented slightly differently: some check for
overflow, some don't, some have redundant checking, some aren't even
consistent across the 3 or 4 loops of a given walk routine set, and we
have seen a tendency to introduce subtle bugs in one of them whenever
they all have to be changed for some reason.

I'm all for turning them into something more consistent, and I like the
for_each_* idea...

It also makes it easy to completely remove the code for the unused
levels on 2- and 3-level page tables, regaining some of the performance
lost in the move to 4 levels.
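
To make the for_each_* idea concrete, here is a minimal sketch of a
toy model, not the macros from the patch. Every toy_* name and the
4-slot, 1 MiB geometry are assumptions made up for illustration. The
top-level iterator does the range clamping in one place, and the folded
middle level is a loop that runs exactly once, so a compiler can reduce
it to straight-line code. A real walker would also have to cope with
the range end wrapping at the top of the address space, which this toy
address space never reaches.

```c
#include <assert.h>
#include <stddef.h>

/* Toy geometry, all names hypothetical: 4 top-level slots of 1 MiB each. */
#define TOY_PGDIR_SHIFT  20
#define TOY_PGDIR_SIZE   (1UL << TOY_PGDIR_SHIFT)
#define TOY_PGDIR_MASK   (~(TOY_PGDIR_SIZE - 1))
#define TOY_PTRS_PER_PGD 4

typedef struct { int present; } toy_pgd_t;
typedef toy_pgd_t toy_pmd_t;    /* folded middle level: a pmd is its pgd */

/* End of the top-level slot containing addr, clamped to end. */
static unsigned long toy_pgd_addr_end(unsigned long addr, unsigned long end)
{
    unsigned long boundary = (addr + TOY_PGDIR_SIZE) & TOY_PGDIR_MASK;

    return boundary < end ? boundary : end;
}

/* Visit every top-level slot overlapping [addr, end). Inside the body,
 * [addr, next) is the part of the range covered by the current slot. */
#define toy_for_each_pgd(pgd, base, addr, end, next)                \
    for ((pgd) = (base) + ((addr) >> TOY_PGDIR_SHIFT);              \
         (addr) < (end) &&                                          \
             ((next) = toy_pgd_addr_end((addr), (end)), 1);         \
         (pgd)++, (addr) = (next))

/* Folded level: the body runs exactly once over the whole sub-range,
 * with no table lookup and no per-entry bounds arithmetic. */
#define toy_for_each_pmd(pmd, pgd, addr, end, next)                 \
    for ((pmd) = (pgd), (next) = (end); (pmd) != NULL; (pmd) = NULL)

/* Example walker: count the bytes of [addr, end) backed by present slots. */
unsigned long toy_count_bytes(toy_pgd_t *base, unsigned long addr,
                              unsigned long end)
{
    toy_pgd_t *pgd;
    toy_pmd_t *pmd;
    unsigned long pgd_next, pmd_next, bytes = 0;

    /* The toy table only covers TOY_PTRS_PER_PGD slots. */
    assert(addr <= end && end <= TOY_PTRS_PER_PGD * TOY_PGDIR_SIZE);

    toy_for_each_pgd(pgd, base, addr, end, pgd_next) {
        if (!pgd->present)
            continue;           /* skip the hole, move to the next slot */
        toy_for_each_pmd(pmd, pgd, addr, pgd_next, pmd_next) {
            bytes += pmd_next - addr;
        }
    }
    return bytes;
}
```

The walker body has the same shape whether or not the middle level
exists; only the definition of toy_for_each_pmd changes between
configurations.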


It appears to do even better on 2-levels (i386, !PAE) than the old
3-level code, not surprisingly. lmbench fork+exit overhead is under
100us on a 3.4GHz xeon now, which is the lowest I've seen.

I haven't yet pulled out a pre-4-level kernel to see how 3-level
compares; I guess I'll do that now.

Now, we also need, in the long run, to improve the performance of
walking the page tables, especially PTEs, for things like tearing down
processes or fork, for example via a bitmap of used PGD entries.

With proper iterators, such a thing could be implemented just by
modifying the iterator, and all loops would benefit from it.
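
As a rough illustration of that point, the sketch below keeps a bitmap
of used top-level slots and lets the iterator macro skip the empty ones.
It is a toy model under stated assumptions: struct toy_dir, the 32-slot
size and all toy_* names are hypothetical, and this is not David's
bitmap walking code. The shift-and-test loop is a portable stand-in for
a find-first-set instruction.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_SLOTS 32         /* hypothetical: one bitmap bit per slot */

struct toy_dir {
    uint32_t used;              /* bit i set => slot[i] is populated */
    int slot[TOY_NR_SLOTS];
};

/* Index of the first used slot at or after i, or TOY_NR_SLOTS if none. */
static int toy_next_used(const struct toy_dir *d, int i)
{
    uint32_t rest;

    if (i >= TOY_NR_SLOTS)
        return TOY_NR_SLOTS;
    rest = d->used >> i;        /* drop the bits below i */
    if (rest == 0)
        return TOY_NR_SLOTS;    /* nothing left: end the walk early */
    while (!(rest & 1u)) {
        rest >>= 1;
        i++;
    }
    return i;
}

/* Visit only the populated slots. Callers never look at the bitmap. */
#define toy_for_each_used(d, i)                                     \
    for ((i) = toy_next_used((d), 0);                               \
         (i) < TOY_NR_SLOTS;                                        \
         (i) = toy_next_used((d), (i) + 1))

/* Populate a slot and mark it in the bitmap. */
void toy_set(struct toy_dir *d, int i, int val)
{
    assert(i >= 0 && i < TOY_NR_SLOTS);
    d->slot[i] = val;
    d->used |= (uint32_t)1 << i;
}

/* Example walker: sum the populated slots and report how many were visited. */
long toy_sum(const struct toy_dir *d, int *visited)
{
    long sum = 0;
    int i;

    *visited = 0;
    toy_for_each_used(d, i) {
        sum += d->slot[i];
        (*visited)++;
    }
    return sum;
}
```

A walker written against toy_for_each_used() would not need to change
if the macro were later switched back to a plain linear scan, which is
the property being argued for here.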


After looking at David's bitmap walking code, I'm starting to think
that my current macros only _just_ scrape by because of the uniform
nature of the walkers, and their relative simplicity. Anything much
more complex will start to get ugly.

I'd like to look at a slightly more involved reworking in order to
nicely support optimisations like bitmap walking, without blowing out
the complexity of the macros and without hiding too much of the
workings.

However, my main aim for these macros was to fix the performance
regressions on 2- and 3-level architectures. Ben's complaints about
these loops just served to hurry it along. I think both reasons
(performance, code consistency) make it a good idea.

Nick
