Re: 23 second kernel compile (aka which patches help scalability on NUMA)

From: Andrea Arcangeli (andrea@suse.de)
Date: Sun Mar 10 2002 - 21:14:25 EST


On Fri, Mar 08, 2002 at 09:47:04PM -0800, Martin J. Bligh wrote:
> Big locks left:
>
> pagemap_lru_lock
> 20.2% 57.1% 5.4us( 86us) 111us( 16ms)(14.7%) 1014988 42.9% 57.1% 0%

I think this is only due to the lru_cache_add executed by the anonymous
page faults. Pagecache should stay in the lru constantly if you're
running in hot pagecache, as I guess you are. For a workload like this
one, keeping anon pages out of the lru would be an obvious win. The only
reason we put anon pages into the lru before they are converted to
swapcache is to get a nicer swapout behaviour, but you're certainly not
swapping out anything. It's a tradeoff. Likewise, the additional
memory/cpu and locking overhead that rmap requires will slow down page
faults even more than what you see now, with the only objective being a
nicer pageout behaviour (modulo the ram-binding "migration" stuff, where
rmap is mandatory to do it instantly and not over time). If we didn't
care about getting a nice swapout behaviour, workloads like a kernel
compile could be sped up and scaled much better, but for general purpose
use we don't want to slow down to a crawl when swapping activity becomes
necessary.
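
As a rough illustration of the tradeoff above, here is a toy userspace
model in C (not kernel code). Every name in it (toy_page, toy_lru,
toy_anon_fault, toy_add_to_swap_cache, toy_run) is made up for the
sketch, and the counter only stands in for how often a global lock like
pagemap_lru_lock would be taken. It assumes a hypothetical "lazy" policy
where an anon page is linked into the lru only once it becomes swapcache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, no real kernel structures. */
struct toy_page {
    bool on_lru;
    struct toy_page *lru_next;
};

struct toy_lru {
    struct toy_page *head;
    unsigned long lock_acquisitions; /* stand-in for pagemap_lru_lock traffic */
};

/* Link a page into the global list; a real lock would be held here. */
static void toy_lru_add(struct toy_lru *lru, struct toy_page *page)
{
    lru->lock_acquisitions++;
    page->lru_next = lru->head;
    lru->head = page;
    page->on_lru = true;
}

/* Anonymous page fault: the eager policy puts the page on the lru at once. */
static void toy_anon_fault(struct toy_lru *lru, struct toy_page *page, bool eager)
{
    page->on_lru = false;
    page->lru_next = NULL;
    if (eager)
        toy_lru_add(lru, page);
}

/* Swapout path: a page turned into swapcache must be on the lru by now. */
static void toy_add_to_swap_cache(struct toy_lru *lru, struct toy_page *page)
{
    if (!page->on_lru)
        toy_lru_add(lru, page);
}

/* Fault in nfaults anon pages, swap out the first nswapped of them,
 * and return how many times the global lru lock would have been taken. */
unsigned long toy_run(size_t nfaults, size_t nswapped, bool eager)
{
    struct toy_lru lru = { NULL, 0 };
    struct toy_page *pages;
    size_t i;

    assert(nswapped <= nfaults);
    if (nfaults == 0)
        return 0;

    pages = calloc(nfaults, sizeof(*pages));
    if (pages == NULL)
        abort();

    for (i = 0; i < nfaults; i++)
        toy_anon_fault(&lru, &pages[i], eager);
    for (i = 0; i < nswapped; i++)
        toy_add_to_swap_cache(&lru, &pages[i]);

    free(pages);
    return lru.lock_acquisitions;
}
```

With 1000 faults and nothing swapped out, toy_run() reports 1000 lock
acquisitions for the eager policy and none for the lazy one. What the
model leaves out is the cost side: with the lazy policy the swapout path
has no aging information for anon pages until they reach the swapcache,
which is the nicer swapout behaviour that would be given up.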

> Any other suggestions are welcome. I'd also be interested
> to know if 23s is fast for make bzImage, or if other big
> iron machines can kick this around the room.

It also depends heavily on the .config, the compiler and the kernel
source (and on whether you include make dep or not).

From my side, my record kernel compile was at LANL on a 32-cpu wildfire
with a 2.4.3-aa kernel IIRC (it just had the basic numa scheduler
optimizations); it took 37 seconds IIRC (with a quite generic .config
that could be used on most alphas, except for the CONFIG_WILDFIRE that
was required at that time, but without all the possible drivers out
there included, of course). With Ingo's scheduler and the other
enhancements that happened during 2.4, it probably won't go down into
the teens, but it should get into the low twenties I believe. Also,
32-way scalability on a kernel compile cannot be exploited completely
unless the .config is very full, like the ones used by distributions.
Last but not least, the output was scrolling so fast on the VGA console
that I guess redirecting the output to /dev/null may save a few percent
too :). And of course that was with an alpha target, not an x86 target,
so the two are not comparable because of that last variable either. I
think your 23 seconds figure looks very nice.

Andrea