Re: [PATCH] 3 performance tweaks

From: Mark Hemment (Mark_Hemment@eur.3com.com)
Date: Thu May 25 2000 - 04:50:56 EST

Next message: Andrew Morton: "[patches] kernel timer races"
Previous message: Sean Hunter: "Re: Announcing CML2, a replacement for the kbuild system"
Next in thread: kumon@flab.fujitsu.co.jp: "Re: [PATCH] 3 performance tweaks"
Reply: kumon@flab.fujitsu.co.jp: "Re: [PATCH] 3 performance tweaks"
Reply: Manfred Spraul: "Re: [PATCH] 3 performance tweaks"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

If you're interested in my latest SLAB allocator implementation, look at:
http://www.nextd.demon.co.uk/slab.c

This is simply a snapshot (it won't even compile, and other bits of code it needs
are missing), but if you have the time to read the code it will give you an idea
of what I'm thinking. You'll have to excuse any misleading comments, bugs,
etc.; this code hasn't run for a long time, and needs much more work/tidying up.

Some of my goals are:
     o Return L1 cache "hot" objects when possible.
     o Lightweight allocation/release paths, with small L1 footprint.
     o SMP fastpath takes no locks.
     o Ability to add new general size caches on the fly (but without the need
for locking the list of general caches when allocating - ie. the search needs no
locks, or even blocking of interrupts).
     o Ability to remove SLAB caches (except for general sized caches).
     o Good support for DMA allocations (via sub-caches, which I hope to expand
upon when moving to 2.4.x (current development is against 2.2.15)).
     o Try hard to avoid page order requests greater than 0 to the page
allocator.
     o Keep both internal and external memory fragmentation low.
     o Good debugging support.
     o Well documented, and clean, code.

I've got a lot of work to do on this; it isn't meant to work at the moment, so
no bug reports please.

Mark

I've already experimented with my own implementation of a per-CPU slab cache,
adding a limited-depth stack for each CPU in front of the slab
allocation logic. When a block is freed, it is pushed onto the stack
as long as the depth is below the limit; if the limit would be exceeded,
the bottom of the stack is returned to the slab allocation logic.

I used a 32-entry stack, and also limited the depth by the amount of memory
cached for the slab; in the experiment, the size limit was 32KB.

The result was very successful: for most slab types, the cache
served more than 90% of requests, and Webbench gained about a 5%
speedup. There are a few slabs showing no cache gain, for example
skbuf_head_cache and vm_area_struct.
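
For readers who want something concrete, the per-CPU stack described above
could be sketched roughly as follows. This is a user-space illustration only,
not the code used in the experiment: the names (cpu_stack, cpu_stack_alloc,
cpu_stack_free, backend_alloc, backend_free) are hypothetical, malloc/free
stand in for the real slab allocation logic, and a real kernel version would
keep one such structure per CPU per cache and guard against preemption and
interrupts. The stack is kept as a ring so that the bottom entry can be
evicted cheaply when the limit is reached.

```c
#include <stddef.h>
#include <stdlib.h>

#define STACK_ENTRIES 32            /* stack depth from the experiment */
#define STACK_BYTES   (32 * 1024)   /* byte limit from the experiment  */

/* Hypothetical per-CPU front end; one instance per CPU per cache. */
struct cpu_stack {
    void  *slot[STACK_ENTRIES];     /* ring buffer of cached objects    */
    size_t top;                     /* next push position in the ring   */
    size_t depth;                   /* objects currently held           */
    size_t limit;                   /* min(entries, bytes / objsize)    */
    size_t objsize;                 /* object size, assumed non-zero    */
    unsigned long hits, misses;     /* simple hit-rate counters         */
};

/* Stand-ins for the slab allocation logic (assumption: plain malloc). */
static void *backend_alloc(size_t size) { return malloc(size); }
static void  backend_free(void *p)      { free(p); }

static void cpu_stack_init(struct cpu_stack *s, size_t objsize)
{
    size_t by_bytes = STACK_BYTES / objsize;

    s->top = 0;
    s->depth = 0;
    s->objsize = objsize;
    s->hits = s->misses = 0;
    s->limit = by_bytes < STACK_ENTRIES ? by_bytes : STACK_ENTRIES;
}

/* Allocate: pop the most recently freed object if there is one. */
static void *cpu_stack_alloc(struct cpu_stack *s)
{
    if (s->depth > 0) {
        s->top = (s->top + STACK_ENTRIES - 1) % STACK_ENTRIES;
        s->depth--;
        s->hits++;
        return s->slot[s->top];
    }
    s->misses++;
    return backend_alloc(s->objsize);
}

/* Free: push onto the stack, evicting the bottom entry when full. */
static void cpu_stack_free(struct cpu_stack *s, void *obj)
{
    if (s->limit == 0) {            /* object too large to cache here */
        backend_free(obj);
        return;
    }
    if (s->depth == s->limit) {
        size_t bottom = (s->top + STACK_ENTRIES - s->depth) % STACK_ENTRIES;

        backend_free(s->slot[bottom]);  /* oldest object goes back */
        s->depth--;
    }
    s->slot[s->top] = obj;
    s->top = (s->top + 1) % STACK_ENTRIES;
    s->depth++;
}
```

Popping from the top means the most recently freed object is handed out
first, which is the one most likely to still be in the L1 cache; evicting
from the bottom gives back the object least likely to be hot.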



This archive was generated by hypermail 2b29 : Wed May 31 2000 - 21:00:13 EST