Re: [PATCH] 3 performance tweaks

From: ying@almaden.ibm.com
Date: Thu May 25 2000 - 10:12:34 EST

Next message: Kurt Roeckx: "Re: IP_ALIAS bug in 2.2.x"
Previous message: bug1: "Re: [patches] kernel timer races"
Next in thread: kumon@flab.fujitsu.co.jp: "Re: [PATCH] 3 performance tweaks"
Reply: kumon@flab.fujitsu.co.jp: "Re: [PATCH] 3 performance tweaks"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> The problem is, "hot" data doesn't directly mean L1 cache or L2 cache
> hit on a SMP system, at least in the current linux interrupt handling
> scheme.

> For example:
> Suppose data is transferred from the system to outside by a network.
> Firstly the data is copied from a user space to a malloc'ed kernel
> space on CPU-A, then a device is initiated to send. When the transfer
> completes, an interrupt happens to CPU-X and the data is free'ed on
> CPU-X.

This behavior is to be expected on an SMP system anyway. In fact, on
average only 1/N of the allocations (where N is the number of CPUs) will
be freed by the CPU that allocated the space. The rest should be spread
roughly evenly across the other CPUs. It's a random process anyway.
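
The 1/N figure can be checked with a small user-space simulation. This is
only a sketch under a stated assumption: the allocating CPU and the CPU
that takes the completion interrupt are each picked uniformly at random
and independently. The function name same_cpu_free_fraction and the
xorshift helper are made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Small xorshift32 generator so the sketch does not depend on rand(). */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Hypothetical model: every buffer is allocated on a random CPU and freed
 * on the (independently random) CPU that handles the completion interrupt.
 * Returns the fraction of buffers freed on the CPU that allocated them.
 */
double same_cpu_free_fraction(int ncpus, long trials, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;  /* xorshift state must be nonzero */
    long same = 0;

    assert(ncpus > 0 && trials > 0);
    for (long i = 0; i < trials; i++) {
        int alloc_cpu = (int)(next_random(&state) % (uint32_t)ncpus);
        int free_cpu = (int)(next_random(&state) % (uint32_t)ncpus);
        if (alloc_cpu == free_cpu)
            same++;
    }
    return (double)same / (double)trials;
}
```

Under this model, a 4-CPU run should come out close to 0.25. If interrupts
are routed unevenly across CPUs, the fraction depends on that routing and
the uniform assumption no longer holds.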

Actually, I fully implemented a per-CPU slab allocator a few weeks ago.
In my implementation I don't lock on either malloc or free. Based on some
measurements and lock statistics, I can see that all of the spinlocks
are gone. But I see a ~3-5% performance improvement in Netbench.
I care more about file server performance than web server performance
here.
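
To make the idea concrete, here is a minimal user-space model of a per-CPU
free list. It is not the patch described in this message, only a sketch
under assumptions: the names (cpu_cache, percpu_alloc, percpu_free,
percpu_nfree), NCPUS, and OBJ_SIZE are all hypothetical, the CPU number is
passed in explicitly, and malloc() stands in for growing a slab. In a
kernel, skipping the lock is only safe if nothing else on the same CPU
(an interrupt handler, for instance) can touch the list in the middle of
an operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NCPUS    4    /* assumed CPU count for this model */
#define OBJ_SIZE 64   /* assumed fixed object size, >= sizeof(struct free_obj) */

/* A free object stores the list link in its own memory. */
struct free_obj {
    struct free_obj *next;
};

/* One cache per CPU; only the owning CPU ever touches it, so no lock. */
struct cpu_cache {
    struct free_obj *head;
    long nfree;
};

static struct cpu_cache caches[NCPUS];

/* Allocate on behalf of `cpu`: pop from that CPU's own free list. */
void *percpu_alloc(int cpu)
{
    struct cpu_cache *c = &caches[cpu];
    struct free_obj *obj = c->head;

    if (obj) {
        c->head = obj->next;
        c->nfree--;
        return obj;
    }
    return malloc(OBJ_SIZE);  /* stand-in for getting memory from a new slab */
}

/* Free on behalf of `cpu`: push onto the freeing CPU's list, not the owner's. */
void percpu_free(int cpu, void *p)
{
    struct cpu_cache *c = &caches[cpu];
    struct free_obj *obj = p;

    obj->next = c->head;
    c->head = obj;
    c->nfree++;
}

/* Number of objects currently cached on `cpu`. */
long percpu_nfree(int cpu)
{
    return caches[cpu].nfree;
}
```

In this model an object allocated with percpu_alloc(0) and released with
percpu_free(1, p) ends up on CPU 1's list, which is exactly the migration
in the quoted scenario: the next percpu_alloc(1) hands back memory that
was last written on CPU 0.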

If you want, I can send out the patch that includes the SMP-slab
allocator, along with a paper I wrote that describes it. Some cleanup is
probably still needed, and I have not taken the time to comment the code.
The reap is also kinda lossy right now, but my belief is that reaping is
very infrequent anyhow, so I have not tried hard to make it run more
efficiently, and I probably won't.

> To break this scenario, there are two solutions, I think.

> 1. run the bh_handler on the initial CPU.
> 2. return the collected memory to the original slab-cache.

I don't see an easy way to do either of these, really. There are also
tradeoffs: for example, the interrupt CPU may be less busy than the
original CPU, so which one do you want to go to?

Ying

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


This archive was generated by hypermail 2b29 : Wed May 31 2000 - 21:00:14 EST