Re: [PATCH] 3 performance tweaks

From: Jamie Lokier (lk@tantalophile.demon.co.uk)
Date: Thu May 25 2000 - 06:43:01 EST


kumon@flab.fujitsu.co.jp wrote:
> Suppose data is transferred from the system to the outside over a network.
> First the data is copied from user space to a malloc'ed kernel
> buffer on CPU-A, then a device is started to send it. When the transfer
> completes, an interrupt arrives on CPU-X and the data is freed on
> CPU-X.

> To break this scenario, there are two solutions, I think.
>
> 1. run the bh_handler on the initial CPU.
> 2. return the collected memory to the original slab-cache.
>
> Does anybody have ideas?

3. Provide a kfree_from_cpu function which specific code such as the
   network buffer handlers can call. A short term solution at least.
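
A rough user-space model of what I mean, not kernel code. The per-CPU
free lists stand in for per-CPU slab caches, and everything named
model_* (plus NR_CPUS and struct buf) is made up for illustration. The
signature of kfree_from_cpu is an assumption too; only the name is
proposed above. There is no locking here, which a real cross-CPU free
would presumably need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4               /* hypothetical CPU count for the model */

/* Each buffer remembers which CPU's cache it was allocated from. */
struct buf {
    int home_cpu;
    struct buf *next;
    char data[64];
};

/* One free list per CPU, standing in for a per-CPU slab cache. */
static struct buf *cpu_cache[NR_CPUS];
static int cpu_cache_len[NR_CPUS];

/* Allocate on behalf of `cpu`: reuse from its free list, else malloc(). */
static struct buf *model_kmalloc(int cpu)
{
    struct buf *b;

    assert(cpu >= 0 && cpu < NR_CPUS);
    b = cpu_cache[cpu];
    if (b) {
        cpu_cache[cpu] = b->next;
        cpu_cache_len[cpu]--;
    } else {
        b = malloc(sizeof(*b));
        if (!b)
            return NULL;
    }
    b->home_cpu = cpu;
    b->next = NULL;
    return b;
}

/* Plain free: the buffer lands in the cache of the CPU doing the free. */
static void model_kfree(struct buf *b, int freeing_cpu)
{
    assert(freeing_cpu >= 0 && freeing_cpu < NR_CPUS);
    b->next = cpu_cache[freeing_cpu];
    cpu_cache[freeing_cpu] = b;
    cpu_cache_len[freeing_cpu]++;
}

/*
 * Assumed shape of kfree_from_cpu: the caller names the CPU the buffer
 * came from, and the buffer goes back to that CPU's cache no matter
 * which CPU is running the free.
 */
static void model_kfree_from_cpu(struct buf *b, int origin_cpu)
{
    model_kfree(b, origin_cpu);
}
```

With model_kfree(b, 2) a buffer allocated for CPU 0 piles up on CPU 2's
list, so CPU 0 has to go back to malloc() next time. With
model_kfree_from_cpu(b, b->home_cpu) it returns to CPU 0's list and the
next model_kmalloc(0) reuses it.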

But the real solution:

On the receive side, the initial data is usually transferred by DMA into
main memory -- it's not in anyone's L1 or L2 cache. Only the header is
read by the kernel. (I'm not sure whether snooping on DMA into a memory
region that was previously hot in L2 wrecks this assumption.) This means
it doesn't matter which CPU reads the source of the
copy-and-maybe-checksum.

If you're going to copy the data to user space, you'd like the
destination of the copy to end up in the cache of the CPU that the user
space task will run on.

So the most important decision is which CPU to run the copy on -- and
that should be the one that will run the user space task that reads the
copied data. (Unless we get zero-copy reads -- then it doesn't matter.)

So ideally, the copy should be done just before waking up the task
that's waiting for that read() to complete, and on the same CPU.

The obvious way to do that is to move more of the network stack into
process context. But note that many modern NICs will do IP checksums, and
with those almost all of the network stack can remain in BH context.

So only the copy to user space has to happen in process context for this
SMP optimisation, if you tune for a checksumming NIC.

If your NIC doesn't do checksums, you can move more of the network
stack into process context. That's already been proposed for 2.5 as it
has other advantages. Or, keeping the stack in BH, do the copy there
and, when waking up a process that's waiting on socket read(), set that
task's CPU affinity to the CPU that copied the socket's data.
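
The last variant could look roughly like this. Again a toy user-space
model and not kernel code: struct model_task, struct model_sock, the
field names and MODEL_NR_CPUS are all invented for the sketch. The
affinity is modelled as a bitmask with one bit per CPU.

```c
#include <assert.h>

#define MODEL_NR_CPUS 8         /* hypothetical CPU count for the model */

/* Stand-in for a task: which CPUs it may run on, and whether it is awake. */
struct model_task {
    unsigned long cpus_allowed; /* one bit per CPU */
    int woken;
};

/* Stand-in for a socket: which CPU last copied data for it, and who waits. */
struct model_sock {
    int copy_cpu;               /* -1 until a copy has happened */
    struct model_task *reader;  /* task blocked in read(), or a null pointer */
};

/* BH side: the copy runs on this_cpu, so remember that on the socket. */
static void model_bh_copy(struct model_sock *sk, int this_cpu)
{
    assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);
    sk->copy_cpu = this_cpu;
}

/* Wakeup: pin the waiting reader to the CPU that did the copy, then wake it. */
static void model_wake_reader(struct model_sock *sk)
{
    if (!sk->reader || sk->copy_cpu < 0)
        return;                 /* nobody waiting, or nothing copied yet */
    sk->reader->cpus_allowed = 1UL << sk->copy_cpu;
    sk->reader->woken = 1;
}
```

A hard pin like this is the crudest form; a hint that the scheduler is
free to override would be one of the variations.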

Combinations and variations are possible.

enjoy,
-- Jamie
