Re: zero-copy TCP

From: Jes Sorensen (jes@linuxcare.com)
Date: Sat Sep 02 2000 - 16:40:18 EST


>>>>> "Jeff" == Jeff V Merkey <jmerkey@timpanogas.com> writes:

Jeff, could you start by learning to quote email properly and not send
a full copy of the entire email you reply to? (Read RFC 1855.)

Jeff> The entire Linux Network subsystem needs an overhaul. The code
Jeff> copies data all over the place. I am at present pulling it apart
Jeff> and porting it to MANOS, and what a mess indeed. In NetWare, the
Jeff> only time data ever gets copied from incoming packets is:

Try to understand the code before you make such bold statements.

Jeff> 1. A copy to userspace at a stream head. 2. An incoming write
Jeff> that gets copied into the file cache.

Jeff> Reads from cache are never copied. In fact, the network server
Jeff> locks a file cache page and sends it unaltered to the network
Jeff> drivers and DMA's directly from it. Since NetWare has WTD's
Jeff> these I/O requests get processed at the highest possible
Jeff> priority. In networking, the enemy is LATENCY for fast
Jeff> performance. That's why NetWare can handle 5000 users and Linux
Jeff> barfs on 100 in similar tests. Copying increases latency, and
Jeff> the long code paths in the Linux Network layer.

You can't DMA directly from a file cache page unless you have a
network card that does scatter/gather DMA, and surprise surprise,
80-90% of the cards on the market don't support this. Besides that, you
need to do copy-on-write if you want to be able to do zero copy on
write() from user space. Marking data copy-on-write is *expensive* on
x86 SMP boxes, since you have to modify the TLB on all
processors. On top of that, you have to look at the packet size: for
small packets a copy is often a lot cheaper than modifying the page
tables, even on UP systems, so you need a copy/break scheme here.
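
To make the copy/break idea concrete, here is a minimal sketch in C. It
is not code from the Linux network stack: the names choose_tx_path,
COPYBREAK_BYTES and nic_has_sg are made up for illustration, and the
256-byte cutoff is an arbitrary assumption that would have to be
measured on real hardware.

```c
#include <stddef.h>

/* Hypothetical cutoff in bytes; a real value would have to be measured. */
#define COPYBREAK_BYTES 256

enum tx_path {
    TX_COPY,      /* memcpy the payload into a driver-owned buffer */
    TX_ZERO_COPY  /* pin the page and let the NIC DMA straight from it */
};

/*
 * Decide how to transmit `len` bytes of payload.
 * nic_has_sg: nonzero if the (hypothetical) card does scatter/gather DMA.
 */
enum tx_path choose_tx_path(size_t len, int nic_has_sg)
{
    /* Without scatter/gather the header and payload must be contiguous,
     * so the payload has to be copied next to the header anyway. */
    if (!nic_has_sg)
        return TX_COPY;

    /* For small packets the copy is assumed cheaper than the page table
     * and TLB work that zero copy needs. */
    if (len <= COPYBREAK_BYTES)
        return TX_COPY;

    return TX_ZERO_COPY;
}
```

With these assumptions a 64-byte packet is always copied, while a
4096-byte payload only takes the zero-copy path on a card that does
scatter/gather.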

As for your statement on latency, it's nice to see that you don't
know what you are talking about. Latency is one issue in fast
networking, but it's far from the only one. Latency is important for
message-passing type applications; for bulk data transfers, however,
it's less relevant, since you really want deep pipelining here and
properly written applications. If your TCP window is too small, even
zero latency will only buy you so much on a really fast network.
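
The window point can be put in numbers with the usual bound of one
window per round trip. The sketch below is only an illustration under
stated assumptions: the function names are hypothetical, and it reads
"zero latency" as zero latency in the host code path, so the round-trip
time on the wire is still there.

```c
#include <assert.h>

/* Upper bound, in bits per second, imposed by the TCP window alone:
 * at most one full window can be in flight per round trip. */
double window_limited_bps(double window_bytes, double rtt_seconds)
{
    assert(rtt_seconds > 0.0);
    return window_bytes * 8.0 / rtt_seconds;
}

/* Achievable rate is the smaller of the link rate and the window bound. */
double achievable_bps(double link_bps, double window_bytes, double rtt_seconds)
{
    double w = window_limited_bps(window_bytes, rtt_seconds);
    return w < link_bps ? w : link_bps;
}
```

As an example with made-up numbers, a 65535-byte window (the largest
possible without window scaling) over a 1 ms round trip is capped at
roughly 524 Mbit/s, so a gigabit link would sit about half idle however
short the host code path is.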

Jes