Re: Possible improvement to pipe throughput

David S. Miller (davem@caip.rutgers.edu)
Sat, 28 Sep 1996 14:48:05 -0400


From: Matthias Urlichs <smurf@smurf.noris.de>
Date: Fri, 27 Sep 1996 12:54:25 +0100

Damn difficult.

I never stated it would be a piece of cake ;-)

The problem is, we need to get the data _after_ all the headers into a
specific position. This would mean that the code which reads the IP frame
from the card only reads the IP header, then the TCP code reads the TCP
header and figures out where on the flip-out page to put the data. After
that, you check the TCP checksum, and if it doesn't match you undo the
effects of all of the above. :-/

I am talking about "extremely high bandwidth" cases with networking
hardware that does the checksumming for you. I never claimed we could
avoid looking at the headers, only at the "bulk" of the data.

So basically, say you have 64k (the maximum IP packet size) TCP packets
coming in at approximately 1 gigabit/s, and your CPU maxes out at, say,
50MB/s of memcpy(). The code path would look like:

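ip_recv() {
    /* bad header, or nonzero csum: the checksum did not verify */
    if (verify_ip_header() || iphdr->csum) {
        drop_it();
        return;
    }
    ip->ops->recv(buff);    /* hand off to the transport, e.g. tcp_recv() */
}

tcp_recv() {
    if (verify_tcp_header() || thdr->csum) {
        do_drop_processing();
        return;
    }
    /* the headers live in the first page, so the CPU touches it anyway */
    copy_first_page_of_buffer();
    /* only buffers larger than a page have anything left to flip */
    if (buff_len > PAGE_SIZE)
        try_to_flip_remaining_pages();
}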

The idea is that you need to touch the first page of the buffer anyway
to do the normal, everyday IP and TCP header processing. But if the
buffer spans "a considerable number" of pages, flipping them instead of
copying them is a _big_ win, and we can do it.

This is certainly possible. Whether it's a good idea to totally rewrite
the IP stack to do it is another question entirely, and I for one would
answer it with a resounding "You must be crazy".

Using Larry McVoy's "Bulk Data Service" (BDS), IRIX can get ~80MB/s of
NFS read bandwidth and close to 65MB/s of write bandwidth over HIPPI. If
the CPU were touching the data, those numbers would be less than half
that. Crazy, but if done carefully:

1) No performance loss for "stupid" hardware and small buffers
2) Nearly double the performance for smart hardware and big
buffers

Crazy? Yes. Worth it? Definitely.

BTW, I am against hardware that implements TCP and IP themselves. That
is going too far. But something well defined, such as one's complement
checksumming, I am all for in networking hardware, because of the gain.
And the above scheme is a way to truly take advantage of it.

David S. Miller
davem@caip.rutgers.edu