Re: [patch] TCP/IP delacks disabled with MPI

Josip Loncaric (josip@icase.edu)
Fri, 21 May 1999 12:05:02 -0400


Andrea Arcangeli wrote:
>
> On Fri, 21 May 1999, Josip Loncaric wrote:
>
> >My understanding is that Linux TCP incorrectly handles the slow start
> >algorithm because it does not count the _number_ of packets that each
> >ACK represents. [..]

My information is second hand, but as far as I can tell, Linux TCP
handles congestion window according to the _amount_ of acknowledged data
regardless of the _number_ of packets these ACKs represent. This breaks
the slow start algorithm, which is supposed to exponentially open the
congestion window. Linux TCP opens it linearly when packets are small
(much less than MSS).

Slow start should grow cwnd 1->2->4->8->... even with smallest packets:

cwnd=1 pkt->
<- ack 1 pkt
cwnd=2 pkt->
pkt->
<- ack 2 pkts
cwnd=4 pkt->
pkt->
pkt->
pkt->
<- ack 4 pkts
cwnd=8 ...

But as far as I can tell, Linux TCP handles a received ACK by increasing
cwnd by one, not by the number of packets that this ACK represents.
Thus, cwnd grows 1->2->3->4->... instead of exponentially. This could
be a source of problems, although there are other contributing factors
as well. By the way, at least one ACK must be returned for every 2*MSS
bytes received, so the above reasoning applies to small packets only.

Some time constants in Linux TCP are way too long. Linux TCP estimates
of RTO are kept 200ms or longer, although for us 20ms is more
appropriate. This 200ms minimum implies huge windows for Fast Ethernet
connections (20Mbit or 2.5Mbytes) which we cannot tolerate. Even 20ms
min. RTO (->250kB window) is too much, particularly since the standard
TCP socket buffer size is only 32kB. We'd prefer 2ms min. RTO in some
cases.

Delayed ACK timeouts of 500ms are standard but about 1000 times longer
than what we'd like. The 1 second retransmit timer which applies when
sk->users is set needs to be at least 10-100 times faster. Also, there
is another 30 second partial packet timer in TCP, which seems completely
out of line with our needs.

My main point is this: we need highly interactive request-response
communication between processes with sub-millisecond response times.
The hope of piggy-backing ACKs on the return traffic is usually
misplaced, although sometimes it would help reduce the workload.

Sincerely,
Josip

-- 
Dr. Josip Loncaric, Senior Staff Scientist        mailto:josip@icase.edu
ICASE, Mail Stop 132C                       http://www.icase.edu/~josip/
NASA Langley Research Center             mailto:j.loncaric@larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/