Re: [patch] TCP/IP delacks disabled with MPI

Josip Loncaric (josip@icase.edu)
Fri, 21 May 1999 10:12:54 -0400


Andrea Arcangeli wrote:
>
> [...]
>
> One of the thing I noticed from the tcp traces sent to me from Josip, is
> that currently we send the ack only if ATO expires or we received more
> than 2*MSS data but we don't care about the exact _number_ of not acked
> packets we have in the receive queue.

My understanding is that Linux TCP incorrectly handles the slow start
algorithm because it does not count the _number_ of packets that each
ACK represents. This may be at the root of the problem we see, and
perhaps explains why immediately ACKing every packet fixes the problem.
Unfortunately, I cannot test 2.2.9 w/Andrea's patch because our system
is now in use, and we've got to stick with 2.0.36 until at least
September (for a variety of reasons).

BTW, rfc2414 suggests that the initial congestion window can be 3 or
even 4, and that this is a good minimum value to use after being idle
for a while. By default, TCP resets congestion window to 1 after being
idle for an entire RTO. For our internal use only when TCP_NODELAY is
set, we use exponential backoff to a minimum cong_window of 3, and
various timeout constants are up to ten times lower (for faster
recovery). Normal TCP operation is maintained for sockets without
TCP_NODELAY.

Our Beowulf-class cluster has demonstrated that Linux TCP is simply not
tuned for highly interactive interprocess communication. We need Fast
Ethernet latencies well under 100 microseconds, 90+Mbit/s bandwidth, and
responsiveness generally 10-1000 times better than what standard TCP
uses need. Congestion on our switched fast ethernet network is very
rare, and the sockets mostly behave as if direct full duplex crossover
cables were used. Apart from the computational cost imposed by ACKing
every packet, our modified TCP comes close to what we need. I hope that
a better fix will become available, but in the near future we'd also
like to try M-VIA which promises latencies about three times better than
standard TCP.

Sincerely,
Josip

-- 
Dr. Josip Loncaric, Senior Staff Scientist        mailto:josip@icase.edu
ICASE, Mail Stop 132C                       http://www.icase.edu/~josip/
NASA Langley Research Center             mailto:j.loncaric@larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/