It's been so long that I don't really know if this is related or not,
but ... back in about 1992, a fellow at Harvard and I tracked down a
bug in just about every BSD-derived TCP implementation, related to
delayed/lost acks. What I recall from that time is something like
this:
host1 sends pkt seq N to host2
<packet lost somehow>
host1 waits for ack N
<not received in current expected rtt>
host1 increments rtt / timeout to host2 (typically by doubling)
host1 resends pkt seq N to host2
host2 sends ack N to host1 (promptly)
At this point, host1 erroneously concludes that the ack it just
received is (a slow response) to its first attempt to send pkt seq N
(rather than a fast response to the resend), and doesn't reduce the
backed-off rtt/timeout value for the connection. Each time a packet
gets lost (for whatever reason), the timeout keeps getting doubled,
and the link performance slows to a crawl, since retries aren't done
in a timely fashion. Eventually, the timeout gets too big, and the
connection gets dropped.
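The ratchet effect described above can be sketched in a few lines. This is a toy simulation of the failure mode, not the actual BSD code; the function name and structure are illustrative only:

```python
def simulate(first_attempt_lost, rto=1.0, rto_max=64.0):
    """Toy model of the described bug: RTO evolution across sends.

    Each element of first_attempt_lost is True if the first
    transmission of that segment is lost, forcing a timeout and a
    resend. In the buggy stack, the ack for the resend is ambiguous
    (it can't tell which transmission it answers), so no fresh RTT
    sample is taken and the backed-off timeout is never reduced --
    it only ratchets upward with every loss.
    """
    history = []
    for lost in first_attempt_lost:
        if lost:
            # Exponential backoff on timeout, capped at rto_max.
            rto = min(rto * 2, rto_max)
        # Bug: even when the (prompt) ack for the resend arrives,
        # the inflated rto is kept rather than re-measured downward.
        history.append(rto)
    return history

# Three losses in a row: the timeout climbs 2 -> 4 -> 8 seconds
# and stays there even after deliveries start succeeding again.
print(simulate([True, True, True, False, False]))
# -> [2.0, 4.0, 8.0, 8.0, 8.0]
```

The correct behavior (per Karn's algorithm plus RTT re-sampling on unambiguous acks) would take a new RTT measurement on the next segment that is acked without ever having been retransmitted, pulling the timeout back down.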
I don't know which implementations fixed this - we told Bostic and
other people at Berkeley about it (BSD was at 4.2 at that point, I
think). At that time, Solaris, SunOS, Ultrix, OSF, Mach, NCD's X
terminal OS, Tektronix's X terminal OS, and no doubt others all had this bug.
I will surely have gotten some details wrong, since I haven't been
close to TCP in at least five years, and I don't know if this is
relevant anymore, let alone to your particular case. Just a brain dump.
--p