From: "Michael S. Tsirkin" <mst@xxxxxxxxxxxxxx>
Date: Wed, 8 Mar 2006 14:53:11 +0200
> What I was trying to figure out was, how can we re-enable the trick
> without hurting TSO? Could a solution be to simply look at the frame
> size, and call tcp_send_delayed_ack if the frame size is small?
The change is really not related to TSO.
Reverting it reduces the number of ACKs on the wire and the number of
context switches the sender needs to push out new data. That is why it
can make things go faster, but it also makes the TCP sender bursty,
which worsens congestion on the Internet.
When the receiver has a fast CPU, keeps up with the incoming packet
rate, and sits in an environment with no congestion, the old code
helps a lot. But if the receiver is CPU limited or there is congestion
of any kind, it does exactly the wrong thing: it delays ACKs for so
long that the pipe drains, and that kills performance. In congested
environments, the reduced ACK feedback also makes packet loss recovery
extremely poor. This is the first reason behind my change.
The behavior is also frowned upon in the TCP implementor community.
It is called out in RFC 2525, "Known TCP Implementation Problems",
section 2.13, "Stretch ACK violation".
The entry, quoted below for reference, is very clear about why stretch
ACKs are bad. Although the old behavior may help performance in your
case, it hurts performance in congested environments and with CPU
limited receivers. That is the second reason I made this change.
So reverting the change isn't really an option.
Name of Problem
   Stretch ACK violation
Classification
   Congestion Control/Performance
Description
   To improve efficiency (both computer and network) a data receiver
   may refrain from sending an ACK for each incoming segment,
   according to [RFC1122]. However, an ACK should not be delayed an
   inordinate amount of time. Specifically, ACKs SHOULD be sent for
   every second full-sized segment that arrives. If a second
   full-sized segment does not arrive within a given timeout (of no
   more than 0.5 seconds), an ACK should be transmitted, according to
   [RFC1122]. A TCP receiver which does not generate an ACK for
   every second full-sized segment exhibits a "Stretch ACK
   Violation".
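As a rough illustration of the receiver rule quoted above (not part of
the RFC text), here is a minimal sketch in Python. The class name, its
fields, and the timer callback are hypothetical and do not correspond
to any kernel code; only the "every second full-sized segment" rule
and the 0.5 second bound come from the quoted entry.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ACK_DELAY = 0.5  # seconds; upper bound quoted from RFC 1122


@dataclass
class DelayedAckState:
    """Hypothetical receiver-side state, for illustration only."""
    mss: int                                  # size of a full-sized segment
    unacked_full_segments: int = 0            # full-sized segments not yet ACKed
    first_unacked_time: Optional[float] = None  # arrival time of oldest un-ACKed data

    def _reset(self) -> None:
        self.unacked_full_segments = 0
        self.first_unacked_time = None

    def on_segment(self, length: int, now: float) -> bool:
        """Return True if an ACK should be sent right after this arrival."""
        if self.first_unacked_time is None:
            self.first_unacked_time = now
        if length >= self.mss:
            self.unacked_full_segments += 1
        # ACK at least every second full-sized segment.
        if self.unacked_full_segments >= 2:
            self._reset()
            return True
        return False

    def on_timer(self, now: float) -> bool:
        """Return True if the delay bound has expired and an ACK is due."""
        if (self.first_unacked_time is not None
                and now - self.first_unacked_time >= MAX_ACK_DELAY):
            self._reset()
            return True
        return False
```

A receiver that lets unacked_full_segments grow past 2 before ACKing
is what the entry calls a stretch ACK violation.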
Significance
   TCP receivers exhibiting this behavior will cause TCP senders to
   generate burstier traffic, which can degrade performance in
   congested environments. In addition, generating fewer ACKs
   increases the amount of time needed by the slow start algorithm to
   open the congestion window to an appropriate point, which
   diminishes performance in environments with large bandwidth-delay
   products. Finally, generating fewer ACKs may cause needless
   retransmission timeouts in lossy environments, as it increases the
   possibility that an entire window of ACKs is lost, forcing a
   retransmission timeout.
Implications
   When not in loss recovery, every ACK received by a TCP sender
   triggers the transmission of new data segments. The burst size is
   determined by the number of previously unacknowledged segments
   each ACK covers. Therefore, a TCP receiver ack'ing more than 2
   segments at a time causes the sending TCP to generate a larger
   burst of traffic upon receipt of the ACK. This large burst of
   traffic can overwhelm an intervening gateway, leading to higher
   drop rates for both the connection and other connections passing
   through the congested gateway.
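To make the burst argument concrete (again, not part of the RFC text),
the sketch below assumes a window-limited sender with a fixed window
that is not in loss recovery, so each ACK covering n segments frees
room for exactly n new segments. The function name and the fixed
window simplification are assumptions made for illustration.

```python
def bursts_for_ack_stride(total_segments: int, stride: int) -> list:
    """Sizes of the back-to-back bursts a window-limited sender emits
    when the receiver sends one ACK per `stride` segments.

    Simplification: the window is fixed, so an ACK covering n segments
    releases exactly n new segments.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    bursts = []
    remaining = total_segments
    while remaining > 0:
        covered = min(stride, remaining)  # the last ACK may cover fewer segments
        bursts.append(covered)
        remaining -= covered
    return bursts
```

Under these assumptions, bursts_for_ack_stride(8, 2) gives four bursts
of 2 segments, while bursts_for_ack_stride(8, 8) gives a single burst
of 8 that hits the gateway queue all at once.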
   In addition, the TCP slow start algorithm increases the congestion
   window by 1 segment for each ACK received. Therefore, increasing
   the ACK interval (thus decreasing the rate at which ACKs are
   transmitted) increases the amount of time it takes slow start to
   increase the congestion window to an appropriate operating point,
   and the connection consequently suffers from reduced performance.
   This is especially true for connections using large windows.
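The slow start effect described in the quoted entry can be sketched
with a toy model. This is an idealized illustration, not a description
of any real stack: it assumes no loss, a whole window sent per round
trip, one ACK per ack_stride segments (with a final ACK for any
remainder), and growth of 1 segment per ACK. The function name and
parameters are hypothetical.

```python
import math


def rtts_to_reach(target_cwnd: int, ack_stride: int, initial_cwnd: int = 1) -> int:
    """Round trips slow start needs to grow cwnd to at least target_cwnd
    when the receiver ACKs once per `ack_stride` segments."""
    if ack_stride < 1 or initial_cwnd < 1:
        raise ValueError("ack_stride and initial_cwnd must be at least 1")
    cwnd = initial_cwnd
    rtts = 0
    while cwnd < target_cwnd:
        acks = math.ceil(cwnd / ack_stride)  # ACKs returned for this window
        cwnd += acks                         # slow start: +1 segment per ACK
        rtts += 1
    return rtts
```

In this model, reaching a 64 segment window takes 6 round trips with
an ACK per segment and 10 with an ACK per two segments; larger strides
take longer still, and the gap widens as the target window grows.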