Re: Weird tcp performance differences with 2.0 and 2.2 kernels

Pete Wyckoff (pw@dancer.ca.sandia.gov)
Thu, 11 Feb 1999 13:25:20 -0800


boyns@contigo.com wrote in <7femnzca6b.fsf@contigo.com>:

> We're trying to figure out some weird performance differences between
> 2.0.x / 2.2.x kernels and the interface being used. One of the most
> noticeable differences is that 2.0 kernels seem to be able to
> communicate much faster over the loopback interface. Another
> interesting thing is that 2.2 kernels seem to communicate with each
> other faster when the data being written is >= 724 bytes.

While I can't say anything about the changes to the loopback performance,
I do notice similar oddities regarding small messages and delayed acks.

> remote client linux 2.2.1, 400mhz Pentium II, 3c905B Cyclone 100baseTx
> 10 bytes 46.51 c/s
> 723 bytes 47.68 c/s
> 724 bytes 279.62 c/s
> 1024 bytes 300.92 c/s

Delayed acks are a good thing (see the RFCs), but linux tries to be clever
during slow start to help ramp transfer rates up faster by using "quick
ack mode". See tcp_delack_estimator(). But the function tcp_remember_ack()
seems to be messing things up for small packets. It tries to speed up
pending acknowledgements when it sees PSH and small packets, but in so
doing, erases the "quick ack mode" bit.

Here's a patch to undo that:

diff -ruN stock-2.2.1/net/ipv4/tcp_input.c linux-2.2.1/net/ipv4/tcp_input.c
--- stock-2.2.1/net/ipv4/tcp_input.c Fri Jan 22 09:16:09 1999
+++ linux-2.2.1/net/ipv4/tcp_input.c Thu Feb 11 11:30:59 1999
@@ -132,9 +132,16 @@

/* Tiny-grams with PSH set make us ACK quickly.
* Note: This also clears the "quick ack mode" bit.
+ * Don't clear that bit if it's set already. --pw
*/
- if(th->psh && (skb->len < (tp->mss_cache >> 1)))
- tp->ato = HZ/50;
+ if(th->psh && (skb->len < (tp->mss_cache >> 1))) {
+ if (tcp_in_quickack_mode(tp)) {
+ tp->ato = HZ/50;
+ tcp_enter_quickack_mode(tp);
+ } else {
+ tp->ato = HZ/50;
+ }
+ }
}

/* Called to compute a smoothed rtt estimate. The data fed to this

Can anyone comment on why the above "Note:" was already in the code? Is
there a good reason for taking us out of quickack mode? By the way, "small"
here works out to be 724 bytes for ethernet-ish MTUs.

Two packet dumps from the client program of Boyns', first with 723 bytes
payload, then with 724. Timestamps are relative, in us, to the start of
the connection, but there's a large granularity here. The 20 ms produced
by the HZ/50 delay is clearly shown.

0 su2cn10.1033 > su2sn.7890: S 2590564679:2590564679(0) win 32120 <mss 1460,sackOK,timestamp 2523810[|tcp]> (DF)
0 su2sn.7890 > su2cn10.1033: S 2307198861:2307198861(0) ack 2590564680 win 32120 <mss 1460,sackOK,timestamp 1259949129[|tcp]> (DF)
0 su2cn10.1033 > su2sn.7890: . ack 1 win 32120 <nop,nop,timestamp 2523810 1259949129> (DF)
0 su2cn10.1033 > su2sn.7890: P 1:5(4) ack 1 win 32120 <nop,nop,timestamp 2523810 1259949129> (DF)
977 su2sn.7890 > su2cn10.1033: P 1:724(723) ack 5 win 32120 <nop,nop,timestamp 1259949129 2523810> (DF)
20528 su2cn10.1033 > su2sn.7890: . ack 724 win 32120 <nop,nop,timestamp 2523831 1259949129> (DF)
20528 su2sn.7890 > su2cn10.1033: F 724:724(0) ack 5 win 32120 <nop,nop,timestamp 1259949149 2523831> (DF)
20528 su2cn10.1033 > su2sn.7890: . ack 725 win 32120 <nop,nop,timestamp 2523831 1259949149> (DF)
20528 su2cn10.1033 > su2sn.7890: F 5:5(0) ack 725 win 32120 <nop,nop,timestamp 2523831 1259949149> (DF)
20528 su2sn.7890 > su2cn10.1033: . ack 6 win 32120 <nop,nop,timestamp 1259949150 2523831> (DF)

0 su2cn10.1034 > su2sn.7890: S 2584748716:2584748716(0) win 32120 <mss 1460,sackOK,timestamp 2527417[|tcp]> (DF)
0 su2sn.7890 > su2cn10.1034: S 2311722158:2311722158(0) ack 2584748717 win 32120 <mss 1460,sackOK,timestamp 1259952736[|tcp]> (DF)
0 su2cn10.1034 > su2sn.7890: . ack 1 win 32120 <nop,nop,timestamp 2527417 1259952736> (DF)
0 su2cn10.1034 > su2sn.7890: P 1:5(4) ack 1 win 32120 <nop,nop,timestamp 2527417 1259952736> (DF)
977 su2sn.7890 > su2cn10.1034: P 1:725(724) ack 5 win 32120 <nop,nop,timestamp 1259952736 2527417> (DF)
977 su2cn10.1034 > su2sn.7890: . ack 725 win 32120 <nop,nop,timestamp 2527418 1259952736> (DF)
977 su2sn.7890 > su2cn10.1034: F 725:725(0) ack 5 win 32120 <nop,nop,timestamp 1259952737 2527418> (DF)
977 su2cn10.1034 > su2sn.7890: . ack 726 win 32120 <nop,nop,timestamp 2527418 1259952737> (DF)
977 su2cn10.1034 > su2sn.7890: F 5:5(0) ack 726 win 32120 <nop,nop,timestamp 2527418 1259952737> (DF)
977 su2sn.7890 > su2cn10.1034: . ack 6 win 32120 <nop,nop,timestamp 1259952737 2527418> (DF)

---------------------------------------------
Pete Wyckoff | wyckoff@ca.sandia.gov
Sandia National Labs | 925 294 3503 (voice)
MS 9011, P.O. Box 969 | 925 294 1225 (fax)
Livermore, CA 94551 |

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/