Re: very poor TCP performance with 2.2.2

Andrea Arcangeli (andrea@e-mind.com)
Fri, 5 Mar 1999 18:21:08 +0100 (CET)


On Thu, 25 Feb 1999, Pete Wyckoff wrote:

>2.2dump.txt:
>
>13.957811 client.1582 > server.3000: S 28534252:28534252(0) win 8192 <mss 1460> (DF) [tos 0x10]
>13.957907 server.3000 > client.1582: S 1635428655:1635428655(0) ack 28534253 win 32120 <mss 1460> (DF)
>13.958224 client.1582 > server.3000: . ack 1 win 8760 (DF) [tos 0x10]
>13.959084 client.1582 > server.3000: P 1:36(35) ack 1 win 8760 (DF) [tos 0x10]
>13.959154 server.3000 > client.1582: . ack 36 win 32120 (DF)
>14.008709 server.3000 > client.1582: P 1:157(156) ack 36 win 32120 (DF)
>14.151263 client.1582 > server.3000: . ack 157 win 8604 (DF) [tos 0x10]
>14.151353 server.3000 > client.1582: . 157:1617(1460) ack 36 win 32120 (DF)
>14.151520 server.3000 > client.1582: . 1617:3077(1460) ack 36 win 32120 (DF)
>14.154405 client.1582 > server.3000: . ack 3077 win 8760 (DF) [tos 0x10]
>14.154474 server.3000 > client.1582: FP 3077:4075(998) ack 36 win 32120 (DF)
>14.155811 client.1582 > server.3000: . ack 4076 win 7762 (DF) [tos 0x10]
>14.155971 client.1582 > server.3000: F 36:36(0) ack 4076 win 7762 (DF) [tos 0x10]
>14.156026 server.3000 > client.1582: . ack 37 win 32120 (DF)
>
>Here the 2.2 server immediately acks for 1:36, then this time the app takes
>50 ms to generate (maybe) and return the data. Apparently the NT box has
>gotten distracted, and takes 142 ms to notice and continue the conversation.
>Speculation: NT is doing delayed ack, and hoping for some data, hits the
>timeout, and ACKs.

The only way for 2.2.2 to merge those two packets (the ack for 1:36 and
the data that follows it) is to delay the first ack too.

Here is the diff between my current tree and 2.2.2. It also includes
DaveM's Solaris workaround. I think it should address the problem you are
reporting above. Oh, and it also disables delacks if TCP_NODELAY is set,
because that way we can get TCP latency down to the order of 10usec, but
that's unrelated to this issue.
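
If you want to try the TCP_NODELAY part from userspace: the option is set
per socket with setsockopt(), and the patch's "TCP_NODELAY is set" check
tests sk->nonagle == 1. A minimal sketch follows. set_nodelay() and
get_nodelay() are names made up for illustration, they are not part of
the patch.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: set TCP_NODELAY on a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_nodelay(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is set on the socket,
 * 0 if it is not, -1 on error. */
int get_nodelay(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
		return -1;
	return val != 0;
}
```

Call set_nodelay() on the socket after socket() or accept() and before
the latency-sensitive traffic starts.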

The interesting part for this issue is the tcp_input.c hunk at
@@ -99,12 +102,11 @@.

NOTENOTENOTE: Don't include this in the official kernel. AFAIK it goes
against the RFC. It's an interesting optimization though, and I agree with
it. It saves us from sending some packets on the wire.

When I was complaining about these HZ/50 values I didn't understand that
one of the most important benefits of delacks is that they let the
receiver avoid sending a separate ack when the receiver is about to become
the sender.
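
A note on the HZ/50 -> (HZ+49)/50 change in the patch: both express 20 ms
in jiffies, but the first rounds down and the second rounds up, so the
new value is never shorter than 20 ms and never 0. The sketch below
only shows the arithmetic. The function names are made up for
illustration and hz is passed as a parameter instead of using the kernel's
HZ constant.

```c
/* Hypothetical helper: old minimum delack timeout in jiffies.
 * Integer division truncates, so this rounds down (0 if hz < 50). */
unsigned int delack_min_trunc(unsigned int hz)
{
	return hz / 50;
}

/* Hypothetical helper: new minimum delack timeout in jiffies.
 * Adding 49 before dividing by 50 rounds up instead. */
unsigned int delack_min_ceil(unsigned int hz)
{
	return (hz + 49) / 50;
}
```

With hz = 100 both give 2 jiffies. With hz = 1024 the old form gives 20
and the new one gives 21.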

Index: ipv4//tcp_input.c
===================================================================
RCS file: /var/cvs/linux/net/ipv4/tcp_input.c,v
retrieving revision 1.1.1.6
diff -u -r1.1.1.6 tcp_input.c
--- tcp_input.c 1999/02/23 16:48:26 1.1.1.6
+++ tcp_input.c 1999/03/05 17:14:04
@@ -55,6 +55,9 @@
* work without delayed acks.
* Andi Kleen: Process packets with PSH set in the
* fast path.
+ * Andrea Arcangeli: TCP_NODELAY disable delacks.
+ * Andrea Arcangeli: Delay also the first ack to be able
+ * to join the ack in the first data packet.
*/

#include <linux/config.h>
@@ -99,12 +102,11 @@
if(tp->ato == 0) {
tp->lrcvtime = jiffies;

- /* Help sender leave slow start quickly,
- * and also makes sure we do not take this
- * branch ever again for this connection.
+ /*
+ * Don't enter quickack mode to be able to join the first
+ * ack with a data packet. -arca
*/
- tp->ato = 1;
- tcp_enter_quickack_mode(tp);
+ tp->ato = (HZ+49)/50;
} else {
int m = jiffies - tp->lrcvtime;

@@ -135,9 +137,9 @@
*/
if(th->psh && (skb->len < (tp->mss_cache >> 1))) {
/* Preserve the quickack state. */
- if((tp->ato & 0x7fffffff) > HZ/50)
+ if((tp->ato & 0x7fffffff) > (HZ+49)/50)
tp->ato = ((tp->ato & 0x80000000) |
- (HZ/50));
+ ((HZ+49)/50));
}
}

@@ -241,7 +243,7 @@
extern __inline__ int tcp_paws_discard(struct tcp_opt *tp, struct tcphdr *th, unsigned len)
{
/* ts_recent must be younger than 24 days */
- return (((jiffies - tp->ts_recent_stamp) >= PAWS_24DAYS) ||
+ return (((s32)(jiffies - tp->ts_recent_stamp) >= PAWS_24DAYS) ||
(((s32)(tp->rcv_tsval-tp->ts_recent) < 0) &&
/* Sorry, PAWS as specified is broken wrt. pure-ACKs -DaveM */
(len != (th->doff * 4))));
@@ -1580,7 +1582,9 @@
/* We entered "quick ACK" mode or... */
tcp_in_quickack_mode(tp) ||
/* We have out of order data */
- (skb_peek(&tp->out_of_order_queue) != NULL)) {
+ (skb_peek(&tp->out_of_order_queue) != NULL) ||
+ /* TCP_NODELAY is set */
+ sk->nonagle == 1) {
/* Then ack it now */
tcp_send_ack(sk);
} else {
Index: ipv4//tcp_output.c
===================================================================
RCS file: /var/cvs/linux/net/ipv4/tcp_output.c,v
retrieving revision 1.1.1.5
diff -u -r1.1.1.5 tcp_output.c
--- tcp_output.c 1999/02/23 16:48:26 1.1.1.5
+++ tcp_output.c 1999/03/02 16:53:50
@@ -580,6 +580,19 @@
if(tp->af_specific->rebuild_header(sk))
return 1; /* Routing failure or similar. */

+ /* Solaris sucks. */
+ if(skb->len > 0 &&
+ (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN) &&
+ tp->snd_una == (TCP_SKB_CB(skb)->end_seq - 1)) {
+#if 1
+ printk("TCP: Doing Solaris hack for [%p:%08x:%04x:%08x]\n",
+ skb, sk->daddr, sk->dport, tp->snd_una);
+#endif
+ TCP_SKB_CB(skb)->seq = TCP_SKB_CB(skb)->end_seq - 1;
+ skb_trim(skb, 0);
+ skb->csum = 0;
+ }
+
/* Ok, we're gonna send it out, update state. */
TCP_SKB_CB(skb)->sacked |= TCPCB_SACKED_RETRANS;
tp->retrans_out++;
@@ -973,6 +986,8 @@
timeout = tp->ato;
if (timeout > max_timeout)
timeout = max_timeout;
+ if (timeout < (HZ+49)/50)
+ timeout = (HZ+49)/50;
timeout += jiffies;

/* Use new timeout only if there wasn't a older one earlier. */
@@ -980,7 +995,7 @@
tp->delack_timer.expires = timeout;
add_timer(&tp->delack_timer);
} else {
- if (timeout < tp->delack_timer.expires)
+ if (time_before(timeout, tp->delack_timer.expires))
mod_timer(&tp->delack_timer, timeout);
}
}

To apply it do `cd linux/net/ipv4; patch -p0 <whereisthepatch`.
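
One more note on the last hunk: it replaces a plain `<` on timer values
with time_before(), because a plain comparison gives the wrong answer
once jiffies wraps around. Here is a userspace sketch of the idea.
ticks_before() is a made-up stand-in, not the kernel macro itself, and it
assumes the two values are less than half the counter range apart.

```c
/* Hypothetical stand-in for a wrap-safe "a is earlier than b" test.
 * The unsigned subtraction wraps modulo 2^N; reading the result as a
 * signed value gives a negative number when a is behind b. */
int ticks_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* The naive comparison, which breaks across a wrap. */
int ticks_before_naive(unsigned long a, unsigned long b)
{
	return a < b;
}
```

For example, with a timeout just below the wrap point and an expiry just
after it, ticks_before() still reports the timeout as earlier, while the
naive version does not.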

Let me know if it helps.

Andrea Arcangeli
