Re: Skyhigh retransmit times. Yearold problem still in 2.3.26

Andi Kleen (ak@muc.de)
Mon, 15 Nov 1999 02:48:30 +0100


On Sun, Nov 14, 1999 at 11:02:17PM +0100, Jamie Lokier wrote:
> Andi Kleen wrote:
> > > Not if TCP does not retry for the duration of your "connected" window,
> > > say 5 minutes...
> >
> > It does. The retransmit timeout is clamped at 120s, when there is anything
> > to send, and no timeout when there is nothing to send (except for keepalives,
> > but these are only turned on by some applications)
> >
> > > Also, I'm not convinced TCP resets the timeouts when they are so large
> > > so much as takes them back down slowly. This is useless if your
> > > connection disappeared for a few minutes and then TCP takes longer than
> > > you have left to recover.
> >
> > It does. The rtt estimate TCP keeps is independent from the retransmit
> > timeout (tp->rto vs tp->srtt). The rtt estimate only changes based on
> > new incoming acks, and only for ACKs where TCP knows that the packets
> > causing them have not been retransmitted.
>
> Hi Andi. Please can you explain the behaviour I get? I have a modem
> which will connect for about 2 minutes at a time. Reconnecting takes
> about 2 minutes also -- the callback PPP negotiation sequence is not
> fast.
>
> I connect and establish an ssh connection (which uses keepalives). All
> goes well until the modem loses the connection. It always does this
> just when I've typed something and the response hasn't come back. The
> modem is back 2 minutes later. Of course, there is no hope of the
> session continuing -- the retransmits are too far apart.

Too far apart? Once a retransmit succeeds TCP will go into regular
slow start again and the retransmit backoff is reset.

>
> If they are clamped at 120s, should I expect to be able to continue a
> session about half the time? I find about 1 in 30 sessions will
> continue.

You should be able to continue the session all the time, assuming it
wasn't disconnected longer than the TCP hard time out.

I was actually bored and implemented KA9Q style kick. Just apply this
patch to 2.2.13 (should apply to most 2.2,2.3 kernels too) and the
ssh patch to ssh-1.2.27. The kernel patch adds a new setsockopt
TCP_KICK that causes an immediate retransmit, without messing with
the timeouts or the congestion window. When the other end answers
TCP will go into slow start normally and recover.

The ssh patch adds a new escape ~k that calls TCP_KICK on the socket.

It would be also useful if that could be forced for all sockets
that use a particular device (e.g. triggered by pppd on reconnect).
That also wouldn't be very hard to implement, it is a variation of
the existing IFF_DYNAMIC patch, but not tonight.

Problem is that it circumvents the TCP congestion avoidance algorithms,
so a rogue application calling TCP_KICK in a tight loop could flood the
network (but it could do that anyways with a UDP socket, so it isn't
that bad)

The kernel patch is very simple, because fast path mtu discovery already
needed a way to force retransmits.

Only lightly tested, but seems to work here. Maybe it is helpful
for someone.

--- linux-2.2.13/include/linux/socket.h-NOKICK Mon Nov 15 01:33:43 1999
+++ linux-2.2.13/include/linux/socket.h Mon Nov 15 01:35:54 1999
@@ -243,6 +243,7 @@
#define TCP_NODELAY 1
#define TCP_MAXSEG 2
#define TCP_CORK 3 /* Linux specific (for use with sendfile) */
+#define TCP_KICK 4 /* KA9Q style kick -- force a retransmit */

#ifdef __KERNEL__
extern int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len);
--- linux-2.2.13/net/ipv4/tcp.c-NOKICK Mon Nov 15 01:29:37 1999
+++ linux-2.2.13/net/ipv4/tcp.c Mon Nov 15 02:35:46 1999
@@ -1744,6 +1744,12 @@
}
return 0;

+ case TCP_KICK:
+ lock_sock(sk);
+ tcp_simple_retransmit(sk);
+ release_sock(sk);
+ return 0;
+
default:
return -ENOPROTOOPT;
};

ssh-1.2.27 patch (a bit hackish and the setsockopt value hardcoded
because I wasn't feeling like fighting with glibc too):

--- ssh-1.2.27/clientloop.c-o Mon Nov 15 02:11:27 1999
+++ ssh-1.2.27/clientloop.c Mon Nov 15 02:34:27 1999
@@ -729,6 +729,7 @@
~# - list forwarded connections\r\n\
~& - background ssh (when waiting for connections to terminate)\r\n\
~? - this message\r\n\
+~k - kick connection\r\n\
~~ - send the escape character by typing it twice\r\n\
(Note that escapes are only recognized immediately after newline.)\r\n",
escape_char);
@@ -742,6 +743,20 @@
buffer_append(&stderr_buffer, s, strlen(s));
xfree(s);
continue;
+
+ case 'k':
+ /* glibc as usual sabotates kernel invention,
+ so TCP_KICK is hardcoded here. Should be
+ tested for by autoconf */
+#ifdef __linux__
+ {
+ int flag = 1;
+ setsockopt(connection_out, SOL_TCP, 4, &flag,
+ sizeof flag);
+ }
+#endif
+ continue;
+

default:
if (ch != escape_char)

-Andi

-- 
This is like TV. I don't like TV.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/