Re: Linux 2.2.x outgoing TCP connects hang after several hours of

George (greerga@nidhogg.ham.muohio.edu)
Wed, 4 Aug 1999 15:52:49 -0400 (EDT)


On Wed, 4 Aug 1999, Jordan Mendelson wrote:

>Dan Hopper wrote:
>>
>> Howdy,
>>
>> I've been noticing a TCP connect problem on a masquerading gateway
>> box for a few months now, first running 2.2.7, and more recently
>> 2.2.10 and 2.2.11pre2. They behavior is basically that outgoing TCP
>> connections originating on the gateway box itself fail, but that
>> incoming TCP connects succeed. Outgoing TCP connects that are
>> masqueraded via this box succeed as well. The interfaces are eth0
>> for the local private net and ppp0 for everything else.

I've been having similar problems now.

Symptom:

TCP connection hangs in outgoing data. Currently I have a connect stalled
with 560 in Send-Q (no timer) that refuses to send a packet. Ooh, it
finally sent the queue.

16:25:31.411687 208.223.154.199.1022 > 208.223.154.2.ssh:
P 3417368335:3417368895(560) ack 3861510271 win 31856
<nop,nop,timestamp 19541626 118640298> (DF) [tos 0x10]
16:25:31.791692 208.223.154.2.ssh > 208.223.154.199.1022:
. ack 560 win 32120
<nop,nop,timestamp 118652383 19541626> (DF) [tos 0x10]
16:25:31.851656 208.223.154.2.ssh > 208.223.154.199.1022:
P 1:85(84) ack 560 win 32120
<nop,nop,timestamp 118652384 19541626> (DF) [tos 0x10]
16:25:31.871679 208.223.154.199.1022 > 208.223.154.2.ssh:
. ack 85 win 31856
<nop,nop,timestamp 19541672 118652384> (DF) [tos 0x10]

Then I go bang on the keyboard a bit. No tcpdump traffic over the modem to
the host 2 hops away:

traceroute to galadriel.rm1.net (208.223.154.2), 30 hops max, 38 byte packets
1 legolas.rm1.net (208.223.154.11) 190.849 ms 157.066 ms 141.982 ms
2 galadriel.rm1.net (208.223.154.2) 148.661 ms 146.670 ms 139.321 ms

(as I type this e-mail on a host 14 hops away without any lag at all).

Here's the traffic that finally resulted from my banging on keyboard
(overlap with above for context):

16:25:31.791692 208.223.154.2.ssh > 208.223.154.199.1022: . ack 560 win 32120 <nop,nop,timestamp 118652383 19541626> (DF) [tos 0x10]16:25:31.851656 208.223.154.2.ssh > 208.223.154.199.1022: P 1:85(84) ack 560 win 32120 <nop,nop,timestamp 118652384 19541626> (DF) [tos 0x10]
16:25:31.871679 208.223.154.199.1022 > 208.223.154.2.ssh: . ack 85 win 31856 <nop,nop,timestamp 19541672 118652384> (DF) [tos 0x10]
16:27:31.791687 208.223.154.199.1022 > 208.223.154.2.ssh: P 560:2008(1448) ack 85 win 31856 <nop,nop,timestamp 19553664 118652384> (DF) [tos 0x10]
16:27:32.991654 208.223.154.2.ssh > 208.223.154.199.1022: . 85:1533(1448) ack 2008 win 32120 <nop,nop,timestamp 118664452 19553664> (DF) [tos 0x10]
16:27:33.441702 208.223.154.2.ssh > 208.223.154.199.1022: . 1533:2981(1448) ack 2008 win 32120 <nop,nop,timestamp 118664452 19553664> (DF) [tos 0x10]
16:27:33.442249 208.223.154.199.1022 > 208.223.154.2.ssh: . ack 2981 win 31856 <nop,nop,timestamp 19553829 118664452> (DF) [tos 0x10]
16:27:33.841673 208.223.154.2.ssh > 208.223.154.199.1022: P 2981:4341(1360) ack 2008 win 32120 <nop,nop,timestamp 118664452 19553664> (DF) [tos 0x10]
16:27:34.211647 208.223.154.2.ssh > 208.223.154.199.1022: P 4341:4849(508) ack 2008 win 32120 <nop,nop,timestamp 118664561 19553829> (DF) [tos 0x10]
16:27:34.231666 208.223.154.199.1022 > 208.223.154.2.ssh: . ack 4849 win 31856 <nop,nop,timestamp 19553908 118664561> (DF) [tos 0x10]
16:27:35.511671 208.223.154.199.1022 > 208.223.154.2.ssh: P 2008:3140(1132) ack 4849 win 31856 <nop,nop,timestamp 19554036 118664561> (DF) [tos 0x10]
16:27:36.181674 208.223.154.2.ssh > 208.223.154.199.1022: P 4849:4925(76) ack 3140 win 32120 <nop,nop,timestamp 118664817 19554036> (DF) [tos 0x10]
16:27:36.201673 208.223.154.199.1022 > 208.223.154.2.ssh: . ack 4925 win 31856 <nop,nop,timestamp 19554105 118664817> (DF) [tos 0x10]
16:27:39.751517 208.223.154.199.55757 > 208.223.154.2.33435: udp 10 [ttl 1]
16:27:39.944969 208.223.154.199.55757 > 208.223.154.2.33436: udp 10 [ttl 1]
16:27:40.102337 208.223.154.199.55757 > 208.223.154.2.33437: udp 10 [ttl 1]
16:27:40.244876 208.223.154.199.55757 > 208.223.154.2.33438: udp 10
16:27:40.391672 208.223.154.2 > 208.223.154.199: icmp: 208.223.154.2 udp port 33438 unreachable [tos 0xc0]
16:27:40.396220 208.223.154.199.55757 > 208.223.154.2.33439: udp 10
16:27:40.541707 208.223.154.2 > 208.223.154.199: icmp: 208.223.154.2 udp port 33439 unreachable [tos 0xc0]
16:27:40.543080 208.223.154.199.55757 > 208.223.154.2.33440: udp 10
16:27:40.681736 208.223.154.2 > 208.223.154.199: icmp: 208.223.154.2 udp port 33440 unreachable [tos 0xc0]
16:27:41.221688 208.223.154.199.1022 > 208.223.154.2.ssh: . ack 4925 win 31856 <nop,nop,timestamp 19554607 118664817> (DF) [tos 0x10]
16:27:41.361729 208.223.154.2.ssh > 208.223.154.199.1022: . ack 3140 win 32120 <nop,nop,timestamp 118665339 19554607> (DF) [tos 0x10]
16:27:50.441706 208.223.154.199.1022 > 208.223.154.2.ssh: . ack 4925 win 31856 <nop,nop,timestamp 19555529 118665339> (DF) [tos 0x10]
16:27:50.581669 208.223.154.2.ssh > 208.223.154.199.1022: . ack 3140 win 32120 <nop,nop,timestamp 118666260 19555529> (DF) [tos 0x10]
16:28:08.741706 208.223.154.199.1022 > 208.223.154.2.ssh: . ack 4925 win 31856 <nop,nop,timestamp 19557359 118666260> (DF) [tos 0x10]
16:28:08.891699 208.223.154.2.ssh > 208.223.154.199.1022: . ack 3140 win 32120 <nop,nop,timestamp 118668091 19557359> (DF) [tos 0x10]
16:28:45.211696 208.223.154.199.1022 > 208.223.154.2.ssh: . ack 4925 win 31856 <nop,nop,timestamp 19561006 118668091> (DF) [tos 0x10]
16:28:45.351669 208.223.154.2.ssh > 208.223.154.199.1022: . ack 3140 win 32120 <nop,nop,timestamp 118671737 19561006> (DF) [tos 0x10]
16:29:57.991767 208.223.154.199.1022 > 208.223.154.2.ssh: . ack 4925 win 31856 <nop,nop,timestamp 19568284 118671737> (DF) [tos 0x10]
16:29:58.151665 208.223.154.2.ssh > 208.223.154.199.1022: . ack 3140 win 32120 <nop,nop,timestamp 118679017 19568284> (DF) [tos 0x10]

There's a traceroute in there and SSH keepalives too.

>Are you sure it's not modem failure? Double-check to see what your modems
>upload/download rate is after several hours, a large portion of modems will
>retrain down their outgoing or incoming connection rates and have bugs in their
>firmware which cause connection rates to spiral down. Try limiting your
>connection rate to 1200/2400 baud less than what you normally connect at...

Two connections to same machine, one dead, one fine. Definitely not my
modem.

Remote: RedHat 6.0 -- 2.2.5-22 kernel.
Mine: RedHat 6.0 -- 2.2.10 kernel.

Only 1 router in between my modem and the other machine.

I have IP MASQ enabled and a rule for it loaded but haven't had my other
machine on to actually use it. It takes a few hours for the connections to
go dead. Logging out and back in fixes it.

-George Greer

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/