Linux 2.2.x outgoing TCP connects hang after several hours of use

Dan Hopper (eusdbho@rtp.ericsson.se)
Wed, 4 Aug 1999 14:25:42 -0400


Howdy,

I've been noticing a TCP connect problem on a masquerading gateway
box for a few months now, first running 2.2.7, and more recently
2.2.10 and 2.2.11pre2. They behavior is basically that outgoing TCP
connections originating on the gateway box itself fail, but that
incoming TCP connects succeed. Outgoing TCP connects that are
masqueraded via this box succeed as well. The interfaces are eth0
for the local private net and ppp0 for everything else.

Note that the setup works fine for many hours, but eventually after
quite a few packets have traversed the network, outgoing TCP
connects suddenly start to fail.

For example, an outgoing connection originating from another box
that the gateway masquerades, to port 80 on www.cdrom.com. Here's
tcpdump on ppp0 (so the IP and such have already been swapped):

from dhopper via ku4nf masq (works):
23:49:33.980136 209.170.132.154.61559 > 209.155.82.19.www: S 27472677:27472677(0) win 32120 <mss 1460,sackOK,timestamp 2153675 0,nop,wscale 0> (DF)
23:49:34.222955 209.155.82.19.www > 209.170.132.154.61559: S 1225849590:1225849590(0) ack 27472678 win 17520 <mss 1460> (DF)
23:49:34.223931 209.170.132.154.61559 > 209.155.82.19.www: . ack 1 win 32120 (DF)
....etc....

Here's a connect that originated on the gateway box, which hangs.

from ku4nf direct (hangs):
23:50:21.763813 209.170.132.154.2320 > 209.155.82.19.www: S 79001580:79001580(0) win 32120 <mss 1460,sackOK,timestamp 2158320 0,nop,wscale 0> (DF)
23:50:22.038500 209.155.82.19.www > 209.170.132.154.2320: S 1237843869:1237843869(0) ack 79001581 win 17520 <mss 1460> (DF)
23:50:22.039492 209.170.132.154.2320 > 209.155.82.19.www: R 79001581:79001581(0) win 0

And here's what netstat reported at that time:
tcp 0 1 ppp-1-154.dialup.r:2325 web1.cdrom.com:www SYN_SENT

Here's a TCP connect from a box on the internal net to the gateway
that's successful.

from dhopper to ku4nf via eth0 (works):
23:58:00.084632 dhopper.2280 > ku4nf.telnet: S 567299088:567299088(0) win 32120 <mss 1460,sackOK,timestamp 2204285[|tcp]> (DF)
23:58:00.086249 ku4nf.telnet > dhopper.2280: S 557024313:557024313(0) ack 567299089 win 32120 <mss 1460,sackOK,timestamp 2204145[|tcp]> (DF)
23:58:00.086366 dhopper.2280 > ku4nf.telnet: . ack 1 win 32120 <nop,nop,timestamp 2204285 2204145> (DF)
23:58:00.092997 dhopper.2280 > ku4nf.telnet: P 1:28(27) ack 1 win 32120 <nop,nop,timestamp 2204286 2204145> (DF)
....etc....

And here's the opposite direction - the gateway trying to connect to
the internal machine.

from ku4nf to dhopper via eth0 (hangs):
23:59:31.004069 ku4nf.2324 > dhopper.telnet: S 661532546:661532546(0) win 32120 <mss 1460,sackOK,timestamp 2213236[|tcp]> (DF)
23:59:31.004194 dhopper.telnet > ku4nf.2324: S 647783656:647783656(0) ack 661532547 win 32120 <mss 1460,sackOK,timestamp 2213377[|tcp]> (DF)
23:59:31.005078 ku4nf > dhopper: icmp: redirect dhopper to host dhopper [tos 0xc0]
23:59:31.005174 dhopper.telnet > dhopper.2324: S 647783656:647783656(0) ack 661532547 win 32120 <mss 1460,sackOK,timestamp 2213377[|tcp]> (DF)
23:59:34.498085 dhopper.telnet > ku4nf.2324: S 647783656:647783656(0) ack 661532547 win 32120 <mss 1460,sackOK,timestamp 2213727[|tcp]> (DF)
23:59:34.498905 ku4nf > dhopper: icmp: redirect dhopper to host dhopper [tos 0xc0]
23:59:34.499003 dhopper.telnet > dhopper.2324: S 647783656:647783656(0) ack 661532547 win 32120 <mss 1460,sackOK,timestamp 2213727[|tcp]> (DF)

And netstat at the time of the hang:
tcp 0 1 ku4nf:2326 dhopper:telnet SYN_SENT

And the routing table when problem happens for reference (looks fine):
# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
198.79.53.27 * 255.255.255.255 UH 0 0 0 ppp0
192.168.40.10 * 255.255.255.255 UH 0 0 0 eth0
192.168.40.0 * 255.255.255.0 U 0 0 0 eth0
127.0.0.0 * 255.0.0.0 U 0 0 0 lo
default 198.79.53.27 0.0.0.0 UG 0 0 0 ppp0

A couple notes:
ku4nf is the masquerading gateway, running Linux
2.2.7/2.2.10/2.2.11pre2, at 192.168.40.10. dhopper is an internal
box on eth0, running Linux 2.2.10, at 192.168.40.3.

The kernel is compiled with IP masquerading, autofw, etc. I've
removed the firewalling rules, changed the default policies to
accept, and several other combinations, none of which fix the
problem when it occurs. I've tried many different TCP destination
ports to no avail.

So the question is, why is the gateway sending out a RST in response
to the remote system's SYN in the ppp0 case, and in the local
network (eth0) case, why is the gateway sending out seemingly bogus
ICMP redirects?

And I want to emphasis again that the behavior is the same even if
all firewalling rules are flushed and the default policies all
changed to accept, and that this problem does not begin to manifest
itself until several hours after the machine is in use. The only
way (it appears) to remove the problem is to actually reboot the
machine.

Thanks in advance!
Dan Hopper

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/