Re: 2.0.13 Sockets Stuck on close

Eric Schenk (schenk@cs.toronto.edu)
Wed, 21 Aug 1996 13:32:45 -0400


Christoph Lameter <clameter@miriam.fuller.edu> writes:
>I have right now two apache httpd's stuck on port 80 with the Socket
>in condition close since a few hours:
>
>aaron:~# netstat -t
>Active Internet connections (w/o servers)
>Proto Recv-Q Send-Q Local Address           Foreign Address        State
>tcp        0      0 aaron.fulle:netbios-ssn pac5.fuller.edu:1027   ESTABLISHED
>tcp        0      0 aaron.fulle:netbios-ssn digi5.fuller.edu:1057  ESTABLISHED
>tcp        1      0 aaron.fuller.edu:www    wwwproxy1.ac.il:27656  CLOSE
>tcp        0      0 aaron.fuller.edu:www    comada.lnd.com:1139    CLOSE
>tcp      484      0 aaron.fuller.edu:www    gw1.csfb.com:3217      ESTABLISHED
>tcp      438      0 aaron.fuller.edu:telnet vax.fuller.edu:3188    CLOSE
>tcp        0    126 aaron.fuller.edu:telnet hur_s0.fuller.edu:1468 ESTABLISHED
>
>Does anyone know how to resolve these problems?

Not yet, I still haven't got enough information to figure it out,
and I can't reproduce it yet. If you can come up with a formula
for me to reproduce this, then maybe I can track it down a little faster.
Also, if I can get to the point where I can make a guess at what is happening
I might be able to give you some code to instrument the kernel and
try to help track down the problem from your end.
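
For anyone putting together a report, here is a rough sketch of pulling
the stuck entries out of saved "netstat -t" output. It is only an
illustration: the function name sockets_in_state and the SAMPLE text are
made up for the example, and it assumes the six whitespace-separated
columns of the listing quoted above. Lines that do not split into six
fields are skipped rather than guessed at.

```python
# Illustrative only: list TCP entries in a given state from netstat -t text.
# SAMPLE and sockets_in_state are hypothetical names, not part of any tool.

SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 1 0 aaron.fuller.edu:www wwwproxy1.ac.il:27656 CLOSE
tcp 0 0 aaron.fuller.edu:www comada.lnd.com:1139 CLOSE
tcp 0 126 aaron.fuller.edu:telnet hur_s0.fuller.edu:1468 ESTABLISHED
"""


def sockets_in_state(netstat_output, state="CLOSE"):
    """Return (local, foreign, recv_q, send_q) tuples for tcp lines in `state`."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and lines whose columns ran together.
        if len(fields) < 6 or fields[0] != "tcp":
            continue
        _, recv_q, send_q, local, foreign, sock_state = fields[:6]
        # Exact match, so CLOSE does not also pick up CLOSE_WAIT.
        if sock_state == state:
            matches.append((local, foreign, int(recv_q), int(send_q)))
    return matches


if __name__ == "__main__":
    for local, foreign, recv_q, send_q in sockets_in_state(SAMPLE):
        print(local, foreign, "Recv-Q:", recv_q, "Send-Q:", send_q)
```

Running something like this from cron and keeping the output with
timestamps would at least show how long each entry has been sitting in
CLOSE.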

>I have had the same with rlogin until I got a hacked version from Miquel
>that tries alternate source ports when connecting.
>
>I get rather frustrated with Linux Networking.
>
>There are three major problems with the 2.0.X Network stuff:
>
>1. Sockets Hang in Close

See above, it's on the list of things to fix, but I don't have
much to go on. So far I only have the two reports from you,
and a related report from another source.
The related report may or may not be the same problem.
If ANYONE else is seeing this problem, please get in touch with me!
More importantly, if you can reproduce this problem, please
let me know how so I can track it down faster.

>2. TCP sessions stall on busy machines.

I have no outstanding reports of this problem that cannot be
attributed to MTU mismatches on the endpoints of a point-to-point
link. However, I may easily have missed a report as I've been quite
busy with real work recently. Please forward me any details you have
about this. tcpdumps of actual stalls are particularly useful.
Also, when you say "stall" do you mean "freezes, never to recover",
or do you just mean "gets really really slow"?
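
To make that distinction concrete, here is a rough sketch of how one
might label a transfer from periodic byte-count samples. Everything in
it is an assumption for illustration: the function name
classify_transfer, the sample format, and the 60 second and 1024
bytes/second thresholds are arbitrary, and a finite log can only show
"no progress for a long time", not "never recovers".

```python
# Illustrative only: label a transfer as "stalled", "slow", or "ok".
# The thresholds are arbitrary placeholders, not measured values.

def classify_transfer(samples, stall_gap=60.0, slow_rate=1024.0):
    """samples: time-ordered list of (seconds, total_bytes_received)."""
    if len(samples) < 2:
        return "unknown"

    t_first, b_first = samples[0]
    t_last, b_last = samples[-1]

    # Find the last moment at which the byte counter actually advanced.
    last_progress = t_first
    for (_, b_prev), (t_cur, b_cur) in zip(samples, samples[1:]):
        if b_cur > b_prev:
            last_progress = t_cur

    # No new bytes for stall_gap seconds up to the end of the log: a stall.
    if t_last - last_progress >= stall_gap:
        return "stalled"

    # Otherwise compare the average rate against the "slow" threshold.
    elapsed = t_last - t_first
    rate = (b_last - b_first) / elapsed if elapsed > 0 else 0.0
    return "slow" if rate < slow_rate else "ok"


if __name__ == "__main__":
    print(classify_transfer([(0, 0), (10, 50000), (20, 50000), (100, 50000)]))
    print(classify_transfer([(0, 0), (30, 3000), (60, 6000)]))
```

A report that says which of the two cases applies, along with the
tcpdump, is much easier to act on than "it stalls".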

Also, slow network connections to the outside world are not news
unless you can exhibit a faster connection with different software
in the same environment. The internet at large is suffering from
incredible congestion these days. [This is not directed
at Christoph, but rather at the rest of the mailing list.
Please don't bombard me with reports that your netscape
connections are crawling unless you can substantiate that
it is due to a problem in the Linux TCP code. Netscape
connections crawl on every kind of hardware/software these days.]

>3. Signal delivery is still unreliable. I sometimes get
> pppd's, menu programs stuck just polling for input. If I send
> them a HUP signal they gladly go away.

What does this have to do with networking? (Assuming that
signal delivery really is the problem.) If signal delivery isn't
the problem, then what is?
[If you are only seeing the problem with pppd, then what version
are you using? Previous to 2.2.0f there were some problems that
could have caused it to miss a hangup on the modem line.
As far as I know this is fixed in 2.2.0f and the 2.0.x kernels.
In any case, the issue there was not a signal problem, but a
problem with select(). pppd hangs up when select() returns
an error code.]
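
As a rough illustration of why a select() problem looks different from
a signal problem, here is a sketch of a program that waits for input and
treats a failed select() as a reason to give up. This is not pppd's
code; the function name wait_for_input and its return strings are made
up for the example, and it assumes a POSIX system where select() works
on pipe descriptors.

```python
# Illustrative only: distinguish "input ready", "timed out", and
# "select() itself failed" for a single descriptor.
import os
import select


def wait_for_input(fd, timeout):
    """Return "ready", "timeout", or "error" for one select() call on fd."""
    try:
        readable, _, _ = select.select([fd], [], [], timeout)
    except OSError:
        # select() failed, for example with EBADF on a closed descriptor.
        # A caller modelled on the behaviour described above would hang up here.
        return "error"
    return "ready" if readable else "timeout"


if __name__ == "__main__":
    r, w = os.pipe()
    print(wait_for_input(r, 0))   # nothing written yet
    os.write(w, b"x")
    print(wait_for_input(r, 0))   # one byte is waiting
    os.close(r)
    os.close(w)
    print(wait_for_input(r, 0))   # descriptor is closed
```

If select() never reports the condition, a program written this way
just keeps waiting, and it will look "stuck polling for input" even
though signal delivery is working fine.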

-- eric

---------------------------------------------------------------------------
Eric Schenk www: http://www.cs.toronto.edu/~schenk
Department of Computer Science email: schenk@cs.toronto.edu
University of Toronto