Weird TCP CLOSE state behavior

Jason Gunthorpe (jgg@gpu.srv.ualberta.ca)
Tue, 3 Nov 1998 23:56:47 -0700 (MST)


[Please CC, I'm not on the list]

Hi,

I'm seeing some strange behavior with some rsync processes. They somehow
manage to get their connection stuck in the CLOSE state and never die; I
now have some that have been sitting around for almost two days! I
discussed the matter with Andrew, and there is some speculation that it may
be some weird kernel problem. Here are the details.

Process 26556 has been running for nearly two days, lsof shows this
information:

rsync 26556 nobody 0u inet 0x02921c0c 0x1e5eade3 TCP
debian.novare.net:rsync->llug.sep.bnl.gov:25627 (CLOSE)

netstat -to shows:
tcp 0 27128 debian.novare.net:rsync llug.sep.bnl.gov:25627 CLOSE
on (2.19/16)
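
For anyone who wants to look for other sockets in the same condition (CLOSE
state with unsent data still queued), here is a rough sketch that scans text
in the /proc/net/tcp layout. It assumes the column layout of current kernels
(state in the fourth column as hex, tx_queue:rx_queue in the fifth), which
may not match 2.0.x exactly. The function name and the SAMPLE rows are made
up for illustration; the addresses are loopback placeholders, and only the
port numbers and queue size mirror the netstat line above.

```python
TCP_CLOSE = 0x07  # value of TCP_CLOSE in the kernel's TCP state enum

# Illustrative sample only, NOT captured from the affected machine.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0369 0200007F:641B 07 000069F8:00000000 01 000000DB 00000010 65534 0 12345
   1: 0100007F:0369 0300007F:1234 01 00000000:00000000 00 00000000 00000000 65534 0 12346
"""


def closed_with_unsent_data(proc_net_tcp_text):
    """Return (local, remote, tx_queue_bytes) for CLOSE sockets with queued data."""
    stuck = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or truncated line
        local, remote, state = fields[1], fields[2], int(fields[3], 16)
        tx_queue = int(fields[4].split(":")[0], 16)  # field is tx_queue:rx_queue
        if state == TCP_CLOSE and tx_queue > 0:
            stuck.append((local, remote, tx_queue))
    return stuck


if __name__ == "__main__":
    for row in closed_with_unsent_data(SAMPLE):
        print(row)
```

On a live system the text would come from reading /proc/net/tcp instead of
SAMPLE.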

And if I attach a strace to the process I get this:

debian{root}/proc/net#strace -p 26556
select(1, [0], [0], NULL, {58, 460000}) = 0 (Timeout)
select(1, [0], [0], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [0], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [0], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [0], NULL, {60, 0}

Now, it seems to me that select should report the socket as ready for
reading. I perused the code (very!) quickly, and it looks to me like when
the socket enters the TCP_CLOSE state, sk->shutdown is set to at least
RCV_SHUTDOWN, and when the select function sees this it will return 1.
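
As a small illustration of the behavior expected here, the sketch below
checks that select reports a socket as readable once its peer has gone away,
and that the following read returns zero bytes. It uses a local socket pair
rather than a real TCP connection, so it only shows the userspace contract
and does not exercise the 2.0.35 TCP code path. The function name is
hypothetical.

```python
import select
import socket


def readable_after_peer_close(timeout=1.0):
    """Return (ready, data) for a socket whose peer has already closed."""
    # socketpair() gives two connected stream sockets without needing a network.
    ours, theirs = socket.socketpair()
    try:
        theirs.close()  # the peer goes away, so our read side is at end-of-file
        # Only the read set is watched here; the strace shows fd 0 in both sets.
        readable, _, _ = select.select([ours], [], [], timeout)
        ready = ours in readable
        # A read at end-of-file returns zero bytes instead of blocking.
        data = ours.recv(4096) if ready else None
        return ready, data
    finally:
        ours.close()


if __name__ == "__main__":
    print(readable_after_peer_close())
```

If a socket in the CLOSE state behaved the same way, rsync would wake up,
read zero bytes, and exit instead of looping on timeouts.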

However, this does not seem to be the case; rsync will loop indefinitely
waiting for select to tell it to read [timeout is disabled, another issue].
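
Independent of the kernel question, a process can protect itself from this
kind of endless loop by bounding the total idle time across select calls. The
following is a minimal sketch of that idea, not rsync's actual timeout code;
the function name and both parameters are hypothetical.

```python
import select
import socket
import time


def wait_readable(sock, poll_interval, max_idle):
    """Poll sock until it is readable; return False once max_idle seconds pass."""
    deadline = time.monotonic() + max_idle
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # idle too long: caller should close the socket and exit
        # Never sleep past the overall deadline, even if poll_interval is longer.
        readable, _, _ = select.select([sock], [], [], min(poll_interval, remaining))
        if readable:
            return True


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(wait_readable(a, 0.1, 0.3))  # nothing is ever sent, so this gives up
    a.close()
    b.close()
```

With something like this in place, a stuck connection would cost a slot only
until max_idle expires rather than for days.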

So, I figure that either this is a kernel glitch and select should return
something when the socket is in the CLOSE state or rsync is not properly
detecting this state through some other magical means.

Can anyone shed some light on this? It is driving me and my mirrors batty
:> (these dead rsyncs take up a slot of the connection limit)

This is a Debian 2.0 'hamm' system running kernel 2.0.35 and glibc 2.0.7t.
I have had reports of this from some of the mirrors, but I do not know what
kernel they are using. A packet trace is not terribly practical, I fear; we
do about a gig of rsync traffic per day. I did run 'tcpdump host
llug.sep.bnl.gov' for about 20 mins and did not see one packet.

Thanks,
Jason

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/