Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

From: Ilpo Järvinen
Date: Sat May 31 2008 - 06:05:44 EST


On Sat, 31 May 2008, Evgeniy Polyakov wrote:

> On Sat, May 31, 2008 at 12:11:32AM +0300, Ilpo Järvinen (ilpo.jarvinen@xxxxxxxxxxx) wrote:
> > > > > Active Internet connections (w/o servers)
> > > > > Proto Recv-Q Send-Q Local Address Foreign Address
> > > > > State
> > > > > tcp 0 174592 10.0.1.14:58015 10.0.1.14:3632 ESTABLISHED
> > > > > tcp 72134 0 10.0.1.14:3632 10.0.1.14:58015 ESTABLISHED
> >
> > ^^^^^
> >
> > Can you perhaps find/guess/think some explanation for this _receiver
> > queue_...? This was a trick question :-), as we already know that the
> > receiving process is no longer there and therefore obviously won't be
> > reading anything anymore. But that opened another question, why TCP is
> > then still in ESTABLISHED as orphaned TCP shouldn't be in establised state
> > anymore, tcp_close should have changed the state (either at close or at
> > process exit). I guess once it becomes known why tcp_close either wasn't
> > called at all or it didn't change the state of the flow (it's quite
> > simple, see for yourself), the cause of the bug is found (it might even be
> > that the process went away when it shouldn't have, either a bookkeeping
> > bug somewhere or real death, or something along those lines).
>
> In that case pid Ingo straced was still valid and existed (ans stuck
> probably in tcp_sendpage() - sysrq-t dump was messy),

...No, I think it was the sender, not the receiver that Ingo straced!
A _correctly_ behaving TCP sender will just get "stuck" once the window
is exhausted. It as was proven that advertized window is zero (tcpdump).
So I find that hardly a bug.

> so process still exist, but sock somewhat lost its reference to socket.

...this is something that seems possible explanation.

> AFICS all (three)
> calls for sk_orphan() can not happen without state change to something
> closing (like fin wait or close), except some linger things and
> listening socket, which frees its connection requests.

Yes, this what I figured out as well (distcc didn't seem to play with
linger, I checked, nor would it have any reason to)... The LISTENING
socket was still there in /proc/net/tcp, so closing that won't explain it.

In addition, it seems that tcp_close would notify the other end as well,
either with RST or with FIN, which would cause the other end to leave
ESTABLISHED (unless tcp_transmit_skb would fail but I guess that would end
up into some MIB in either of the cases). Here RST is the way to go
because there's unread data (though in theory having yet another bug could
allow building up the receiver queue afterwards).

Which then boils down to two alternatives:
1) The receiving process lost its reference though the process
still exists and therefore cannot read
2) tcp_close wasn't called when it should have, e.g., process exited
but the tcp_close wasn't called because the reference wasn't there

Btw, Ingo also got this warning later on which might be helpful in
understanding all this:

>>>>> after ~7 hours of uptime the networking code produced this assertion:
>>>>>
>>>>> [25441.140000] KERNEL: assertion (!sk->sk_ack_backlog) failed at
>>>>> net/ipv4/inet_connection_sock.c (642)

I guess that sk_ack_backlog was already visible at the time when the TCP
got stuck because listening socket already had 1 there in /proc/net/tcp
(though this depends if there was some other connection attempts still
going on).


...Some new info (at least to me). I tried with ./stress_accept 5000 2
(attached), I could easily produce a number of ESTABLISHED connections
which won't yet have an owner in netstat (tested in 2.6.25.3) while
connect.sh is running but it seems some kind of transitional state as once
I do killall -TERM connect.sh (attached), they get the owning
stress_accept. So third explanation would be some race during that
transition that would allow process to never get the reference...

Before:
tcp 65436 0 lo:5000 lo:59980 ESTABLISHED -
After killall:
tcp 76444 0 lo:5000 lo:59980 ESTABLISHED 9013/stress_accept


--
i.#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <unistd.h>
#include <sys/wait.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>


void error(const char *str)
{
perror(str);
exit(1);
}

void worker(int listenfd)
{
int fd;
struct sockaddr_in peeraddr;
socklen_t len = sizeof(peeraddr);
int pid;
char buf[100];

while(1) {
memset(&peeraddr, 0, sizeof(peeraddr));
fd = accept(listenfd, (struct sockaddr *)&peeraddr, &len);
if (fd < 0)
error("accept");

pid = fork();
if (pid < 0) {
error("fork");
} else if (pid != 0) {
close(fd);
/* while(1)
sleep(1); */
} else {
read(fd, &buf, sizeof(buf));
while(1)
sleep(1);
exit(0);
}
}

exit(0);
}

int main(int argc, char **argv)
{
int listenfd;
struct sockaddr_in myaddr;
int res;
pid_t pid;
int i;

if (argc < 3) {
fprintf(stderr, "%s serverport processes\n", argv[0]);
exit(1);
}

listenfd = socket(PF_INET, SOCK_STREAM, 0);
if (listenfd < 0)
error("socket");

memset(&myaddr, 0, sizeof(myaddr));
myaddr.sin_family = PF_INET;
myaddr.sin_addr.s_addr = inet_addr("127.0.0.1");
myaddr.sin_port = htons(atoi(argv[1]));

res = bind(listenfd, (struct sockaddr *)&myaddr, sizeof(myaddr));
if (res < 0)
error("bind");

/* distccd uses 10 here */
if (listen(listenfd, 10) < 0)
error("listen");

for (i = 0; i < atoi(argv[2]); i++) {
pid = fork();
if (pid < 0)
error("fork");

if (pid == 0) {
worker(listenfd);
}
}

while (1) {
sleep(1);
}

return 0;
}

Attachment: connect.sh
Description: Bourne shell script