RE: recv() hangs until SIGCHLD ?

From: David Schwartz
Date: Sat Oct 11 2008 - 00:54:33 EST



Nicolas Cannasse wrote:

> In some rare cases, one (or several) threads are hanging in recv().
> Both lsof and ls /proc/<pid>/fd show that the socket used is in
> ESTABLISHED mode but when checking on the host on which it's connected
> (a mysql DB) we can't find the corresponding client socket (as it's
> been closed already on the other side).

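A blocking recv() returns only when data arrives, the connection is closed
or reset, or a signal handler interrupts it. If the peer never sends anything
and its close never reaches your host, the call can block forever.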
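
If the hang itself is the problem, one way to bound it is a receive timeout
on the socket. The following is only a sketch, assuming a POSIX system with
SO_RCVTIMEO; the helper names set_recv_timeout() and recv_retry() are made up
for illustration and are not taken from your code:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: limit how long a blocking recv() on fd may wait.
 * Returns 0 on success, -1 on error with errno set by setsockopt(). */
int set_recv_timeout(int fd, int seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Hypothetical helper: recv() that restarts after EINTR but reports
 * everything else to the caller. Returns the byte count, 0 on orderly
 * shutdown, or -1 on error. On timeout errno is EAGAIN or EWOULDBLOCK. */
ssize_t recv_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);

    return n;
}
```

Call set_recv_timeout() once after connect(), then treat a -1 return with
EAGAIN/EWOULDBLOCK from recv_retry() as "nothing arrived in time". Note that
the timeout starts over each time the loop re-enters recv(), so frequent
signals can stretch the total wait.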

> We are using the Boehm GC which uses the signals SIGXCPU and SIGPWR to
> pause+restart the threads when running a GC cycle. We are correctly
> handling EINTR in send() and recv() by restarting the call in case
> they get interrupted this way.
>
> However, when attaching GDB to our locked thread it seems that even
> when the GC runs, recv() does not exit (the breakpoint after it is not
> reached). If we send SIGCHLD to the hanging thread with GDB, recv()
> does exit and the thread is correctly unlocked. If we don't, it will
> hang forever.

Why shouldn't it hang forever? What was supposed to wake it that's not?

> Any idea how we can stop this from happening or what additional things
> we can check to get more information on what's occurring?

You say a thread is hanging in receive and not returning. But you've yet to
explain why it should return. Was it interrupted by a signal? Was data
received? Is the socket non-blocking? Why isn't this expected behavior?
Blocking sockets block, full stop.
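
If what you want is for the kernel to notice that the other end has gone
away, TCP keepalive is one thing that could wake the call. This is a rough
sketch, assuming Linux (the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT
options are Linux-specific); enable_keepalive() and its parameters are
hypothetical names, not part of any library:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on keepalive probes for a TCP socket so that a
 * peer which disappeared without its FIN or RST reaching us is eventually
 * detected, and a blocked recv() fails instead of waiting forever.
 * Returns 0 on success, -1 on error with errno set by setsockopt(). */
int enable_keepalive(int fd, int idle_secs, int interval_secs, int probes)
{
    int on = 1;

    /* Keepalive is off by default and must be enabled per socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    /* Idle time before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof idle_secs) < 0)
        return -1;
    /* Interval between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof interval_secs) < 0)
        return -1;
    /* Number of unanswered probes before the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof probes) < 0)
        return -1;

    return 0;
}
```

With something like enable_keepalive(fd, 60, 10, 3), a dead peer should be
detected after roughly 60 + 3 * 10 seconds of silence. The numbers are
placeholders; pick values that suit your connection.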

DS

