Re: [PATCH 2.6.25.7] af_unix: fix 'poll for write'/ connected DGRAM sockets

From: David Miller
Date: Wed Jun 18 2008 - 00:56:39 EST


From: Rainer Weikusat <rweikusat@xxxxxxxxxxx>
Date: Tue, 17 Jun 2008 20:47:02 +0200

> The unix_dgram_sendmsg routine implements a (somewhat crude)
> form of receiver-imposed flow control by comparing the length of the
> receive queue of the 'peer socket' with the max_ack_backlog value
> stored in the corresponding sock structure, either blocking
> the thread which caused the send-routine to be called or returning
> EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET
> sockets. The poll-implementation for these socket types is
> datagram_poll from core/datagram.c. A socket is deemed to be writeable
> by this routine when the memory presently consumed by datagrams
> owned by it is less than the configured socket send buffer size. This
> is always wrong for connected PF_UNIX non-stream sockets when the
> abovementioned receive queue is currently considered to be full.
> 'poll' will then return, indicating that the socket is writeable, but
> a subsequent write will result in EAGAIN, effectively causing a
> (usual) application to 'poll for writeability by repeated send request
> with O_NONBLOCK set' until it has consumed its time quantum.
>
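
For illustration, the application pattern described above looks roughly
like the sketch below. The helper name send_when_writable and the
max_attempts cap are made up for this example and are not part of the
patch. If poll reports POLLOUT while the peer's receive queue is full,
every iteration ends in EAGAIN without the process ever sleeping.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the patch): wait until poll() reports the
 * socket writable, then try a non-blocking send. Returns the number of
 * loop iterations needed, or -1 on error or when max_attempts is used up.
 */
static int send_when_writable(int fd, const void *buf, size_t len,
                              int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };

        /* Sleep until the kernel claims the socket is writable. */
        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        /* MSG_DONTWAIT gives O_NONBLOCK semantics for this one call. */
        if (send(fd, buf, len, MSG_DONTWAIT) >= 0)
            return attempt;

        /* Anything other than "try again" is a real error. */
        if (errno != EAGAIN && errno != EINTR)
            return -1;

        /* EAGAIN right after POLLOUT: loop again without having slept. */
    }
    return -1;
}
```

With a consistent poll implementation the function should normally
return 1. A return of -1 after many iterations, with no other error,
would point at the busy loop described in the quoted text.
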
> The change below uses a suitably modified variant of the datagram_poll
> routine for both types of PF_UNIX sockets, which tests if the
> recv-queue of the peer a socket is connected to is presently
> considered to be 'full' as part of the 'is this socket
> writeable'-checking code. The socket being polled is additionally
> put onto the peer_wait wait queue associated with its peer, because the
> unix_dgram_recvmsg routine does a wake up on this queue after a
> datagram was received and the 'other wakeup call' is done implicitly
> as part of skb destruction, meaning, a process blocked in poll
> because of a full peer receive queue could otherwise sleep forever
> if no datagram owned by its socket was already sitting on this queue.
> Included in this change is a small (inline) helper routine named
> 'unix_recvq_full', which consolidates the actual testing code (in three
> different places) into a single location.
>
> Signed-off-by: Rainer Weikusat <rweikusat@xxxxxxxxxxx>
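
To make the mismatch and the fix concrete, here is a small userspace
model of the two writability checks. This is only a sketch and not the
kernel code: struct model_sock, its fields and every model_* function
are invented for illustration, and the exact comparison used for "full"
(queue length greater than the backlog value) is an assumption. The
peer_wait wait queue handling is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct sock. All fields are
 * illustrative assumptions, not the real layout. */
struct model_sock {
    size_t recv_queue_len;    /* datagrams waiting in the receive queue */
    size_t max_ack_backlog;   /* receiver-imposed queue limit */
    size_t wmem_alloc;        /* bytes of in-flight datagrams owned by us */
    size_t sndbuf;            /* configured send buffer size */
    struct model_sock *peer;  /* connected peer, or NULL */
};

/* One place for the "is the receive queue full" test, in the spirit of
 * unix_recvq_full. The strictness of the comparison is assumed. */
static bool model_recvq_full(const struct model_sock *sk)
{
    return sk->recv_queue_len > sk->max_ack_backlog;
}

/* Generic check: only looks at the sender's own buffer accounting. */
static bool model_generic_writable(const struct model_sock *sk)
{
    return sk->wmem_alloc < sk->sndbuf;
}

/* Modified check: additionally requires room in the peer's queue. */
static bool model_unix_writable(const struct model_sock *sk)
{
    if (!model_generic_writable(sk))
        return false;
    if (sk->peer && model_recvq_full(sk->peer))
        return false;
    return true;
}

/* Send-side flow control: would a non-blocking send return EAGAIN? */
static bool model_send_would_block(const struct model_sock *sk)
{
    return sk->peer && model_recvq_full(sk->peer);
}
```

For a client whose peer has 11 queued datagrams against a backlog of 10,
model_generic_writable returns true while model_send_would_block also
returns true, which is the contradiction described in the first quoted
paragraph. model_unix_writable returns false in that state, so the poll
result and the send result agree.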

Thank you for fixing this bug.

I'm going to review the logic in the new poll routine a little
bit more, then apply it to net-2.6 unless I find some problems.

Thanks again.