RE: UDP recvmsg blocks after select(), 2.6 bug?

From: Robert White
Date: Mon Oct 18 2004 - 17:26:44 EST


Sorry, I was thinking in the generic case of "a protocol read() after select()" and
not specifically a UDP read() after select(); as any semantic chosen needs to be
generic. That may be out of scope for your considerations.

If you allow for zero-length packets on the transport, then you are introducing a
semantic entity to silently displace a procedural entity [e.g. if your messaging
scheme allows for zero-length packets and you have the kernel "fake" a zero-length
packet, then the kernel is triggering semantics instead of an error recognition
process; if zero-length packets are impossible, then the kernel is creating a "new"
return condition with unforeseen consequences in all existing code.]

If your process is *not* prepared to deal with a zero-length return from a receive
message, then you will get a semantic error. [e.g. you "know" that all the packets
you receive have a certain structure, but here you are "receiving" a non-error event
that is outside of your semantic set and so not allowed-for in your existing state.
Etc.]
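
To illustrate: a minimal sketch, assuming a made-up wire format (struct msg_header
and parse_header() are hypothetical names, not taken from any real protocol), of the
length check a receiver would need before it could tell a zero-length return apart
from a packet in its semantic set. Existing code that skips such a check would go on
to interpret whatever was left in the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: every valid packet starts with this header. */
struct msg_header {
    uint16_t type;
    uint16_t length;
};

enum parse_result { PARSE_OK, PARSE_SHORT };

/* Copies the header out of buf only if the receive call returned enough
 * bytes. n is the return value of the receive call, so n == 0 stands for
 * the zero-length "packet" under discussion. */
enum parse_result parse_header(const unsigned char *buf, size_t n,
                               struct msg_header *out)
{
    if (n < sizeof *out)
        return PARSE_SHORT;  /* includes n == 0: outside the protocol's set */
    memcpy(out, buf, sizeof *out);
    return PARSE_OK;
}
```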

For every other file handle, a zero-length read is end of file. So there is this "well
established" semantic meaning for "if ( 0 == read(fd,...))" and you are proposing to
non-trivially create a one-off for the specific case of fd==UDP-socket. So now if
you pass this file handle through a generic mechanism then you break the generic
semantics by creating a "different class of files" where a state problem leads to the
generation of an "in-band, originless, valid receive event" that is completely
dissimilar to the expected meaning of the return value from a standard function call.

Basically, if it is possible to send and receive a zero-length message in a
connectionless protocol, you are _stealing_ the possible semantic meaning of that
message and retroactively claiming it as a signal from the kernel to the program. IF
it isn't already possible to send and receive a zero-length message in that
connectionless transport, then you are adding a semantic that all the existing code
may be completely unable to interpret, or which may "trick" applications into
deciding they are getting the end-of-file condition because they don't know or care
that the transport in question is UDP.

So if you have a generic handling mechanism, centered on select(), that "knows" that
if it sees 0==read(...) then it should close the file handle, and if that mechanism
is given sockets conforming to this proposed modification, then that mechanism will
break.
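
A minimal sketch of such a generic mechanism, assuming a hypothetical helper named
service_fd() that waits in select() and then reads once. It is not taken from any
real program; it only shows the "zero means end of file, so close" convention.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical generic handler: blocks until fd is readable, then reads
 * once. Returns the byte count, 0 after closing fd on "end of file", or
 * -1 on error. It neither knows nor cares what kind of fd it was given. */
ssize_t service_fd(int fd, char *buf, size_t len)
{
    fd_set rfds;
    ssize_t n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    n = read(fd, buf, len);
    if (n == 0)
        close(fd);  /* generic assumption: 0 == read() means end of file */
    return n;
}
```

Handed a pipe or a TCP socket, the close() is correct. Handed a UDP socket under the
proposed change, the same branch would presumably fire on the kernel's faked empty
packet and tear down a socket that is still usable.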

[I *am* out of my depth about whether UDP allows zero-length messages, it has never
come up for me, but I don't think it matters. If it isn't UDP legal, then you are
adding a semantic. If it is legal, then you are overloading a known semantic. Both
actions are surprising, so both are wrong.]
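
For reference, the sockets API does accept an empty datagram, as Willy notes in the
quoted message below. A minimal sketch, assuming IPv4 loopback is available; the
helper name recv_empty_datagram() is hypothetical, and the one-second receive timeout
is only there so the sketch cannot hang.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Sends one empty UDP datagram over loopback and returns what recv()
 * reports for it: 0 if it arrives, -1 if any step fails or times out. */
ssize_t recv_empty_datagram(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    char buf[64];
    ssize_t n = -1;

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto out;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  /* let the kernel choose a free port */

    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        goto out;

    /* An empty payload is still a real datagram to the receiver. */
    if (sendto(tx, "", 0, 0, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto out;

    n = recv(rx, buf, sizeof buf, 0);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

So a return of zero already has an in-band meaning on UDP ("the peer sent nothing"),
which is the "overloading a known semantic" case.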

So returning zero from a read function on a file descriptor that "can not"
meaningfully know end-of-file (because it doesn't FIN etc) is still a very bad idea
because of the odd-out cases where it will have "impossible" or at least wildly
incorrect semantic consequences.

Not a good space to be mucking around in.

But that's just my opinion. And I am now rambling... 8-)

Rob White.


-----Original Message-----
From: Willy Tarreau [mailto:willy@xxxxxxxxx]
Sent: Saturday, October 16, 2004 3:25 AM
To: Robert White
Cc: 'David S. Miller'; 'Olivier Galibert'; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: UDP recvmsg blocks after select(), 2.6 bug?

On Fri, Oct 15, 2004 at 03:42:55PM -0700, Robert White wrote:
> -----Original Message-----
> From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-owner@xxxxxxxxxxxxxxx]
> On Behalf Of Willy Tarreau
>
> > As I asked in a previous mail in this overly long thread, why not returning
> > zero bytes at all. It is perfectly valid to receive an UDP packet with 0
>
> Zero bytes is "end of file". Don't go trying to co-opt end of file. That way lies
> madness and despair.

Please explain to me what "end of file" means with UDP. If your UDP-based app
expects to receive a zero when the other end stops transmitting, then it
might wait for a very long time. As opposed to TCP, there's no FIN control
flag to tell the remote host that you sent your last packet.

Willy
