Re: Thread implementations...

Linus Torvalds (torvalds@transmeta.com)
Fri, 26 Jun 1998 09:45:16 -0700 (PDT)

Next message: Krzysztof G. Baranowski: "Re: Secure-linux and standard kernel"
Previous message: Linus Torvalds: "Re: Thread implementations..."
In reply to: Adam D. Bradley: "Re: (reiserfs) Re: LVM / Filesystems / High availability"

On Fri, 26 Jun 1998, Alan Cox wrote:
>
> > fairly fundamental to any high bandwidth). It needs to be the _default_
> > heuristic, because it works reasonably without any user intervention, but
> > there is nothing to say that it has to be the only choice.
>
> Indeed. The actual kernel code already understands the "hang onto a partial
> packet for a while" notion since it now becomes if(nagle || constipated_mode)
>
> I don't see how the API works though. When does it turn off, how do I turn
> it off in the middle of a send_file ?

I was just thinking something like

    int on = 1;

    sk = socket(..);
    bind(sk, ...);
    listen(sk, ..);

    /*
     * Set the "DODELAY" socket option on the listen socket,
     * so all accepted sockets will automatically get delayed
     * values. This is the logical reverse of TCP_NODELAY.
     */
    setsockopt(sk, IPPROTO_TCP, TCP_DODELAY, &on, sizeof(on));
    for (;;) {
        fd = accept(sk, NULL, NULL);
        /*
         * fd now has no nagle, we need to explicitly push if we
         * want to send out a partial packet
         */
        doconnection(fd);
    }

and then when you handle the connection you do something like this:

    write(fd, header, header_len);
    sendfile(fd, file_to_send, size_of_file_to_send);
    write(fd, NULL, 0);

where the zero-sized write just acts as a "push()" operation which forces
the currently outstanding packet to go on the out-queue (if you do any
other write() or sendfile() calls before the packet actually hits the
network, the normal packet grow feature would still kick in, so a "push()"
operation would _not_ result in extra packets on the network, it really
only guarantees that the other end sees the lowest possible latency for
the transfer).

With the above kinds of setup we get exactly the right behaviour: because
the user knows what the _real_ latency constraints are, it can essentially
tell the kernel with the push() operation that it's not the _first_ write
that is latency-critical (which is what normal nagle does), but it's the
last of the series that we want to push.
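
To make the coalescing-plus-push idea concrete, here is a toy user-space model of it. This is only a sketch: it is not kernel code, every name in it (toy_sock, toy_write, toy_push) is made up for illustration, and it ignores windows, ACKs and timers entirely. It only counts packets: full-sized packets leave at once, the partial tail is held back, and a push forces the tail out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: not kernel code. All names here are hypothetical. */
struct toy_sock {
    size_t mss;      /* maximum segment size, in bytes */
    size_t pending;  /* bytes held back in the current partial packet */
    size_t packets;  /* packets handed to the "network" so far */
    size_t bytes;    /* total payload bytes in those packets */
};

static void toy_init(struct toy_sock *s, size_t mss)
{
    assert(mss > 0);
    s->mss = mss;
    s->pending = 0;
    s->packets = 0;
    s->bytes = 0;
}

/* Queue len bytes: full packets go out at once, the tail is held back. */
static void toy_write(struct toy_sock *s, size_t len)
{
    s->pending += len;
    while (s->pending >= s->mss) {
        s->pending -= s->mss;
        s->packets++;
        s->bytes += s->mss;
    }
}

/* The zero-sized write: force out whatever partial packet is pending. */
static void toy_push(struct toy_sock *s)
{
    if (s->pending > 0) {
        s->packets++;
        s->bytes += s->pending;
        s->pending = 0;
    }
}
```

With an assumed MSS of 1400, a 200-byte header followed by a 3000-byte file gives two full packets straight away and leaves 400 bytes pending until toy_push() is called. A second toy_push() with nothing pending does nothing, which is the "no extra packets" property described above.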

I don't see why you would like to turn it off in the middle. I see that
you want to push() at the end of a transfer when you keep your socket open
for the next transfer, but I don't see why you'd want to get a less than
full-sized packet in the middle of the transfer.

Looking at the code, this really shouldn't be all that hard to do - the
basic nagle already obviously requires all the hard things (coalescing
partial packet writes is the important part and we've done that since we
got the networking to work at all).

And it guarantees that we get exactly the right behaviour on the wire.
Btw, using writev() doesn't give you the right behaviour either - if you
use writev() and the thing you want to get out happens to be 2kB, I think
you currently get something like this on the wire with ethernet:

 - full-sized 1400 byte packet: sent out immediately
 - 600 byte packet: less than 1/2 the MSS, so now we'll wait for the
   other end to send the ACK before we send out the packet, because we
   already have one packet in flight.

while with the above "new nonagle + push()" you get

 - full-sized 1400 byte packet: sent out immediately
 - 600 byte packet: push() forces it out on the network as soon as we can
   physically write it and the window is open.

which is a lot better and gives much lower latency as seen by the other
end. I think this sounds like something web-people would like to have
regardless of whether they use sendfile() or not.
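
The 2kB comparison above can be written down as a small calculation. This is a sketch under stated assumptions, not what the stack actually does: it takes the rule exactly as described in this mail (a tail smaller than half the MSS, queued behind a packet in flight, waits for the ACK), assumes the ACK takes one round trip and that the window is open, and uses made-up names (wire_plan, plan_send, tail_policy).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the rule is the one described in the text above. */
enum tail_policy { TAIL_WAITS_FOR_ACK, TAIL_PUSHED };

struct wire_plan {
    size_t full_packets;  /* MSS-sized packets, sent immediately */
    size_t tail_bytes;    /* size of the final partial packet, 0 if none */
    unsigned tail_delay;  /* extra wait before the tail leaves, in rtt units */
};

static struct wire_plan plan_send(size_t len, size_t mss, unsigned rtt,
                                  enum tail_policy policy)
{
    struct wire_plan p;

    assert(mss > 0);
    p.full_packets = len / mss;
    p.tail_bytes = len % mss;
    p.tail_delay = 0;

    /*
     * Assumption: a small tail behind a packet in flight is held
     * until the ACK comes back (about one round trip), unless the
     * sender pushes it explicitly.
     */
    if (policy == TAIL_WAITS_FOR_ACK &&
        p.full_packets > 0 &&
        p.tail_bytes > 0 &&
        p.tail_bytes < mss / 2)
        p.tail_delay = rtt;

    return p;
}
```

For len = 2000 and mss = 1400 both policies give one full packet and a 600-byte tail. The only difference is that the tail waits roughly one round trip under TAIL_WAITS_FOR_ACK and does not wait under TAIL_PUSHED.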

What do you think? Did I miss anything?

Linus

