sendfile(2) idea (was: Thread implementations)

Matti Aarnio (matti.aarnio@sonera.fi)
Thu, 25 Jun 1998 15:25:55 +0300 (EEST)

Next message: Amsden, Zachary: "RE: (reiserfs) Re: LVM / Filesystems / High availability"
Previous message: Niels Kristian Bech Jensen: "2.1.107 breaks ``setfont''."
In reply to: Vojtech Pavlik: "Re: uniform input device packets?"

David Luyer <luyer@ucs.uwa.edu.au> writes:
....
> How about...
> * ftp server sending out static objects
ok (binary-only, TEXT may or may not need processing)
> * proxy cache serving requests
ok - maybe
> * imap/pop with maildir mailbox format sending mail items
Isn't line-start dot-duplication needed here, just as
in the SMTP DATA phase? If so, this can't be used.
> * news server sending out articles to peers and clients
Same caveat as with SMTP DATA?
> * sendmail sending mail from queue
No -- the SMTP DATA phase needs intelligent, stateful processing
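
To illustrate why a raw byte copy is not enough for the SMTP DATA
phase, here is a minimal user-space sketch of the dot-duplication
step. The function name dot_stuff is hypothetical, and the sketch
assumes the body already uses CRLF line endings.

```python
def dot_stuff(body: bytes) -> bytes:
    """Escape a CRLF-delimited message body for the SMTP DATA phase.

    Hypothetical helper, for illustration only: every line that starts
    with "." gets a second "." prepended, and the body is closed with
    the lone-dot terminator line.
    """
    lines = body.split(b"\r\n")
    stuffed = [b"." + line if line.startswith(b".") else line
               for line in lines]
    out = b"\r\n".join(stuffed)
    # Make sure the terminator starts on a line of its own.
    if not out.endswith(b"\r\n"):
        out += b"\r\n"
    return out + b".\r\n"
```

Since every line has to be inspected, the data must pass through
code that understands the protocol; a blind fd-to-fd copy would
send a line-start dot unescaped.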

> That's just some for programs who want to put a file from the hard disk
> to the network. Maybe sendfile() could be used for other things too -
> such as direct disk-to-disk copy without leaving kernel space, backups,
> whatever. If it's fully generic and supports network-to-network then
> I can imagine a tunnel/firewall daemon run from inetd which does
>
> connect()
> sendfile()
> exit()

Yes, why not.

> (moving the main data-pump type loop into the kernel where it can be
> properly optimized)

I have uses where the main data-pump needs to be bidirectional,
and where closing one direction may or may not mean an instant
close of the other direction too. (shutdown() on a socket)
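
As a rough user-space sketch of such a bidirectional pump: one
thread per direction, with EOF on one side passed on as a half-close
(shutdown() of the write side only) so the opposite direction keeps
flowing. The names pump and relay are hypothetical, and the
half-close policy is just one of the possible choices.

```python
import socket
import threading


def pump(src, dst, bufsize=65536):
    """Copy src -> dst until EOF on src, then half-close dst."""
    total = 0
    while True:
        data = src.recv(bufsize)
        if not data:          # EOF on the reading side
            break
        dst.sendall(data)
        total += len(data)
    try:
        # Propagate EOF in this direction only; the reverse
        # direction stays open until its own pump sees EOF.
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass                  # peer may already be gone
    return total


def relay(a, b):
    """Run both directions between sockets a and b until both hit EOF."""
    forward = threading.Thread(target=pump, args=(a, b))
    forward.start()
    pump(b, a)
    forward.join()
```

An application that wants "close one, close both" semantics would
close both sockets as soon as either pump returns, instead of
waiting for the second EOF.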

> I don't know the specification for a sendfile() but let's say it was
>
> sendfile(fd-in, fd-out, options);
>
> fd-in - file descriptor to disk file (? is this needed as a restriction)
> fd-out - file descriptor to any object
copying-length - "-1" == until EOF ?
> options - eg, close on send completion, sync/async. etc.
bidirectionality, syscall exit conditions (eof on either fd?)

> Many, many applications could make use of a sendfile() function in some way
> (assuming it doesn't close the connection after sending the file, or that
> connection close is controlled by a flag).

Better to let the close be a separate syscall, just like the open()
or socket() et al. which are used to form either fd.

> I think we need a clear definition of exactly what sendfile() is/does,
> but just about any implementation of it would have gains to various areas.
>
> > which is basically little more
> > than a benchmark. If you really have such a hugely loaded web server
> > you are likely to be doing lots of database lookups, cookie-controlled
> > variable content, shtml, other cgi trickery, etc.
>
> What about a heavily loaded squid proxy cache? Also, there's a hell of
> a lot of static web content out there you know. Images and the like,
> probably the bulk of web content by volume.

To keep it simple to implement, make it a kernel-space
copy of data from fd-in to fd-out that runs until one of the
ending conditions is met. It shall return the number of
bytes copied, or -1 with errno set.
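
The semantics described above can be sketched in user space as
follows. This is only an emulation with read() and write(), not the
in-kernel copy itself; the name fd_data_copy and the bufsize
parameter are hypothetical, and Python reports failure by raising
OSError (carrying errno) where the syscall would return -1.

```python
import os


def fd_data_copy(fd_in, fd_out, length=-1, bufsize=65536):
    """Copy from fd_in to fd_out; length == -1 means "until EOF".

    Returns the number of bytes copied. Ending conditions in this
    sketch: EOF on fd_in, or `length` bytes transferred.
    """
    copied = 0
    while length < 0 or copied < length:
        want = bufsize if length < 0 else min(bufsize, length - copied)
        chunk = os.read(fd_in, want)
        if not chunk:                 # EOF on the input descriptor
            break
        view = memoryview(chunk)
        while view:                   # handle short writes
            n = os.write(fd_out, view)
            view = view[n:]
        copied += len(chunk)
    return copied
```

Opening and closing either descriptor stays with the caller, in
line with keeping close() a separate syscall.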

The easiest way is to do unidirectional copying only, but then
applications needing high-performance bidirectional copying
would still have to either split into two processes or run twin threads.
(On the other hand, why not, they are esoteric animals anyway..)

The main reason for NOT doing sendfile() as:
sendfile(filename, startoffset, copylength, outfd, opts)
is IMO its lack of genericity -- it could only send a filesystem
object out to some fd...
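
For comparison, a descriptor-based call with an explicit offset and
length can be tried from user space through Python's os.sendfile(),
which takes (out_fd, in_fd, offset, count). The sketch below assumes
a Linux-like platform where that call is available and where the
output descriptor is a socket; send_range is a hypothetical helper,
and it takes a filename only for the convenience of the example.

```python
import os


def send_range(sock, path, offset=0, count=None):
    """Send `count` bytes of the file at `path`, starting at `offset`.

    With count=None, everything from `offset` to end of file is sent.
    Returns the number of bytes actually sent.
    """
    with open(path, "rb") as f:
        if count is None:
            count = os.fstat(f.fileno()).st_size - offset
        sent = 0
        while sent < count:
            # os.sendfile may transfer fewer bytes than requested.
            n = os.sendfile(sock.fileno(), f.fileno(),
                            offset + sent, count - sent)
            if n == 0:        # hit end of file early
                break
            sent += n
    return sent
```

The helper opens the file itself, but the underlying call works on
two already-open descriptors, so the caller is free to decide how
each one is created and closed.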

Frankly, the name of the syscall should not be sendfile(); perhaps:
fddatacopy()

... and it will all need support for in-kernel data movement
between different files. (Each subsystem has its own way of
doing read() and write() between user and kernel space --
for example, writing to /dev/null is "accelerated" by never
actually copying anything from userspace to the kernel.)

> > Would we just be doing this to look good against NT in webstones?
>
> No, we would be doing it to perform better, and looking good in webstones
> comes as a free extra.

Indeed, but let's do it elegantly if at all possible!

> David.

/Matti Aarnio <matti.aarnio@sonera.fi>

