Re: Thread implementations...

Linus Torvalds (torvalds@transmeta.com)
Fri, 26 Jun 1998 18:16:59 -0700 (PDT)


On Fri, 26 Jun 1998, Adam D. Bradley wrote:
>
> I think the send() idea Alan and Linus tossed around is the cleanest
> way to go, in combination with:
> setsockopt(fd, SOL_SOCKET, SO_CONSTIPATED)
> ;-)

I'd like to still get David's feedback on this (and I definitely would
prefer a different name for the socket option ;), but I really think it's
the right way to go.

I just made a pre-2.1.108 and put it on ftp.kernel.org - it fixes a
problem where my sendfile() forgot to get the kernel lock (blush), so it
randomly didn't work correctly on SMP.

I've also done some more testing of sendfile(), and the nice thing is that
when I compared a file copy done with sendfile against a plain "cp", the
sendfile implementation was about twice as fast (at least my version of
"cp" will just do read+write pairs over and over again). When I copied a
38MB file the "cp" took 1:58 seconds while sendfile took 1:08 seconds
according to "time" (I have 512MB of RAM, so this was all cached,
obviously).
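
For reference, the read+write pairs mentioned above look roughly like the
sketch below. This is not the source of any particular "cp"; the helper
names copy_read_write() and copy_path_read_write() and the 64kB buffer
size are made up for illustration. The point is only that every byte
makes a round trip through a user-space buffer, which is the copy that
sendfile() avoids.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy everything from "in" to "out" using
 * read()+write() pairs through a user-space buffer.
 * Returns the number of bytes copied, or -1 with errno set.
 */
ssize_t copy_read_write(int in, int out)
{
	char buf[65536];	/* assumed buffer size, not from the mail */
	ssize_t total = 0;

	for (;;) {
		ssize_t n = read(in, buf, sizeof(buf));

		if (n == 0)
			return total;	/* end of file */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}

		/* write() may be partial, so loop until the chunk is out */
		ssize_t done = 0;
		while (done < n) {
			ssize_t w = write(out, buf + done, (size_t)(n - done));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			done += w;
		}
		total += n;
	}
}

/* Hypothetical convenience wrapper: open both paths and copy. */
ssize_t copy_path_read_write(const char *src, const char *dst)
{
	int in = open(src, O_RDONLY);
	if (in < 0)
		return -1;

	int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if (out < 0) {
		close(in);
		return -1;
	}

	ssize_t copied = copy_read_write(in, out);

	close(in);
	close(out);
	return copied;
}
```

Calling copy_path_read_write("a", "b") gives the user-space baseline to
time against the sendfile() version.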

I haven't done any network tests, because I don't think I'd be able to see
any difference, and it does need the "SO_CONSTIPATED" thing and a way to
push the end of data for best performance.

Some final words on sendfile():

- it does report errors correctly. That doesn't mean that you can
necessarily know _which_ fd produced the error; that you have to find out
on your own. A file read access can generally result in EIO and EACCES
(the latter with NFS and other "protection-at-read-time" non-UNIX
filesystems), while the output write() can result in a number of errors
as the output fd can be any kind of socket/tty/file. Depending on the
mode of the output file, the output errors can include EINTR, EAGAIN
etc, and you can mix sendfile() with select() on the output socket, for
example.

- you can give it a length of ULONG_MAX, and it will write as much as it
can. This is entirely consistent with the notion that it is equivalent
to write(out, tmpbuf, read(in, tmpbuf, size)) where "tmpbuf" is
essentially infinite - the read() will read all of the file and return
the file length in the process. Thus you don't even need to know the
size of the file beforehand.
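
To illustrate the two points above, here is a minimal sketch of a caller
that retries on EINTR and waits with select() when a non-blocking output
reports EAGAIN. Note the assumptions: it uses the four-argument
sendfile() declared in <sys/sendfile.h> on current Linux systems, which
is NOT the three-argument prototype described in this mail, and the
helper name send_whole_file() and the 1GB chunk size are invented for
the example.

```c
#include <errno.h>
#include <fcntl.h>		/* open() flags for callers */
#include <sys/select.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: push all of "in" to "out" without knowing the
 * file size beforehand.
 * ASSUMPTION: four-argument sendfile() from <sys/sendfile.h>, not the
 * three-argument prototype from the mail.
 * Returns the number of bytes sent, or -1 with errno set.
 */
ssize_t send_whole_file(int out, int in)
{
	off_t offset = 0;	/* read position, advanced by the kernel */
	ssize_t total = 0;

	for (;;) {
		/* Ask for a large chunk; the kernel stops at end of file. */
		ssize_t n = sendfile(out, in, &offset, (size_t)1 << 30);

		if (n > 0) {
			total += n;
			continue;
		}
		if (n == 0)
			return total;	/* nothing left to send */
		if (errno == EINTR)
			continue;	/* interrupted, just retry */
		if (errno == EAGAIN) {
			/* Non-blocking output is full: wait until writable. */
			fd_set wfds;

			if (out >= FD_SETSIZE) {
				errno = EBADF;
				return -1;
			}
			FD_ZERO(&wfds);
			FD_SET(out, &wfds);
			if (select(out + 1, NULL, &wfds, NULL, NULL) < 0 &&
			    errno != EINTR)
				return -1;
			continue;
		}
		return -1;	/* EIO, EACCES, ...: caller must work out which fd */
	}
}
```

As noted in the first point, a -1 return here does not say whether the
input or the output fd was at fault; the caller still has to find that
out on its own.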

The file copy test was essentially done with a single

error = sendfile(out, in, ~0);

and I'm appending my current test-program.

This is going to be in 2.2, btw. The changes are so small and so obviously
have to work that it would be ridiculous not to have this - the only
question is whether I'll try to make it a "copyfd()" system call instead,
falling back on read+write when I can't use the page cache directly. I
suspect I won't.

Linus

-----
/*
 * Very stupid example of using the sendfile()
 * system call.
 */

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>

/* Raw i386 system call stub: number in eax, arguments in ebx, ecx, edx. */
ssize_t sendfile(int out, int in, size_t size)
{
	ssize_t retval;

	asm volatile(
		"pushl %%ebx\n\t"
		"movl %%esi,%%ebx\n\t"
		"int $0x80\n\t"
		"popl %%ebx"
		:"=a" (retval)
		:"0" (187),
		 "S" (out),	/* pseudo-ebx */
		 "c" (in),
		 "d" (size));
	/* The kernel returns -errno in eax on failure. */
	if ((unsigned long) retval > (unsigned long)-1000) {
		errno = -retval;
		retval = -1;
	}
	return retval;
}

int main(int argc, char **argv)
{
	int in, out, error;

	if (argc < 3) {
		fprintf(stderr, "usage: %s <input> <output>\n", argv[0]);
		exit(1);
	}
	in = open(argv[1], O_RDONLY);
	if (in < 0) {
		perror("open input");
		exit(1);
	}
	out = open(argv[2], O_WRONLY | O_CREAT, 0666);
	if (out < 0) {
		perror("open output");
		exit(1);
	}
	/* ~0 is the maximum length: copy until the input runs out. */
	error = sendfile(out, in, ~0);
	printf("sendfile returned %d\n", error);
	if (error < 0) {
		perror("sendfile");
	}
	return 0;
}
