Re: [PATCH 00/17] VFS: Filesystem information and notifications [ver #17]

From: Jann Horn
Date: Tue Mar 03 2020 - 09:13:57 EST


On Tue, Mar 3, 2020 at 3:10 PM Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Mar 03, 2020 at 02:43:16PM +0100, Greg Kroah-Hartman wrote:
> > On Tue, Mar 03, 2020 at 02:34:42PM +0100, Miklos Szeredi wrote:
> > > On Tue, Mar 3, 2020 at 2:14 PM Greg Kroah-Hartman
> > > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > > Unlimited beers for a 21-line kernel patch? Sign me up!
> > > > >
> > > > > Totally untested, barely compiled patch below.
> > > >
> > > > Ok, that didn't even build, let me try this for real now...
> > >
> > > Some comments on the interface:
> >
> > Ok, hey, let's do this proper :)
>
> Alright, how about this patch.
>
> Actually tested with some simple sysfs files.
>
> If people don't strongly object, I'll add "real" tests to it, hook it up
> to all arches, write a manpage, and all the fun fluff a new syscall
> deserves and submit it "for real".

Just FYI, io_uring is moving towards the same kind of thing... IIRC
you can already use it to batch a bunch of open() calls, then batch a
bunch of read() calls on all the new fds and close them at the same
time. And I think they're planning to add support for doing
open()+read()+close() all in one go, too, except that it's a bit
complicated because passing forward the file descriptor in a generic
way is a bit complicated.

> It feels like I'm doing something wrong in that the actuall syscall
> logic is just so small. Maybe I'll benchmark this thing to see if it
> makes any real difference...
>
> thanks,
>
> greg k-h
>
> From: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Subject: [PATCH] readfile: implement readfile syscall
>
> It's a tiny syscall, meant to allow a user to do a single "open this
> file, read into this buffer, and close the file" all in a single shot.
>
> Should be good for reading "tiny" files like sysfs, procfs, and other
> "small" files.
>
> There is no restarting the syscall, am trying to keep it simple. At
> least for now.
>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
[...]
> +SYSCALL_DEFINE5(readfile, int, dfd, const char __user *, filename,
> + char __user *, buffer, size_t, bufsize, int, flags)
> +{
> + int retval;
> + int fd;
> +
> + /* Mask off all O_ flags as we only want to read from the file */
> + flags &= ~(VALID_OPEN_FLAGS);
> + flags |= O_RDONLY | O_LARGEFILE;
> +
> + fd = do_sys_open(dfd, filename, flags, 0000);
> + if (fd <= 0)
> + return fd;
> +
> + retval = ksys_read(fd, buffer, bufsize);
> +
> + __close_fd(current->files, fd);
> +
> + return retval;
> +}

If you're gonna do something like that, wouldn't you want to also
elide the use of the file descriptor table completely? do_sys_open()
will have to do atomic operations in the fd table and stuff, which is
probably moderately bad in terms of cacheline bouncing if this is used
in a multithreaded context; and as a side effect, the fd would be
inherited by anyone who calls fork() concurrently. You'll probably
want to use APIs like do_filp_open() and filp_close(), or something
like that, instead.