Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster

From: Jan Ziak
Date: Mon Jul 06 2020 - 02:08:31 EST


On Sun, Jul 5, 2020 at 1:58 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sun, Jul 05, 2020 at 06:09:03AM +0200, Jan Ziak wrote:
> > On Sun, Jul 5, 2020 at 5:27 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > On Sun, Jul 05, 2020 at 05:18:58AM +0200, Jan Ziak wrote:
> > > > On Sun, Jul 5, 2020 at 5:12 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > You should probably take a look at io_uring. That has the level of
> > > > > complexity of this proposal and supports open/read/close along with many
> > > > > other opcodes.
> > > >
> > > > Then glibc can implement readfile using io_uring and there is no need
> > > > for a new single-file readfile syscall.
> > >
> > > It could, sure. But there's also a value in having a simple interface
> > > to accomplish a simple task. Your proposed API added a very complex
> > > interface to satisfy needs that clearly aren't part of the problem space
> > > that Greg is looking to address.
> >
> > I believe that we should look at the single-file readfile syscall from
> > a performance viewpoint. If an application is expecting to read a
> > couple of small/medium-size files per second, then neither readfile
> > nor readfiles makes sense in terms of improving performance. The
> > benefits start to show up only in case an application is expecting to
> > read at least a hundred of files per second. The "per second" part is
> > important, it cannot be left out. Because readfile only improves
> > performance for many-file reads, the syscall that applications
> > performing many-file reads actually want is the multi-file version,
> > not the single-file version.
>
> It also is a measurable increase over reading just a single file.
> Here's my really really fast AMD system doing just one call to readfile
> vs. one call sequence to open/read/close:
>
> $ ./readfile_speed -l 1
> Running readfile test on file /sys/devices/system/cpu/vulnerabilities/meltdown for 1 loops...
> Took 3410 ns
> Running open/read/close test on file /sys/devices/system/cpu/vulnerabilities/meltdown for 1 loops...
> Took 3780 ns
>
> 370ns isn't all that much, yes, but it is 370ns that could have been
> used for something else :)

I am curious as to how you amortized or accounted for the fact that
readfile() first needs to open the dirfd and then close it later.

>From performance viewpoint, only codes where readfile() is called
multiple times from within a loop make sense:

dirfd = open();
for(...) {
readfile(dirfd, ...);
}
close(dirfd);

> Look at the overhead these days of a syscall using something like perf
> to see just how bad things have gotten on Intel-based systems (above was
> AMD which doesn't suffer all the syscall slowdowns, only some).
>
> I'm going to have to now dig up my old rpi to get the stats on that
> thing, as well as some Intel boxes to show the problem I'm trying to
> help out with here. I'll post that for the next round of this patch
> series.
>
> > I am not sure I understand why you think that a pointer to an array of
> > readfile_t structures is very complex. If it was very complex then it
> > would be a deep tree or a large graph.
>
> Of course you can make it more complex if you want, but look at the
> existing tools that currently do many open/read/close sequences. The
> apis there don't lend themselves very well to knowing the larger list of
> files ahead of time. But I could be looking at the wrong thing, what
> userspace programs are you thinking of that could be easily converted
> into using something like this?

Perhaps, passing multiple filenames to tools via the command-line is a
valid and quite general use case where it is known ahead of time that
multiple files are going to be read, such as "gcc *.o" which is
commonly used to link shared libraries and executables. Although, in
case of "gcc *.o" some of the object files are likely to be cached in
memory and thus unlikely to be required to be fetched from HDD/SSD, so
the valid use case where we could see a speedup (if gcc was to use the
multi-file readfiles() syscall) is when the programmer/Makefile
invokes "gcc *.o" after rebuilding a small subset of the object files
and the objects files which did not have to be rebuilt are stored on
HDD/SSD, so basically this means 1st-time use of a project's Makefile
in a particular day.