Re: Lockless page cache test results

From: Linus Torvalds
Date: Wed Apr 26 2006 - 15:00:44 EST




On Wed, 26 Apr 2006, Andrew Morton wrote:

> Jens Axboe <axboe@xxxxxxx> wrote:
> >
> > Once per page, it's basically exercising the generic_file_splice_read()
> > path. Basically X number of "clients" open the same file, and fill those
> > pages into a pipe using splice. The output end of the pipe is then
> > spliced to /dev/null to toss it away again.
>
> OK. That doesn't sound like something which a real application is likely
> to do ;)

True, but on the other hand, it does kind of "distill" one (small) part of
something that real apps _are_ likely to do.

The whole 'splice to /dev/null' part can be seen as totally irrelevant,
but at the same time it's a way to ignore all the other parts of normal
page cache usage (ie those other parts tend to be the "map it into user
space" or the actual "memcpy_to/from_user()" or the "TCP send" part).

The question, of course, is whether the part that remains (the actual page
lookup) is important enough to matter, once it is part of a bigger chain
in a real application.

In other words, the splice() thing is just a way to isolate one part of a
chain that is usually much more involved, and micro-benchmark just that
one part.
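
(For concreteness, a client loop in a micro-benchmark like this might look
roughly like the sketch below. This is not Jens' actual test code, and the
glibc splice() wrapper plus the 64KB chunk size are just assumptions for
illustration; it only shows the file -> pipe -> /dev/null pattern being
measured.)

	/*
	 * Sketch of one "client": splice pages from a shared file into a
	 * pipe, then splice the pipe into /dev/null so the data itself is
	 * never touched. Only the page cache lookup side gets exercised.
	 */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		int file, null, pfd[2];
		ssize_t in, out;

		if (argc < 2) {
			fprintf(stderr, "usage: %s <file>\n", argv[0]);
			return 1;
		}

		file = open(argv[1], O_RDONLY);
		null = open("/dev/null", O_WRONLY);
		if (file < 0 || null < 0 || pipe(pfd) < 0) {
			perror("setup");
			return 1;
		}

		for (;;) {
			/* file -> pipe: this is the lookup being measured */
			in = splice(file, NULL, pfd[1], NULL, 65536,
				    SPLICE_F_MOVE);
			if (in <= 0)
				break;

			/* pipe -> /dev/null: throw the references away */
			while (in > 0) {
				out = splice(pfd[0], NULL, null, NULL, in,
					     SPLICE_F_MOVE);
				if (out <= 0) {
					perror("splice to /dev/null");
					return 1;
				}
				in -= out;
			}
		}
		return 0;
	}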

Splice itself can be optimized to do the lookup locking only once per N
pages (where N currently is on the order of ~16), but that may not be as
easy for some other paths (ie the normal read path).
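
(As a rough illustration of the "once per N pages" idea -- this is only a
sketch against the 2.6-era find_get_pages() gang-lookup helper, not the
actual generic_file_splice_read() code, and the helper function and batch
size here are made up -- the lookup cost gets amortized something like
this:)

	/*
	 * Sketch only: batch the page cache lookup with find_get_pages()
	 * instead of doing a separate find_get_page() for every page, so
	 * the tree lock / lookup walk is paid once per batch.
	 */
	#include <linux/kernel.h>
	#include <linux/pagemap.h>

	#define LOOKUP_BATCH 16

	static void fill_from_cache(struct address_space *mapping,
				    pgoff_t index, pgoff_t end)
	{
		struct page *pages[LOOKUP_BATCH];
		unsigned int i, found;

		while (index < end) {
			unsigned int want = min_t(pgoff_t, end - index,
						  LOOKUP_BATCH);

			/* one lookup covers up to 16 pages */
			found = find_get_pages(mapping, index, want, pages);
			if (!found)
				break;	/* nothing cached: fall back */

			for (i = 0; i < found; i++) {
				if (pages[i]->index >= end) {
					page_cache_release(pages[i]);
					continue;
				}
				/* ... hand pages[i] over to the pipe ... */
				page_cache_release(pages[i]);
			}

			/* find_get_pages() skips holes, so resync index */
			index = pages[found - 1]->index + 1;
		}
	}

The point being that the locking overhead is then paid once per batch
rather than once per page.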

And the "reading from the same file in multiple threads" _is_ a real load.
It may sound stupid, but it would happen for any server that has a lot of
locality across clients (and that's very much true for web-servers, for
example).

That said, under most real loads, the page cache lookup is obviously
always going to be just a tiny tiny part (as shown by the fact that Jens
quotes 35 GB/s throughput - possible only because splice to /dev/null
doesn't need to actually ever even _touch_ the data).

The fact that it drops to "just" 3GB/s for four clients is somewhat
interesting, though, since that does put a limit on how well we can serve
the same file. Of course, 3GB/s is still a lot faster than any modern
network will ever be able to push things around, but it's getting closer
to the possibilities for real hardware (ie IB over PCI-X should be able to
do about 1GB/s in "real life").

So the fact that basically just lookup/locking overhead can limit things
to 3GB/s is absolutely not totally uninteresting, even if in practice
there are other limits that would probably hit us much earlier.

It would be interesting to see where doing gang-lookup moves the target,
but on the other hand, with smaller files (and small files are still
common), gang lookup isn't going to help as much.

Of course, with small files, the actual filename lookup is likely to be
the real limiter.

Linus