Re: [PATCH 00/33] Network fs helper library & fscache kiocb API [ver #3]

From: Matthew Wilcox
Date: Tue Feb 23 2021 - 15:30:47 EST


On Mon, Feb 15, 2021 at 11:22:20PM -0600, Steve French wrote:
> On Mon, Feb 15, 2021 at 8:10 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > The switch from readpages to readahead does help in a couple of corner
> > cases. For example, if you have two processes reading the same file at
> > the same time, one will now block on the other (due to the page lock)
> > rather than submitting a mess of overlapping and partial reads.
>
> Do you have a simple repro example of this we could try (fio, dbench, iozone
> etc) to get some objective perf data?

I don't. The problem was noted by the f2fs people, so maybe they have a
reproducer.

> My biggest worry is making sure that the switch to netfs doesn't degrade
> performance (which might be a low bar now since current network file copy
> perf seems to significantly lag at least Windows), and in some easy-to-understand
> scenarios want to make sure it actually helps perf.

I had a question about that ... you've mentioned having 4x4MB reads
outstanding as being the way to get optimum performance. Is there a
significant performance difference between 4x4MB, 16x1MB and 64x256kB?
I'm concerned about having "too large" an I/O on the wire at a given time.
For example, a 1Gbps link moves at most about 125MB/s. That's a minimum
latency of roughly 33us for a 4kB page, but 33ms for a 4MB page.

"For very simple tasks, people can perceive latencies down to 2 ms or less"
(https://danluu.com/input-lag/)
so going all the way to 4MB I/Os takes us well into the perceptible latency
range, whereas a 256kB I/O is only around 2ms.
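For reference, the wire-time arithmetic above is just bytes divided by link
bandwidth; a trivial sketch (assuming an ideal 1Gbps link and ignoring
protocol overhead and round trips) looks like this:

/* Wire-time sketch: time to push one I/O of a given size over a 1Gbps
 * link, ignoring protocol overhead and round trips. */
#include <stdio.h>

int main(void)
{
	const double link_bytes_per_sec = 1e9 / 8;	/* 1Gbps ~ 125MB/s */
	const long sizes[] = { 4096, 256 * 1024, 1024 * 1024, 4 * 1024 * 1024 };
	const char *names[] = { "4kB", "256kB", "1MB", "4MB" };

	for (int i = 0; i < 4; i++)
		printf("%6s I/O: %8.2f ms on the wire\n", names[i],
		       sizes[i] / link_bytes_per_sec * 1000);
	return 0;
}

That prints roughly 0.03ms for 4kB, 2ms for 256kB, 8ms for 1MB and 34ms
for 4MB, which is where the numbers above come from.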

So could you do some experiments with fio doing direct I/O to see if
it takes significantly longer to do, say, 1TB of I/O in 4MB chunks vs
256kB chunks? Obviously use threads to keep lots of I/Os outstanding.
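Per thread, something along these lines is what I have in mind (a rough,
single-threaded sketch; the test file path and chunk size are placeholders,
and fio would normally drive this with multiple jobs to keep I/Os
outstanding rather than one loop):

/* Rough sketch of the per-chunk direct I/O timing loop. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define CHUNK_SIZE	(256 * 1024)		/* try 256kB, 1MB, 4MB */
#define TOTAL_BYTES	(1ULL << 30)		/* 1GB per thread for the sketch */

int main(void)
{
	void *buf;
	struct timespec t0, t1;
	unsigned long long done = 0;
	int fd = open("/mnt/cifs/testfile", O_RDONLY | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, 4096, CHUNK_SIZE))
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while (done < TOTAL_BYTES) {
		ssize_t n = pread(fd, buf, CHUNK_SIZE, done);

		if (n <= 0)
			break;
		done += n;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%llu bytes in %.2fs\n", done,
	       (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
	close(fd);
	free(buf);
	return 0;
}

With fio itself, varying --bs between 256k/1M/4M with --direct=1 and using
--numjobs (or a higher --iodepth with an async engine) to keep several I/Os
in flight should give the comparison I'm after.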