Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

From: Dave Chinner
Date: Wed Sep 25 2019 - 04:00:43 EST


On Tue, Sep 24, 2019 at 12:08:04PM -0700, Linus Torvalds wrote:
> On Tue, Sep 24, 2019 at 12:39 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:
> >
> > # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes
>
> Our dirty_background stuff is very questionable, but it exists (and
> has those insane defaults) because of various legacy reasons.

That's not what I was asking about. The context is in the previous
lines you didn't quote:

> > > > Is the faster speed reproducible? I don't quite understand why this
> > > > would be.
> > >
> > > Writing to disk simply starts earlier.
> >
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:

i.e. I'm asking about the reasons for the performance differential,
not asking for an explanation of what writebehind is. If the
performance differential really is caused by writeback starting
sooner, then winding down dirty_background_bytes should produce
exactly the same performance because it will start writeback -much
faster-.

If it doesn't, then the assertion that the difference is caused by
earlier writeout is questionable and the code may not actually be
doing what is claimed....

Basically, I'm asking for proof that the explanation is correct.
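
For anyone wanting to run that comparison, a rough sketch of the
setup is below. The helper names (mb_to_bytes, set_knob) and the
APPLY switch are made up for illustration; only the
/proc/sys/vm/dirty_background_bytes path comes from this thread. By
default it only prints what it would do; with APPLY=1 and root it
writes the value for real.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tool.

# Convert decimal megabytes to bytes, as in the commands quoted above.
mb_to_bytes() {
    echo $(( $1 * 1000 * 1000 ))
}

# $1 = knob name under /proc/sys/vm, $2 = value in bytes.
# Dry-run unless APPLY=1 is set in the environment (needs root).
set_knob() {
    if [ "${APPLY:-0}" = 1 ]; then
        echo "$2" > "/proc/sys/vm/$1"
    else
        echo "echo $2 > /proc/sys/vm/$1"
    fi
}

# Start background writeback at 100MB of dirty pages.
set_knob dirty_background_bytes "$(mb_to_bytes 100)"
```

Run the same sequential write workload on an unpatched kernel with
this setting and on the patched kernel with default settings, then
compare the results.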

> > to start background writeback when there's 100MB of dirty pages in
> > memory, and then:
> >
> > # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes
>
> The thing is, that also accounts for dirty shared mmap pages. And it
> really will kill some benchmarks that people take very very seriously.

Yes, I know that. I'm not suggesting that we do this,

[snip]

> Anyway, the end result of all this is that we have that
> balance_dirty_pages() that is pretty darn complex and I suspect very
> few people understand everything that goes on in that function.

I'd agree with you there - most of the groundwork for the
balance_dirty_pages IO throttling feedback loop was based on
concepts I developed to solve dirty page writeback thrashing
problems on Irix back in 2003. The code we have in Linux was
written by Fengguang Wu with help from a lot of people, but the
underlying concepts of delegating IO to dedicated writeback threads
that calculate and track page cleaning rates (BDI writeback rates)
and then throttling incoming page dirtying rate to the page cleaning
rate all came out of my head....

So, much as it may surprise you, I am one of the few people who do
actually understand how that whole complex mass of accounting and
feedback is supposed to work. :)
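
To illustrate just the core idea of throttling the dirtying rate to
the cleaning rate, here is a toy calculation. It is not the
algorithm in balance_dirty_pages(), which tracks far more state; the
function name and the numbers are made up for the example.

```shell
#!/bin/sh
# Toy model only: how long a task should sleep after dirtying $1
# pages so that, on average, it dirties pages no faster than the
# measured cleaning rate of $2 pages per second.
throttle_pause_ms() {
    echo $(( $1 * 1000 / $2 ))
}

# 32 pages dirtied against 6400 pages/s of writeback: prints 5
throttle_pause_ms 32 6400
```

The faster the backing device cleans pages, the shorter the pause;
a slow device pushes the pause up and the writer slows down to
match it.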

> Now, whether write-behind really _does_ help that, or whether it's
> just yet another tweak and complication, I can't actually say.

Neither can I at this point - I lack the data and that's why I was
asking if there was a perf difference with the existing limits wound
right down. Knowing whether the performance difference is simply a
result of starting writeback IO sooner tells me an awful lot about
what other behaviour is happening as a result of the changes in this
patch.

> But I
> don't think 'dirty_background_bytes' is really an argument against
> write-behind, it's just one knob on the very complex dirty handling we
> have.

Never said it was - just trying to determine if a one-line
explanation is true or not.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx