Re: [GIT PULL] MM updates for 6.3-rc1

From: David Howells
Date: Fri Feb 24 2023 - 04:05:48 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Yes, I saw David Howells resolution suggestion. I think that one
> was buggy. It would wait for a page under writeback, and then go on to
> the *next* one without writing it back. I don't think that was right.

You're right. Vishal's patch introduced it into afs and I copied it across
without noticing, either at the time or when I reviewed Vishal's patch. He
inserted the extra for-loop because he's now extracting a batch, but kept the
continue that used to repeat the extraction - except that it now continues
the wrong loop.
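
A rough sketch of the shape I mean (heavily simplified; not the actual
afs/cifs code, with the locking, fscache and sync_mode handling elided, and
assuming the usual mapping/index/end locals):

	struct folio_batch fbatch;
	struct folio *folio;
	unsigned int nr, i;

	folio_batch_init(&fbatch);
	do {
		nr = filemap_get_folios_tag(mapping, &index, end,
					    PAGECACHE_TAG_DIRTY, &fbatch);
		if (!nr)
			break;

		for (i = 0; i < nr; i++) {
			folio = fbatch.folios[i];
			if (folio_test_writeback(folio)) {
				folio_wait_writeback(folio);
				/* Before batching, this continue went back
				 * to the single-folio lookup and picked this
				 * folio up again; now it just moves on to
				 * folios[i + 1], so the folio we waited on
				 * is never written back. */
				continue;
			}
			/* ... write this folio back, extending as needed ... */
		}
		folio_batch_release(&fbatch);
	} while (index <= end);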

So afs will need fixing too. The simplest ways, I think, are either to
decrement the loop counter before continuing or to stick a goto back to the
beginning of the loop (which is what you did in cifs). But I'm not sure
either is the correct thing to do. The previous code dropped the found folio
and then repeated the search in case the folio had been truncated, migrated
or punched out. I suspect that's probably what we should do.
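
A minimal sketch of that drop-and-redo approach, based on the simplified
loop above (again, not the actual afs code):

redo_search:
	nr = filemap_get_folios_tag(mapping, &index, end,
				    PAGECACHE_TAG_DIRTY, &fbatch);
	for (i = 0; i < nr; i++) {
		folio = fbatch.folios[i];
		if (folio_test_writeback(folio)) {
			folio_wait_writeback(folio);
			/* Drop the remainder of the batch and redo the
			 * search from this folio's position in case it got
			 * truncated, migrated or punched out while we were
			 * waiting. */
			index = folio->index;
			folio_batch_release(&fbatch);
			goto redo_search;
		}
		/* ... write this folio back ... */
	}
	folio_batch_release(&fbatch);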


Also, thinking about it again, I'm not sure whether fetching a batch with
filemap_get_folios_tag() like this in {afs,cifs}_writepages_region() is
necessarily the right thing to do. There are three cases I'm thinking of:

(1) A single folio is returned. This is trivial.

(2) A run of contiguous folios is returned - {afs,cifs}_extend_writeback()
is likely to write them back, in which case the batch is probably not
useful. Note that *_extend_writeback() walks the xarray directly itself
as it wants contiguous folios and doesn't want to extract any folio it's
not going to use.

(3) A list of scattered folios is returned. Granted this is more efficient
if nothing else interferes - but there could be other writes in the gaps
that we then skip over, other flushes that render some of our list clean,
or page invalidations. This is a change in behaviour, but I'm not sure
that matters too much since a flush/sync can only be expected to write
back what's modified at the time it is initiated.

Further, processing each entry in the list is potentially very slow
because we're doing a write across the network for each one (cifs might
bump this into the background, but it might also have to (re)open a file
handle on the server and wait for credits before it can even begin the
transaction).

Which means all of the folios in the batch may then get pinned for a long
period of time - up to ~14 network writes' worth for the last folio in the
batch - which could prevent things like page migration (see the sketch
below).

Further, we might not get to write out all the folios in the batch as
*_extend_writeback() might hit the wbc limit first.
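
To make the reference-lifetime point in (3) concrete, here's the same
simplified batched loop again - still just a sketch, not the real
{afs,cifs} code - annotated with where the folio references are taken and
dropped:

	folio_batch_init(&fbatch);
	do {
		/* Takes a reference on up to 15 dirty folios at once. */
		nr = filemap_get_folios_tag(mapping, &index, end,
					    PAGECACHE_TAG_DIRTY, &fbatch);
		if (!nr)
			break;

		for (i = 0; i < nr; i++) {
			/* One network write per iteration, possibly
			 * synchronous; cifs may also need to reopen a file
			 * handle and wait for credits first.  If
			 * wbc->nr_to_write runs out here, the rest of the
			 * batch is never written at all. */
			/* ... write fbatch.folios[i] back ... */
		}

		/* Only now are the references dropped, so the last folio
		 * in the batch sat pinned - and, e.g., unmigratable - for
		 * up to 14 network writes. */
		folio_batch_release(&fbatch);
	} while (index <= end && wbc->nr_to_write > 0);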

> That said, I'm not at all convinced my version is right either. I
> can't test it, and that means I probably messed up. It looked sane to
> me when I did it, and it builds cleanly, but I honestly doubt myself.

It doesn't seem to work. A write seems to end in lots of:

CIFS: VFS: No writable handle in writepages rc=-9

being emitted. I'll poke further into it - there's always the possibility
that some other patch is interfering.

David