Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting

From: Kirill A. Shutemov
Date: Tue Dec 29 2020 - 08:30:01 EST


On Mon, Dec 28, 2020 at 01:58:40PM -0800, Linus Torvalds wrote:
> On Mon, Dec 28, 2020 at 10:47 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > I personally think it's wrong to update vmf->pte at all. We should
> > just have a local 'ptep' pointer that we update as we walk along. But
> > that requires another change to the calling convention, namely to
> > "do_set_pte()".
>
> Actually, I think we should not use do_set_pte() at all.
>
> About half of do_set_pte() is about the FAULT_FLAG_WRITE case, which
> the fault-around code never has set (and would be wrong if it did).
>
> So I think do_set_pte() should be made local to mm/memory.c, and the
> filemap_map_pages() code should do it's own simplified version that
> just does the non-writable case, and that just gets passed the address
> and the pte pointer.

Well, do_set_pte() also does rss accounting and _fast version of it is
local to mm/memory.c. It makes it cumbersome to opencode do_set_pte() in
filemap_map_pages(). We need to make fast rss acounting visible outside
memory.c or have some kind of batching in filemap_map_pages().

> At that point, there would no longer be any need to update the
> address/pte fields in the vmf struct, and in fact I think it could be
> made a "const" pointer in this cal chain.

Unfortunately, we would still need to NULLify vmf->prealloc_pte once it's
consumed. It kills idea with const.

> This is all just from looking at the code, I haven't tried to write a
> patch to do this, so I might be missing some case.

I reworked do_fault_around() so it doesn't touch vmf->address and pass
original address down to ->map_pages(). No need in the new argument in
->map_pages. filemap_map_pages() calculates address based on pgoff of the
page handled.