Re: mm: delay rmap removal until after TLB flush

From: Christian Borntraeger
Date: Wed Nov 02 2022 - 05:27:08 EST



On 02.11.22 at 10:14, Christian Borntraeger wrote:
On 31.10.22 at 19:43, Linus Torvalds wrote:
Updated subject line, and here's the link to the original discussion
for new people:

     https://lore.kernel.org/all/B88D3073-440A-41C7-95F4-895D3F657EF2@xxxxxxxxx/

On Mon, Oct 31, 2022 at 10:28 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

Ok. At that point we no longer have the pte or the virtual address, so
it's not going to be exactly the same debug output.

But I think it ends up being fairly natural to do

         VM_WARN_ON_ONCE_PAGE(page_mapcount(page) < 0, page);

instead, and I've fixed that last patch up to do that.

Ok, so I've got a fixed set of patches based on the feedback from
PeterZ, and also tried to do the s390 updates for this blindly, and
pushed them out into a git branch:

     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?h=mmu_gather-race-fix

If people really want to see the patches in email again, I can do
that, but most of you already have, and the changes are either trivial
fixes or the s390 updates.

For the s390 people that I've now added to the participant list maybe
the git tree is fine - and the fundamental explanation of the problem
is in that top-most commit (with the three preceding commits being
prep-work). Or that link to the thread about this all.

Adding Gerald.

now the correct Gerald....


That top-most commit is also where I tried to fix things up for s390
that uses its own non-gathering TLB flush due to
CONFIG_MMU_GATHER_NO_GATHER.

NOTE NOTE NOTE! Unlike my regular git branch, this one may end up
rebased etc for further comments and fixes. So don't consider that
stable, it's still more of an RFC branch.

At a minimum I'll update it with Acks etc, assuming I get those, and
my s390 changes are entirely untested and probably won't work.

As far as I can tell, s390 doesn't actually *have* the problem that
causes this change, because of its synchronous TLB flush, but it
obviously needs to deal with the change of rmap zapping logic.

Also added a few people who are explicitly listed as being mmu_gather
maintainers. Maybe people saw the discussion on the linux-mm list, but
let's make it explicit.

Do people have any objections to this approach, or other suggestions?

I do *not* consider this critical, so it's a "queue for 6.2" issue for me.

It probably makes most sense to queue in the -MM tree (after the thing
is acked and people agree), but I can keep that branch alive too and
just deal with it all myself as well.

Anybody?

                      Linus

It certainly needs a build fix for s390:


In file included from kernel/sched/core.c:78:
./arch/s390/include/asm/tlb.h: In function '__tlb_remove_page_size':
./arch/s390/include/asm/tlb.h:50:17: error: implicit declaration of function 'page_zap_pte_rmap' [-Werror=implicit-function-declaration]
   50 |                 page_zap_pte_rmap(page);
      |                 ^~~~~~~~~~~~~~~~~
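
For readers following along, the error above is the usual "no prototype in scope" failure: the inline helper in tlb.h calls page_zap_pte_rmap() before any declaration of it is visible. Below is a minimal userspace sketch of that situation and one possible way out (making a declaration visible before the inline caller). It is not the actual s390 fix. The struct page layout, the helper name tlb_remove_page_sketch(), the return type of page_zap_pte_rmap(), and zap_demo() are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct page; not the real layout. */
struct page {
	int mapcount;
};

/*
 * Forward declaration. Without a prototype visible at this point, the call
 * in the inline helper below would be an implicit declaration, which
 * -Werror=implicit-function-declaration rejects.
 * Assumption: the function takes a page pointer and returns nothing.
 */
void page_zap_pte_rmap(struct page *page);

/* Simplified analogue of the inline helper named in the error message. */
static inline bool tlb_remove_page_sketch(struct page *page)
{
	page_zap_pte_rmap(page);
	return false;	/* assumption: "no batch flush needed" */
}

/* Toy definition so the sketch links; the real function does far more. */
void page_zap_pte_rmap(struct page *page)
{
	page->mapcount--;
}

/* Hypothetical driver: zap one mapping and report the remaining count. */
int zap_demo(void)
{
	struct page p = { .mapcount = 1 };

	tlb_remove_page_sketch(&p);
	return p.mapcount;
}
```

Removing the forward declaration reproduces the same class of error with -Werror=implicit-function-declaration. In the kernel, the equivalent would be having the right header in scope before asm/tlb.h uses the function; which header that should be is not something this sketch tries to answer.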