Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned

From: Linus Torvalds
Date: Fri Sep 25 2020 - 18:16:34 EST


On Fri, Sep 25, 2020 at 2:13 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Fri, Sep 25, 2020 at 12:56:05PM -0700, Linus Torvalds wrote:
> > So I think we can simply add a
> >
> > if (page_mapcount(page) != 1)
> > return false;
> >
> > to page_maybe_dma_pinned(), and that very naturally protects against
> > the "is the page count perhaps elevated due to a lot of forking?"
>
> How about the MAP_SHARED case where the page is pinned by some process but also
> shared (so mapcount can be >1)?

MAP_SHARED doesn't matter, since it's not getting COW'ed anyway, and
we keep the page around regardless.

So MAP_SHARED is the easy case. We'll never get to any of this code,
because is_cow_mapping() won't be true.

You can still screw up MAP_SHARED if you do a truncate() on the
underlying file or something like that, but that most *definitely*
falls under the "you only have yourself to blame" heading.

> Would the ATOMIC version always work? I mean, I thought it could fail anytime,
> so any fork() can start to fail for the tests too.

Sure. I'm not really happy about GFP_ATOMIC, but I suspect it works in practice.

Honestly, if somebody first pins megabytes of memory, and then does a
fork(), they are doing some seriously odd and wrong things. So I think
this should be a "we will try to handle it gracefully, but your load
is broken" case.

I am still inclined to add some kind of warning to this case, but I'm
also a bit on the fence wrt the whole "convenience" issue - for some
very occasional use it's probably convenient to not have to worry
about this in user space.

Actually, what I'm even less happy about than the GFP_ATOMIC is how
much annoying boilerplate this just "map anonymous page" required with
the whole cgroup_charge, throttle, anon_rmap, lru_cache_add thing.
Looking at that patch, it all looks _fairly_ simple, but there's a lot
of details that got duplicated from the pte_none() new-page-fault case
(and that the do_cow_page() case also shares)

I understand why it happens, and there's not *that* many cases, it
made me go "ouch, this is a lot of small details, maybe I missed
some", and I got the feeling that I should try to re-organize a few
helper functions to avoid duplicating the same basic code over and
over again.

But I decided that I wanted to keep the patch minimal and as focused
as possible, so I didn't actually do that. But we clearly have decades
of adding rules that just makes even something as "simple" as "add a
new page to a VM" fairly complex.

Also, to avoid making the patch bigger, I skipped your "pass
destination vma around" patch. I think it's the right thing
conceptually, but everything I looked at also screamed "we don't
actually care about the differences" to me.

I dunno. I'm conflicted. This really _feels_ to me like "we're so
close to just fixing this once and for all", but then I also go "maybe
we should just revert everything and do this for 5.10".

Except "reverting everything" is sadly really really problematic too.
It will fix the rdma issue, but in this case "everything" goes all the
way back to "uhhuh, we have a security issue with COW going the wrong
way". Otherwise I'd have gone that way two weeks ago already.

Linus