Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?

From: Michal Hocko
Date: Tue Mar 10 2020 - 17:09:13 EST


On Tue 10-03-20 20:11:45, Jann Horn wrote:
> On Tue, Mar 10, 2020 at 7:48 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > On Tue 10-03-20 19:08:28, Jann Horn wrote:
> > > Hi!
> > >
> > > >From looking at the source code, it looks to me as if using
> > > MADV_PAGEOUT on a CoW anonymous mapping will page out the page if
> > > possible, even if other processes still have the same page mapped. Is
> > > that correct?
> > >
> > > If so, that's probably bad in environments where many processes (with
> > > different privileges) are forked from a single zygote process (like
> > > Android and Chrome), I think? If you accidentally call it on a CoW
> > > anonymous mapping with shared pages, you'll degrade the performance of
> > > other processes. And if an attacker does it intentionally, they could
> > > use that to aid with exploiting race conditions or weird
> > > microarchitectural stuff (e.g. the new https://lviattack.eu/lvi.pdf
> > > talks about "the assumption that attackers can provoke page faults or
> > > microcode assists for (arbitrary) load operations in the victim
> > > domain").
> > >
> > > Should madvise_cold_or_pageout_pte_range() maybe refuse to operate on
> > > pages with mapcount>1, or something like that? Or does it already do
> > > that, and I just missed the check?
> >
> > I have brought up side channel attacks earlier [1] but only in the
> > context of shared page cache pages. I didn't really consider shared
> > anonymous pages to be a real problem. I was under impression that CoW
> > pages shouldn't be a real problem because any security sensible
> > applications shouldn't allow untrusted code to be forked and CoW
> > anything really important. I believe we have made this assumption
> > in other places - IIRC on gup with FOLL_FORCE but I admit I have
> > very happily forgot most details.

I have quickly checked FOLL_FORCE and it is careful to break CoW on the
write access.

> Android has a "zygote" process that starts up the whole Java
> environment with a bunch of libraries before entering into a loop that
> fork()s off a child every time the user wants to launch an app. So all
> the apps, and even browser renderer processes, on the device share
> many CoW VMAs. See
> <https://developer.android.com/topic/performance/memory-overview#SharingRAM>.

I still have to think about how this could be used for any reasonable
attack. But certainly the simplest workaround is to simply back off on
pages mapped multiple times as we do for THP already. Something like the
following should work but I haven't tested it at all
diff --git a/mm/madvise.c b/mm/madvise.c
index 43b47d3fae02..02daa447bf47 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -351,6 +351,10 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
goto regular_page;
return 0;
}
+
+ /* Do not interfere with other mappings of this page */
+ if (page_mapcount(page) != 1)
+ goto huge_unlock;

if (pmd_young(orig_pmd)) {
pmdp_invalidate(vma, addr, pmd);
@@ -426,6 +430,10 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
continue;
}

+ /* Do not interfere with other mappings of this page */
+ if (page_mapcount(page) != 1)
+ continue;
+
VM_BUG_ON_PAGE(PageTransCompound(page), page);

if (pte_young(ptent)) {
--
Michal Hocko
SUSE Labs