Re: page-flags behavior on compound pages: a worry

From: Kirill A. Shutemov
Date: Wed Aug 12 2015 - 18:21:44 EST


On Wed, Aug 12, 2015 at 02:16:44PM -0700, Andrew Morton wrote:
> On Wed, 12 Aug 2015 17:35:09 +0300 "Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx> wrote:
>
> > On Thu, Aug 06, 2015 at 12:24:22PM -0700, Hugh Dickins wrote:
> > > > IIUC, the only potentially problematic callsites left are physical memory
> > > > scanners. This code requires an audit. I'll do that.
> > >
> > > Please.
> >
> > I haven't finished the exercise yet, but here's an issue I believe is present
> > in the current *Linus* tree:
> >
> > From e78eec7d7a8c4cba8b5952a997973f7741e704f4 Mon Sep 17 00:00:00 2001
> > From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Date: Wed, 12 Aug 2015 17:09:16 +0300
> > Subject: [PATCH] mm: fix potential race in isolate_migratepages_block()
> >
> > Hugh has pointed out that a compound_head() call can be unsafe in some
> > contexts. Here's one example:
> >
> > CPU0                                    CPU1
> >
> > isolate_migratepages_block()
> >   page_count()
> >     compound_head()
> >       !!PageTail() == true
> >                                         put_page()
> >                                           tail->first_page = NULL
> >       head = tail->first_page
> >                                         alloc_pages(__GFP_COMP)
> >                                           prep_compound_page()
> >                                             tail->first_page = head
> >                                             __SetPageTail(p);
> >       !!PageTail() == true
> >     <head == NULL dereferencing>
> >
> > The race is purely theoretical. I don't think it's possible to trigger it
> > in practice. But who knows.
> >
> > This can be fixed by avoiding compound_head() in unsafe contexts.
>
> This is nuts :( page_count() should Just Work without us having to
> worry about bizarre races against splitting. Sigh.

Splitting is not involved, and this race is present even with THP=n. :(

>
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -787,7 +787,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >  		 * admittedly racy check.
> >  		 */
> >  		if (!page_mapping(page) &&
> > -		    page_count(page) > page_mapcount(page))
> > +		    atomic_read(&page->_count) > page_mapcount(page))
> >  			continue;
>
> If we're going to do this sort of thing, can we please do it in a more
> transparent manner? Let's not sprinkle unexplained and
> incomprehensible direct accesses to ->_count all over the place.
>
> Create a formal function to do this, with an appropriate name and with
> documentation which fully explains what's going on. Then use that
> here, and in has_unmovable_pages() (at least).
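A minimal sketch of the kind of named, documented helper asked for above, written as userspace C11 so it compiles stand-alone. The name page_count_raw() and the one-field struct page are hypothetical stand-ins, not kernel API; atomic_load_explicit() with relaxed ordering plays the role of atomic_read().

```c
#include <stdatomic.h>

/* Userspace stand-in for struct page: only the reference count is modelled. */
struct page {
	atomic_int _count;
};

/**
 * page_count_raw - hypothetical helper: read the count stored in @page itself
 * @page: any struct page, possibly a tail page that is concurrently being
 *        freed and reallocated as part of another compound page
 *
 * Unlike page_count(), this does not call compound_head(), so it never
 * follows a ->first_page pointer that may be NULL or stale. For a tail page
 * the value need not equal the compound page's count, so callers must treat
 * it only as a racy hint.
 */
static inline int page_count_raw(struct page *page)
{
	return atomic_load_explicit(&page->_count, memory_order_relaxed);
}
```

With such a helper, the compaction check would read page_count_raw(page) > page_mapcount(page), and the explanation of why compound_head() is skipped would live in one documented place instead of at each call site.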

This whole situation is ugly. I'm thinking about a more general solution for
the PageTail() vs. ->first_page race.

We could avoid the race in the first place if we encoded PageTail() and the
position of the head page within the same word in struct page. That way we
update both things in one shot, with no possibility of a race.
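To illustrate why a single word helps, here is a hedged userspace sketch of one possible encoding (not necessarily the one proposed here): bit 0 of a hypothetical compound_info field marks a tail page, and the remaining bits hold the head page's address. Because the reader does a single load, it can never pair "PageTail() == true" with a NULL head the way the two separate loads in the race diagram can. struct page, compound_info and all helper names are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical layout: one word holds both facts.
 *   bit 0 set   -> tail page; remaining bits are the head page's address
 *   bit 0 clear -> not a tail page
 * struct page is at least word-aligned, so bit 0 of a real pointer is 0.
 */
struct page {
	_Atomic uintptr_t compound_info;
};

/* One store publishes both "is a tail" and "where the head is". */
static void set_tail(struct page *tail, struct page *head)
{
	atomic_store_explicit(&tail->compound_info, (uintptr_t)head | 1,
			      memory_order_release);
}

/* Freeing the compound page clears the tail bit and the head together. */
static void clear_tail(struct page *tail)
{
	atomic_store_explicit(&tail->compound_info, 0, memory_order_release);
}

static struct page *compound_head_sketch(struct page *page)
{
	/* Single load: tail bit and head address come from the same snapshot. */
	uintptr_t v = atomic_load_explicit(&page->compound_info,
					   memory_order_acquire);

	if (v & 1)
		return (struct page *)(v & ~(uintptr_t)1);
	return page;
}
```

The head obtained this way may still be stale if the page is reused right after the load, but it is never NULL while the tail bit is set.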

Details get tricky.

Tomorrow I'm going to try something like this: encode the position of the
head page as an offset from the tail page and store it as a negative number
in the union with ->mapping and ->s_mem. PageTail() can then be implemented
as a check that the field's value is in the range -1..-MAX_ORDER_NR_PAGES.
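A minimal userspace sketch of the negative-offset idea. It assumes a MAX_ORDER_NR_PAGES of 1024 and that no valid ->mapping or ->s_mem value ever falls in -1..-MAX_ORDER_NR_PAGES. The field compound_word stands in for the word shared with ->mapping and ->s_mem; all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed value for illustration; the real constant depends on MAX_ORDER. */
#define SKETCH_MAX_ORDER_NR_PAGES 1024L

struct page {
	/*
	 * Stand-in for the word shared with ->mapping and ->s_mem. A tail
	 * page stores -(its offset from the head); anything else stores a
	 * pointer or 0, assumed never to land in -1..-MAX_ORDER_NR_PAGES.
	 */
	atomic_long compound_word;
};

/* Mark head[idx] (idx >= 1) as a tail page of the compound page at head. */
static void set_tail_offset(struct page *head, long idx)
{
	atomic_store_explicit(&head[idx].compound_word, -idx,
			      memory_order_release);
}

static bool is_tail_value(long v)
{
	return v <= -1 && v >= -SKETCH_MAX_ORDER_NR_PAGES;
}

static bool page_is_tail(struct page *page)
{
	return is_tail_value(atomic_load_explicit(&page->compound_word,
						   memory_order_acquire));
}

static struct page *head_from_offset(struct page *page)
{
	/* A single load yields both the tail test and the offset to the head. */
	long v = atomic_load_explicit(&page->compound_word,
				      memory_order_acquire);

	return is_tail_value(v) ? page + v : page;
}
```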

I'm not at all sure it's going to work, especially given the ridiculously
high CONFIG_FORCE_MAX_ZONEORDER values some architectures allow.

We could also try encoding the page order instead (again as a negative
number) and calculating the head page position from alignment...
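The order-plus-alignment variant could look roughly like this, working on PFNs. It assumes that a compound page of order N always starts at a PFN that is a multiple of 2^N, as buddy-allocator blocks do, and it uses a hypothetical SKETCH_MAX_ORDER of 11. The helper names are made up for illustration.

```c
#include <stdbool.h>

/* Assumed for illustration: valid orders are 0..SKETCH_MAX_ORDER - 1. */
#define SKETCH_MAX_ORDER 11

/*
 * Hypothetical tail encoding: a tail page stores -order, so any value in
 * -1..-(SKETCH_MAX_ORDER - 1) means "tail of an order-N compound page".
 */
static bool value_is_tail(long v)
{
	return v <= -1 && v > -SKETCH_MAX_ORDER;
}

/*
 * Recover the head PFN from a tail PFN and its stored -order: clearing the
 * low 'order' bits works only because the compound page is naturally
 * aligned in PFN space.
 */
static unsigned long head_pfn_from_order(unsigned long pfn, long v)
{
	unsigned int order = (unsigned int)-v;

	return pfn & ~((1UL << order) - 1);
}

static unsigned long compound_head_pfn(unsigned long pfn, long v)
{
	return value_is_tail(v) ? head_pfn_from_order(pfn, v) : pfn;
}
```

One possible attraction of this variant is that the reserved range shrinks from MAX_ORDER_NR_PAGES values to MAX_ORDER - 1 values. That may ease the concern about large CONFIG_FORCE_MAX_ZONEORDER settings, at the cost of relying on natural alignment.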

Any other ideas are welcome.

--
Kirill A. Shutemov