Re: page-flags behavior on compound pages: a worry

From: Hugh Dickins
Date: Thu Aug 13 2015 - 00:13:46 EST


On Thu, 13 Aug 2015, Kirill A. Shutemov wrote:
>
> This whole situation is ugly. I'm thinking about a more general solution
> for the PageTail() vs. ->first_page race.
>
> We would be able to avoid the race in the first place if we encoded
> PageTail() and the position of the head page within the same word in
> struct page. This way we update both things in one shot, with no
> possibility of a race.
>
> Details get tricky.
>
> I'm going to try something like this tomorrow: encode the position of the
> head page as an offset from the tail page and store it as a negative number
> in the union with ->mapping and ->s_mem. PageTail() can then be implemented
> as a check that the field's value is in the range -1..-MAX_ORDER_NR_PAGES.
>
> I'm not at all sure that it's going to work, especially looking at the
> ridiculously high CONFIG_FORCE_MAX_ZONEORDER some architectures allow.
>
> We could also try to encode the page order instead (again as a negative
> number) and calculate the head page position based on alignment...
>
> Any other ideas are welcome.
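
For illustration only, the single-word idea quoted above can be modelled in
userspace roughly as follows. This is a sketch under stated assumptions, not
the eventual kernel change: struct model_page, MODEL_MAX_ORDER_NR_PAGES and
the model_*() helpers are hypothetical names, and a C11 atomic_long stands in
for the word that would share a union with ->mapping.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical limit; the real MAX_ORDER_NR_PAGES depends on the config. */
#define MODEL_MAX_ORDER_NR_PAGES 1024L

/* Hypothetical stand-in for struct page: one word carries both facts. */
struct model_page {
	/* 0 when not a tail; otherwise the negative offset to the head page. */
	atomic_long head_off;
};

/* True if the value lies in the range -1..-MODEL_MAX_ORDER_NR_PAGES. */
static int model_off_is_tail(long off)
{
	return off <= -1 && off >= -MODEL_MAX_ORDER_NR_PAGES;
}

/* One store both marks the page as a tail and records where its head is. */
static void model_set_tail(struct model_page *page, struct model_page *head)
{
	atomic_store(&page->head_off, (long)(head - page));
}

/* One store clears both pieces of information together. */
static void model_clear_tail(struct model_page *page)
{
	atomic_store(&page->head_off, 0);
}

static int model_page_tail(struct model_page *page)
{
	return model_off_is_tail(atomic_load(&page->head_off));
}

/* A single load yields a consistent answer: either the head or the page. */
static struct model_page *model_compound_head(struct model_page *page)
{
	long off = atomic_load(&page->head_off);

	return model_off_is_tail(off) ? page + off : page;
}
```

Because the tail test and the head lookup both come from one load of one
word, a reader in this model cannot see "tail" together with a stale head
position, which is the property the quoted proposal is after.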

Good luck, I've not given it any thought, but hope it works out:
my reasoning was the same when I put the PageAnon bit into
page->mapping instead of page->flags.
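
As a rough userspace illustration of that reasoning (flag and pointer kept in
one word so that they change together), the low bit of a suitably aligned
pointer can carry the flag. The names below (MODEL_MAPPING_ANON,
struct model_anon_vma, the model_*() helpers) are hypothetical and only
mirror the idea; this is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit, kept in bit 0 of the mapping word. */
#define MODEL_MAPPING_ANON ((uintptr_t)1)

/* Hypothetical target structure; its alignment leaves bit 0 of its address free. */
struct model_anon_vma {
	int dummy;
};

/* Build a mapping word that is both "anon" and points at the anon_vma. */
static uintptr_t model_make_anon_mapping(struct model_anon_vma *av)
{
	return (uintptr_t)av | MODEL_MAPPING_ANON;
}

/* The flag test reads the same word that holds the pointer. */
static int model_page_anon(uintptr_t mapping)
{
	return (mapping & MODEL_MAPPING_ANON) != 0;
}

/* Strip the flag bit to recover the pointer. */
static struct model_anon_vma *model_anon_vma_of(uintptr_t mapping)
{
	return (struct model_anon_vma *)(mapping & ~MODEL_MAPPING_ANON);
}
```

A single store of the combined word publishes flag and pointer together, so
no reader of that word can observe one without the other.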

Something to beware of, though it is exceedingly unlikely to be a problem:
when I was working on KSM swapping, page->mapping always contained either
a pointer to or into a relevant structure, or else something that could not
possibly be a kernel pointer: see the comment above get_ksm_page() in
mm/ksm.c. It is best to keep page->mapping for pointers if possible (and
probably to avoid having the PageAnon bit set unless the page really is Anon).

I've only just read your mail, and I'm too slow a thinker to have
worked through your isolate_migratepages_block() race yet. But, given
the timing, cannot resist sending you a code fragment I wrote earlier
today for our v3.11-based kernel: which still has compound_trans_order(),
which we had been using in a similar racy physical scan.

I'm not for a moment suggesting that this fragment is relevant to your
race; but it is something amusing to consider when you're thinking of
such races. Credit to Greg Thelen for thinking of the prep_compound_page()
end of it, when I'd been focussed on the __split_huge_page_refcount() end.

	/*
	 * It is not safe to use compound_lock (inside compound_trans_order)
	 * until we have a reference on the page (okay, done above) and have
	 * then seen PageLRU on it (just below): because mm/huge_memory.c uses
	 * the non-atomic __SetPageUptodate on a freshly allocated THPage in
	 * several places, believing it to be invisible to the outside world,
	 * but liable to race and leave PG_compound_lock set when cleared here.
	 */
	nr_pages = 1;
	if (PageHead(page)) {
		/*
		 * smp_rmb() against the smp_wmb() in the first iteration of
		 * prep_compound_page(), so that the PageTail test ensures
		 * that compound_order(page) is now correctly readable.
		 */
		smp_rmb();
		if (PageTail(page + 1)) {
			nr_pages = 1 << compound_order(page);
			/*
			 * Then smp_rmb() against smp_wmb() in last iteration of
			 * __split_huge_page_refcount(), to ensure that has not
			 * yet written something else into page[1].lru.prev.
			 */
			smp_rmb();
			if (!PageTail(page + 1))
				nr_pages = 1;
		}
	}
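
The check, read, recheck shape of that fragment can be modelled in userspace
C11 roughly as below. This is only a sketch: struct model_cpage and
model_nr_pages() are hypothetical names, the three fields stand in for
PageHead(page), PageTail(page + 1) and compound_order(page), and an acquire
fence is assumed to be an adequate stand-in for smp_rmb() on the read side.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the state the fragment inspects. */
struct model_cpage {
	atomic_int head;	/* models PageHead(page) */
	atomic_int tail1;	/* models PageTail(page + 1) */
	atomic_uint order;	/* models compound_order(page) */
};

static unsigned long model_nr_pages(struct model_cpage *p)
{
	unsigned long nr_pages = 1;

	if (atomic_load_explicit(&p->head, memory_order_relaxed)) {
		/* Stand-in for the first smp_rmb(): order head before tail1. */
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&p->tail1, memory_order_relaxed)) {
			nr_pages = 1UL << atomic_load_explicit(&p->order,
							memory_order_relaxed);
			/* Stand-in for the second smp_rmb(): order the read
			 * of order before the recheck of tail1. */
			atomic_thread_fence(memory_order_acquire);
			if (!atomic_load_explicit(&p->tail1,
						  memory_order_relaxed))
				nr_pages = 1;	/* split raced with us */
		}
	}
	return nr_pages;
}
```

The order is trusted only when the tail marker is seen both before and after
it is read; in any other case the model falls back to a single page.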

Hugh