Re: [RFC PATCH 4/4] mm, page_ext: move page_ext_init() after page_alloc_init_late()

From: Michal Hocko
Date: Mon Jul 24 2017 - 09:06:33 EST


On Thu 20-07-17 15:40:29, Vlastimil Babka wrote:
> Commit b8f1a75d61d8 ("mm: call page_ext_init() after all struct pages are
> initialized") has avoided a a NULL pointer dereference due to
> DEFERRED_STRUCT_PAGE_INIT clashing with page_ext, by calling page_ext_init()
> only after the deferred struct page init has finished. Later commit
> fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") avoided the
> underlying issue differently and moved the page_ext_init() call back to where
> it was before.
>
> However, there are two problems with the current code:
> - on very large machines, page_ext_init() may fail to allocate the page_ext
> structures, because deferred struct page init hasn't yet started, and the
> pre-inited part might be too small.
> This has been observed with a 3TB machine with page_owner=on. Although it
> was an older kernel where page_owner hasn't yet been converted to stack depot,
> thus page_ext was larger, the fundamental problem is still in mainline.

I was about to suggest using memblock/bootmem allocator but it seems
that page_ext_init is called passed mm_init. Is there any specific
reason why we cannot do the per-section initialization along with the
rest of the memory section init code which should have an early
allocator available?

> - page_owner's init_pages_in_zone() is called before deferred struct page init
> has started, so it will encounter unitialized struct pages. This currently
> happens to cause no harm, because the memmap array is are pre-zeroed on
> allocation and thus the "if (page_zone(page) != zone)" check is negative, but
> that pre-zeroing guarantee might change soon.

Yes this is annoying and the bug IMHO. We shouldn't consider spanned
pages but rather the maximum valid pfn for the zone. The rest simply
cannot by used by anybody so there shouldn't be any page_ext work due.
Or am I missing something?
--
Michal Hocko
SUSE Labs