RFC: backport of commit a32c1c61212d

From: Doug Berger
Date: Tue Aug 25 2020 - 18:55:24 EST


I recently tracked down a problem I observed when booting a v5.4 kernel
on a sparsemem UMA arm platform which includes a no-map reserved-memory
region in the middle of its HighMem zone.

When memmap_init_zone() is invoked, the pfns that correspond to the
no-map region fail the early_pfn_valid() check and their struct page
structures are not initialized, creating a "hole" in the memmap. Later
in my boot sequence the sock_init() initcall leads to a
bpf_prog_alloc(), which ends up stealing a page from the block
containing the no-map region; that in turn leads to a call to
move_freepages_block() to reclassify the migratetype of the entire
block.
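
For reference, the relevant loop in memmap_init_zone() in v5.4's
mm/page_alloc.c looks roughly like this (abridged; the node-id,
overlap and deferred-init checks are elided):

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!early_pfn_valid(pfn))
			continue;	/* struct page left uninitialized */
		/* ... node-id / overlap / deferred-init checks ... */
		page = pfn_to_page(pfn);
		__init_single_page(page, pfn, zone, nid);
		/* ... migratetype setup ... */
	}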

The function move_freepages() includes a pfn_valid_within() check for
each page in the range, but since the arm architecture doesn't select
HOLES_IN_ZONE this check is optimized out and the uninitialized struct
page is accessed. Specifically, PageLRU() calls compound_head() on the
page, and if the page->compound_head value is odd that value is used
as a pointer to the head struct page. For uninitialized memory there
is a high chance that a random page->compound_head value will be odd
and contain an invalid pointer, causing the kernel to abort and panic.
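
For reference, compound_head() (as of v5.4) is roughly the following,
which is why an odd garbage value in page->compound_head ends up being
dereferenced as a struct page pointer:

static inline struct page *compound_head(struct page *page)
{
	unsigned long head = READ_ONCE(page->compound_head);

	if (unlikely(head & 1))
		return (struct page *) (head - 1);
	return page;
}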

As you might imagine, specifying HOLES_IN_ZONE for the arm build
allows pfn_valid_within() to protect against accessing the
uninitialized struct page. However, the performance penalty this
incurs seems unnecessary.
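
That is because pfn_valid_within() only expands to a real pfn_valid()
check when HOLES_IN_ZONE is enabled (include/linux/mmzone.h):

#ifdef CONFIG_HOLES_IN_ZONE
#define pfn_valid_within(pfn) pfn_valid(pfn)
#else
#define pfn_valid_within(pfn) (1)
#endif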

Commit 35fd1eb1e821 ("mm/sparse: abstract sparse buffer allocations"),
part of the "sparse_init rewrite" series introduced in v4.19, changed
the way sparsemem memmaps are initialized. Prior to that patch the
sparsemem memmaps were initialized to all zeros. I observed that on
older kernels the "uninitialized" struct page access also occurs, but
the zero page->compound_head indicates no compound head and the page
pointer is therefore not corrupted. The other logic ends up causing
the page to be skipped and everything "happens to work".

While considering solutions to this issue I observed that the problem
does not occur in the current upstream as a result of a combination of
other commits. The following commits provided the functionality to
initialize struct page structures for pages that are unavailable, like
the no-map region in my system (a rough sketch of the idea follows the
list):
commit a4a3ede2132a ("mm: zero reserved and unavailable struct pages")
commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages")
commit ec393a0f014e ("mm: return zero_resv_unavail optimization")
commit e822969cab48 ("mm/page_alloc.c: fix uninitialized memmaps on a
partially populated last section")
commit 4b094b7851bf ("mm/page_alloc.c: initialize memmap of unavailable
memory directly")
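
Conceptually (a rough sketch only, not the actual upstream code or the
patch below; the names and loop bounds here are just illustrative),
what those commits add amounts to giving every pfn in a zone that
memblock does not cover a properly initialized, reserved struct page:

	for (pfn = zone_start_pfn; pfn < zone_end_pfn; pfn++) {
		if (!memblock_is_map_memory(PFN_PHYS(pfn))) {
			__init_single_page(pfn_to_page(pfn), pfn, zone, nid);
			__SetPageReserved(pfn_to_page(pfn));
		}
	}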

However, those commits added the functionality to the free_area_init()
and free_area_init_nodes() functions, and the non-NUMA arm architecture
did not begin calling free_area_init() until the following commit in
v5.8:
commit a32c1c61212d ("arm: simplify detection of memory zone boundaries")

Prior to that commit the non-NUMA arm architecture called
free_area_init_node() directly at the end of zone_sizes_init().
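
In the affected stable branches the shape of arm's zone_sizes_init()
is roughly the following (abridged), which bypasses the
unavailable-page initialization done via free_area_init():

static void __init zone_sizes_init(unsigned long min,
				   unsigned long max_low,
				   unsigned long max_high)
{
	unsigned long zone_size[MAX_NR_ZONES];
	unsigned long zhole_size[MAX_NR_ZONES];

	/* ... fill zone_size[] and zhole_size[] from memblock ... */

	free_area_init_node(0, zone_size, min, zhole_size);
}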

So while the problem appears to be fixed upstream by commit
a32c1c61212d ("arm: simplify detection of memory zone boundaries"), it
is still present in stable branches between v4.19.y and v5.7.y
inclusive, and probably for other architectures that didn't call
free_area_init() as well. This upstream commit is not easily/safely
backportable to stable branches, but if we focus on the sliver of
functionality that adds the initialization code from free_area_init()
to the zone_sizes_init() function used by non-NUMA arm kernels, I
believe a simple patch could be developed for each relevant stable
branch to resolve the issue I am observing. Similar patches could also
be applied for other architectures that now call free_area_init()
upstream but not in one of these stable branches, but I am not in a
position to test those architectures.

For the linux-5.4.y branch, such a patch might look like this: