Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())

From: Kefeng Wang
Date: Fri May 07 2021 - 03:17:46 EST




On 2021/5/6 20:47, Kefeng Wang wrote:


no, the CONFIG_ARM_LPAE is not set, and yes with same panic at
move_freepages at

start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] :  pfn =de600, page
=ef3cc000, page-flags = ffffffff,  pfn2phy = de600000

__free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
__free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
__free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200

Hmm, [de600, de7ff] is not added to the free lists which is correct. But
then it's unclear how the page for de600 gets to move_freepages()...

Can't say I have any bright ideas to try here...

Are we missing some checks (e.g., PageReserved()) that pfn_valid_within()
would have "caught" before?

Unless I'm missing something the crash happens in __rmqueue_fallback():

do_steal:
    page = get_page_from_free_area(area, fallback_mt);

    steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
                                can_steal);
        -> move_freepages()
            -> BUG()

So a page from free area should be sane as the freed range was never added
it to the free lists.

Sorry for the late response due to the vacation.

The pfn in range [de600, de7ff] won't be added into the free lists via __free_memory_core(), but the pfn could be added into freelists via free_highmem_page()

I add some debug[1] in add_to_free_list(), we could see the calltrace

free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
add_to_free_list, ===> pfn = de700
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec
pfn = de700
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48
Hardware name: Hisilicon A9
[<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0)
[<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec)
[<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4)
[<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>] (add_to_free_list+0x8c/0xec)
[<c023721c>] (add_to_free_list) from [<c0237e00>] (free_pcppages_bulk+0x200/0x278)
[<c0237e00>] (free_pcppages_bulk) from [<c0238d14>] (free_unref_page+0x58/0x68)
[<c0238d14>] (free_unref_page) from [<c023bb54>] (free_highmem_page+0xc/0x50)
[<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254)
[<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0)
[<c0700b38>] (start_kernel) from [<00000000>] (0x0)

so any idea?

If pfn = 0xde700, due to the pageblock_nr_pages = 0x200, then the start_pfn,end_pfn passed to move_freepages() will be [de600, de7ff],
but the range of [de600,de700] without ‘struct page' will lead to
this panic when pfn_valid_within not enabled if no HOLES_IN_ZONE,
and the same issue will occurred in isolate_freepages_block(), maybe
there are some scene, so I select HOLES_IN_ZONE in ARCH_HISI(ARM) to solve this issue in our 5.10, should we select HOLES_IN_ZONE in all ARM or only in ARCH_HISI, any better solution? Thanks.