Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section

From: David Hildenbrand
Date: Mon Dec 17 2018 - 04:38:38 EST


On 14.12.18 20:23, Gerald Schaefer wrote:
> On Fri, 14 Dec 2018 16:49:14 +0100
> David Hildenbrand <david@xxxxxxxxxx> wrote:
>
>> On 14.12.18 16:22, David Hildenbrand wrote:
>>> On 12.12.18 18:27, Mikhail Zaslonko wrote:
>>>> If the memory end is not aligned with the sparse memory section boundary,
>>>> the mapping of such a section is only partly initialized. This may lead
>>>> to a VM_BUG_ON due to uninitialized struct page access from
>>>> is_mem_section_removable() or test_pages_in_a_zone(), triggered by the
>>>> memory_hotplug sysfs handlers:
>>>>
>>>> Here are the panic examples:
>>>> CONFIG_DEBUG_VM=y
>>>> CONFIG_DEBUG_VM_PGFLAGS=y
>>>>
>>>> kernel parameter mem=2050M
>>>> --------------------------
>>>> page:000003d082008000 is uninitialized and poisoned
>>>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>>> Call Trace:
>>>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>>>> [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>>>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>>> [<00000000003e4194>] seq_read+0x204/0x480
>>>> [<00000000003b53ea>] __vfs_read+0x32/0x178
>>>> [<00000000003b55b2>] vfs_read+0x82/0x138
>>>> [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>>> [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>>> Last Breaking-Event-Address:
>>>> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
>>>> Kernel panic - not syncing: Fatal exception: panic_on_oops
>>>>
>>>> kernel parameter mem=3075M
>>>> --------------------------
>>>> page:000003d08300c000 is uninitialized and poisoned
>>>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>>> Call Trace:
>>>> ([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
>>>> [<00000000008f12fa>] show_mem_removable+0x9a/0xd8
>>>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>>> [<00000000003e4194>] seq_read+0x204/0x480
>>>> [<00000000003b53ea>] __vfs_read+0x32/0x178
>>>> [<00000000003b55b2>] vfs_read+0x82/0x138
>>>> [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>>> [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>>> Last Breaking-Event-Address:
>>>> [<000000000038596c>] is_mem_section_removable+0xb4/0x190
>>>> Kernel panic - not syncing: Fatal exception: panic_on_oops
>>>>
>>>> Fix the problem by making memmap_init_zone() initialize the last memory
>>>> section of each zone to its very end, even if that goes beyond the zone
>>>> end.
>>>>
>>>> Signed-off-by: Mikhail Zaslonko <zaslonko@xxxxxxxxxxxxx>
>>>> Reviewed-by: Gerald Schaefer <gerald.schaefer@xxxxxxxxxx>
>>>> Cc: <stable@xxxxxxxxxxxxxxx>
>>>> ---
>>>> mm/page_alloc.c | 12 ++++++++++++
>>>> 1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 2ec9cc407216..e2afdb2dc2c5 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>> cond_resched();
>>>> }
>>>> }
>>>> +#ifdef CONFIG_SPARSEMEM
>>>> + /*
>>>> + * If the zone does not span the rest of the section then
>>>> + * we should at least initialize those pages. Otherwise we
>>>> + * could blow up on a poisoned page in some paths which depend
>>>> + * on full sections being initialized (e.g. memory hotplug).
>>>> + */
>>>> + while (end_pfn % PAGES_PER_SECTION) {
>>>> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
>>>> + end_pfn++;
>>>
>>> This page will not be marked as PG_reserved, even though it is a
>>> physical memory gap. Do we care?
>>>
>>
>> Hm, or do we even have any idea what this is (e.g. could it also be
>> something other than a gap)?
>
> In the "mem=" restriction scenario it would be a gap, and probably fall
> into the PG_reserved categorization from your recent patch:
> * - Pages falling into physical memory gaps - not IORESOURCE_SYSRAM. Trying
> * to read/write these pages might end badly. Don't touch!
>
> Not sure if it could be something else. In theory, if it is possible to have
> a scenario where memory zones are not section-aligned, then this
> end_pfn % PAGES_PER_SECTION part could be part of another zone. But then it
> should not matter if the pages get pre-initialized here, with or w/o
> PG_reserved, because they should later be properly initialized in their zone.
>
> So marking them as PG_reserved sounds right, especially in the light of your
> current PG_reserved clean-up.
>
>>
>> For physical memory gaps within a section, architectures usually exclude
>> that memory from getting passed to e.g. the page allocator by
>> memblock_reserve().
>>
>> Before handing all free pages to the page allocator, all such reserved
>> memblocks will be marked reserved.
>>
>> But this here seems to be different. We don't have a previous
>> memblock_reserve(), because otherwise these pages would have properly
>> been initialized already when marking them reserved.
>
> Not sure how memblock_reserve() and struct page initialization are
> related, but at least on s390 there is a memblock_reserve() on the range
> in question in setup_arch() -> reserve_memory_end(). However, in this
> "mem=" scenario, the range is also removed later with memblock_remove()
> in setup_memory_end(), because it is beyond memory_end.
>

I am wondering if we should fix this on the memblock level instead.
Something like: before handing memory over to the page allocator, add the
memory up to the next section boundary as reserved. Or do it even earlier,
when setting the physical memory limit (mem= scenario).

--

Thanks,

David / dhildenb