Re: [PATCH v1 1/3] mm: fix uninitialized memmaps on a partially populated last section

From: Daniel Jordan
Date: Mon Dec 09 2019 - 16:15:14 EST


Hi David,

On Mon, Dec 09, 2019 at 06:48:34PM +0100, David Hildenbrand wrote:
> If max_pfn is not aligned to a section boundary, we can easily run into
> BUGs. This can, e.g., be triggered on x86-64 under QEMU by specifying a
> memory size that is not a multiple of 128MB (e.g., 4097MB, but also
> 4160MB). I was told that on real HW, we can easily have this scenario
> (esp. one of the main reasons sub-section hotadd of devmem was added).
>
> The issue is that we have a valid memmap (pfn_valid()) for the
> whole section, and the whole section will be marked "online".
> pfn_to_online_page() will succeed, but the memmap contains garbage.
>
> E.g., doing a "cat /proc/kpageflags > /dev/null" results in
>
> [ 303.218313] BUG: unable to handle page fault for address: fffffffffffffffe
> [ 303.218899] #PF: supervisor read access in kernel mode
> [ 303.219344] #PF: error_code(0x0000) - not-present page
> [ 303.219787] PGD 12614067 P4D 12614067 PUD 12616067 PMD 0
> [ 303.220266] Oops: 0000 [#1] SMP NOPTI
> [ 303.220587] CPU: 0 PID: 424 Comm: cat Not tainted 5.4.0-next-20191128+ #17

I can't reproduce this on x86-64 QEMU with either memory size, on
next-20191128 or on mainline. What config are you using? How often are you
hitting it?

It may not have anything to do with the config, and I may be getting lucky with
the garbage in my memory.