Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

From: Mike Rapoport
Date: Thu Dec 03 2020 - 05:52:08 EST


On Thu, Dec 03, 2020 at 01:23:02AM -0500, Andrea Arcangeli wrote:
> On Wed, Dec 02, 2020 at 07:39:23PM +0200, Mike Rapoport wrote:
> > Hmm, do you have any logs? It worked on my box and with various memory
> > configurations in qemu.
>
> It crashes in reserve_bootmem_region so no logs at all since the
> serial console isn't active yet.
>
> > I believe that the proper solution would be to have all the memory
> > reserved by the firmware added to memblock.memory, like all other
> > architectures do, but I'm not sure if we can easily and reliably
> > determine what of E820 reserved types are actually backed by RAM.
> > So this is not feasible as a short term solution.
> >
> > My patch [1], though, is an immediate improvement and I think it's worth
> > trying to fix off-by-one's that prevent your system from booting.
> >
> > [1] https://lore.kernel.org/lkml/20201201181502.2340-1-rppt@xxxxxxxxxx
>
> Yes that's the patch I applied.
>
> Relevant points after debugging:
>
> 1) can you try again with DEBUG_VM=y?

Mea cupla :)

> 2) DEBUG_VM=y already enforces the memset(page, 0xff, sizeof(struct
> page)) on all struct pages allocated, exactly I was suggesting to
> do in the previous email. I wonder why we're not doing that even
> with DEBUG_VM=n, perhaps it's too slow for TiB systems. See
> page_init_poison().
>
> 3) given 2), your patch below by deleting "0,0" initialization
> achieves the debug feature to keep PagePoison forever on all
> uninitialized pages, imagine PagePoison is really
> NO_NODEID/NO_ZONEID and doesn't need handling other than a crash.
>
> - __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
>
> 4) because of the feature in 3) I now hit the PagePoison VM_BUG_ON
> because pfn 0 wasn't initialized when reserve_bootmem_region tries
> to call __SetPageReserved on a PagePoison page.

So this is the off-by-one a was talking about. My patch removed
initializaion of the hole before the memblock.memory

> 5) pfn 0 is the classical case where pfn 0 is in a reserved zone in
> memblock.reserve that doesn't overlap any memblock.memory zone.

And, IMHO, this is the fundamental BUG.

There is RAM at address 0, there is a DIMM there, so why on earth this
should not be a part of memblock.memory?

> The memblock_start_of_DRAM moves the start of the DMA zone above
> the pfn 0, so then pfn 0 already ends up in the no zone land David
> mentioned where it's not trivial to decide to give it a zoneid if
> it's not even spanned in the zone.
>
> Not just the zoneid, there's not even a nid.
>
> So we have a pfn with no zoneid, and no nid, and not part of the
> zone ranges, not part of the nid ranges not part of the
> memory.memblock.

We have a pfn that should have been in memblock.memory, in ZONE_DMA and
in the first node with memory. If it was not trimmed from there

> We can't really skip the initialization of the the pfn 0, it has to
> get the zoneid 0 treatment or if pfn 1 ends up in compaction code,
> we'd crash again. (although we'd crash with a nice PagePoison(page)
> == true behavior)

Agree. Struct page for pfn should get the same zoneid and nodeid as pfn 1.

For me the below fixup worked (both with and without DEBUG_VM=y).