Re: [RFC PATCH v2] mm: initialize struct pages reserved by ZONE_DEVICE driver.

From: Waiman Long
Date: Tue Sep 17 2019 - 11:49:43 EST


On 9/17/19 3:13 AM, David Hildenbrand wrote:
> On 17.09.19 04:34, Toshiki Fukasawa wrote:
>> On 2019/09/09 16:46, David Hildenbrand wrote:
>>> Let's take a step back here to understand the issues I am aware of. I
>>> think we should solve this for good now:
>>>
>>> A PFN walker takes a look at a random PFN at a random point in time. It
>>> finds a PFN with SECTION_MARKED_PRESENT && !SECTION_IS_ONLINE. The
>>> options are:
>>>
>>> 1. It is buddy memory (add_memory()) that has not been online yet. The
>>> memmap contains garbage. Don't access.
>>>
>>> 2. It is ZONE_DEVICE memory with a valid memmap. Access it.
>>>
>>> 3. It is ZONE_DEVICE memory with an invalid memmap, because the section
>>> is only partially present: E.g., device starts at offset 64MB within a
>>> section or the device ends at offset 64MB within a section. Don't access it.
>> I don't agree with case #3. In the case, struct page area is not allocated on
>> ZONE_DEVICE, but is allocated on system memory. So I think we can access the
>> struct pages. What do you mean "invalid memmap"?
> No, that's not the case. There is no memory, especially not system
> memory. We only allow partially present sections (sub-section memory
> hotplug) for ZONE_DEVICE.
>
> invalid memmap == memmap was not initialized == struct pages contains
> garbage. There is a memmap, but accessing it (e.g., pfn_to_nid()) will
> trigger a BUG.
>
As long as the page structures exist, they should be initialized to some
known state. We could set PagePoison for those invalid memmaps. It is the
garbage in those page structures that can cause problems if a
struct page walker scans those pages and tries to make sense of them.
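
To illustrate the idea, here is a minimal userspace sketch, not kernel
code. Everything named fake_* is hypothetical and made up for this
example, and the all-ones poison pattern is an assumption chosen for
illustration. The point is only that a walker can recognize and skip
entries that were put into a known state, which it cannot do with
uninitialized garbage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page; not the kernel layout. */
struct fake_page {
	unsigned long flags;
	int nid;
};

/* Assumed poison byte: every byte of a poisoned entry is 0xff. */
#define FAKE_POISON_BYTE 0xff

/* Put n memmap entries into a known, recognizable state. */
static void fake_memmap_poison(struct fake_page *pages, size_t n)
{
	assert(pages != NULL);
	memset(pages, FAKE_POISON_BYTE, n * sizeof(*pages));
}

/* An entry counts as poisoned when its flags word is all ones. */
static int fake_page_poisoned(const struct fake_page *page)
{
	return page->flags == ~0UL;
}

/*
 * Walker: count the entries that are safe to interpret and skip the
 * poisoned ones instead of trying to make sense of them.
 */
static size_t fake_walk_count_valid(const struct fake_page *pages, size_t n)
{
	size_t i, valid = 0;

	assert(pages != NULL);
	for (i = 0; i < n; i++) {
		if (fake_page_poisoned(&pages[i]))
			continue;	/* known-invalid entry, do not touch */
		valid++;
	}
	return valid;
}
```

With this, a memmap that is poisoned as a whole and then initialized
only for the part that is actually present lets the walker tell the
two ranges apart.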

Cheers,
Longman