Re: [PATCH 1/2] mm: add private lock to serialize memory hotplug operations

From: Dan Williams
Date: Thu Mar 09 2017 - 17:34:15 EST


On Thu, Mar 9, 2017 at 2:15 PM, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> On Thursday, March 09, 2017 10:10:31 AM Dan Williams wrote:
>> On Thu, Mar 9, 2017 at 5:39 AM, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
>> > On Thursday, March 09, 2017 02:06:15 PM Heiko Carstens wrote:
>> >> Commit bfc8c90139eb ("mem-hotplug: implement get/put_online_mems")
>> >> introduced new functions get/put_online_mems() and
>> >> mem_hotplug_begin/end() in order to allow similar semantics for memory
>> >> hotplug like for cpu hotplug.
>> >>
>> >> The corresponding functions for cpu hotplug are get/put_online_cpus()
>> >> and cpu_hotplug_begin/done() for cpu hotplug.
>> >>
>> >> The commit however missed to introduce functions that would serialize
>> >> memory hotplug operations like they are done for cpu hotplug with
>> >> cpu_maps_update_begin/done().
>> >>
>> >> This basically leaves mem_hotplug.active_writer unprotected and allows
>> >> concurrent writers to modify it, which may lead to problems as
>> >> outlined by commit f931ab479dd2 ("mm: fix devm_memremap_pages crash,
>> >> use mem_hotplug_{begin, done}").
>> >>
>> >> That commit was extended again with commit b5d24fda9c3d ("mm,
>> >> devm_memremap_pages: hold device_hotplug lock over mem_hotplug_{begin,
>> >> done}") which serializes memory hotplug operations for some call
>> >> sites by using the device_hotplug lock.
>> >>
>> >> In addition with commit 3fc21924100b ("mm: validate device_hotplug is
>> >> held for memory hotplug") a sanity check was added to
>> >> mem_hotplug_begin() to verify that the device_hotplug lock is held.
>> >
>> > Admittedly, I haven't looked at all of the code paths involved in detail yet,
>> > but there's one concern regarding lock/unlock_device_hotplug().
>> >
>> > The actual main purpose of it is to ensure safe removal of devices in cases
>> > when they cannot be removed separately, like when a whole CPU package
>> > (including possibly an entire NUMA node with memory and all) is removed.
>> >
>> > One of the code paths doing that is acpi_scan_hot_remove() which first
>> > tries to offline devices slated for removal and then finally removes them.
>> >
>> > The reason why this needs to be done in two stages is because the offlining
>> > can fail, in which case we will fail the entire operation, while the final
>> > removal step is, well, final (meaning that the devices are gone after it no
>> > matter what).
>> >
>> > This is done under device_hotplug_lock, so that the devices that were taken
>> > offline in stage 1 cannot be brought back online before stage 2 is carried
>> > out entirely, which surely would be bad if it happened.
>> >
>> > Now, I'm not sure if removing lock/unlock_device_hotplug() from the code in
>> > question actually affects this mechanism, but this in case it does, it is one
>> > thing to double check before going ahead with this patch.
>> >
>>
>> I *think* we're ok in this case because unplugging the CPU package
>> that contains a persistent memory device will trigger
>> devm_memremap_pages() to call arch_remove_memory(). Removing a pmem
>> device can't fail. It may be held off while pages are pinned for DMA
>> memory, but it will eventually complete.
>
> What about the offlining, though? Is it guaranteed that no memory from those
> ranges will go back online after the acpi_scan_try_to_offline() call in
> acpi_scan_hot_remove()?

The memory described by devm_memremap_pages() is never "onlined" to
the core mm. We're only using arch_add_memory() to get a linear
mapping and page structures. The rest of memory hotplug is skipped,
and this ZONE_DEVICE memory is otherwise hidden from the core mm.

Are ACPI devices disabled by this point? For example, If we have
disabled the nfit bus device (_HID ACPI0012) then the associated child
pmem device(s) will be gone and not coming back.

Now, that said, the ACPI0012 bus device is global for the entire
system. So we'd need more plumbing to target the pmem on a given
socket without touching the others.