Re: [PATCH 1/2] mm/memory_hotplug: remove head page reference in do_migrate_range

From: David Hildenbrand
Date: Tue Jan 24 2023 - 05:18:44 EST


On 23.01.23 21:37, Matthew Wilcox wrote:
> On Mon, Jan 23, 2023 at 12:23:46PM -0800, Sidhartha Kumar wrote:
> > @@ -1637,14 +1637,13 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> >  			continue;
> >  		page = pfn_to_page(pfn);
> >  		folio = page_folio(page);
> > -		head = &folio->page;
> > -		if (PageHuge(page)) {
> > -			pfn = page_to_pfn(head) + compound_nr(head) - 1;
> > +		if (folio_test_hugetlb(folio)) {
> > +			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
> >  			isolate_hugetlb(folio, &source);
> >  			continue;
> > -		} else if (PageTransHuge(page))
> > -			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
> > +		} else if (folio_test_transhuge(folio))
> > +			pfn = folio_pfn(folio) + thp_nr_pages(page) - 1;
>
> I'm pretty sure those two lines should be...
>
> 		} else if (folio_test_large(folio))
> 			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>
> But, erm ... we're doing this before we have a refcount on the page,
> right? So this is unsafe because the page might change which folio
> it is in. And the folio we found earlier might become a tail page
> of a different folio. (As the comment below explains, HWPoison pages
> won't, so it's not unsafe for them).
>
> Also, thp_nr_pages(page) is going to return 1 for tail pages. So this
> is a noop, unless page is a head page.
>
> It's all a bit confusing, and being memory-hotplug, it's not well
> tested. More thought needed.

Ehm, it is fairly well tested ;)

As memory offlining keeps retrying, temporarily making wrong assumptions about a folio is acceptable, as long as we don't run into BUGs.

It's certainly worth a big comment in the code, saying that this is all racy and that the page migration code will stabilize the page.
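
To illustrate the retry argument, here is a minimal userspace sketch, not kernel code. All toy_* names and types are hypothetical. It assumes that a wrong guess about a folio only makes the migration attempt fail (no BUG), and that the offlining loop simply scans again until nothing is left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8

/* Toy model only: hypothetical types, not the kernel's struct page/folio. */
struct toy_page {
	size_t head;      /* index of the first page of the folio this page is in */
	size_t nr;        /* number of pages in that folio */
	bool   migrated;  /* true once the page has left the range */
};

/* What a lockless scan believes about a page; may be stale. */
struct toy_snapshot {
	size_t head;
	size_t nr;
};

/* Layout: two 2-page folios (0-1, 2-3) followed by four single pages. */
static void toy_init(struct toy_page *pages)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++) {
		size_t head = i < 4 ? (i & ~(size_t)1) : i;

		pages[i] = (struct toy_page){ .head = head,
					      .nr = i < 4 ? 2 : 1,
					      .migrated = false };
	}
}

/*
 * Stand-in for the common migration code: it rejects a snapshot that
 * does not match the current state instead of crashing.
 */
static bool toy_migrate(struct toy_page *pages, size_t pfn,
			struct toy_snapshot snap)
{
	if (pages[pfn].head != snap.head || pages[pfn].nr != snap.nr)
		return false;	/* wrong assumption: fail, do not BUG */
	for (size_t i = 0; i < snap.nr; i++)
		pages[snap.head + i].migrated = true;
	return true;
}

/* One racy pass; the page at stale_pfn is given a bogus folio size. */
static void toy_scan_once(struct toy_page *pages, size_t stale_pfn)
{
	for (size_t pfn = 0; pfn < TOY_NR_PAGES; pfn++) {
		struct toy_snapshot snap = { pages[pfn].head, pages[pfn].nr };

		if (pages[pfn].migrated)
			continue;
		if (pfn == stale_pfn)
			snap.nr = 4;	/* simulate racing with a folio change */
		(void)toy_migrate(pages, pfn, snap);
		/* Skip to the last page of what we believe is the folio. */
		pfn = snap.head + snap.nr - 1;
	}
}

/* Keep retrying until the range is empty; returns the passes needed. */
static int toy_offline(struct toy_page *pages, size_t stale_pfn_first_pass)
{
	int passes = 0;
	bool remaining;

	do {
		/* Only the first pass sees the stale snapshot. */
		toy_scan_once(pages, passes == 0 ? stale_pfn_first_pass
						 : TOY_NR_PAGES);
		passes++;
		remaining = false;
		for (size_t i = 0; i < TOY_NR_PAGES; i++)
			if (!pages[i].migrated)
				remaining = true;
	} while (remaining);
	return passes;
}
```

Calling toy_offline(pages, 0) after toy_init(pages) makes the first pass skip pages 0-3 because of the bogus size, and a later pass picks them up. Passing TOY_NR_PAGES as the stale pfn disables the simulated race.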

Now, we could temporarily take a reference, but ... common migration code will try taking its own ref to stabilize the page and would be confused by yet another ref (-> migration will fail).

So we have to be careful about grabbing references on these pages, and how long we're going to hold them. Otherwise we'll break memory offlining completely :)
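
The reference problem can be sketched the same way, as a toy userspace model with hypothetical toy_* names. It assumes, as a simplification, that migration succeeds only when the references it finds match the ones it can account for plus its own; the real check in the migration code is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a hypothetical stand-in, not the kernel's struct folio. */
struct toy_folio {
	int refcount;    /* references currently held */
	int known_refs;  /* references migration can account for */
};

static void toy_get(struct toy_folio *f) { f->refcount++; }
static void toy_put(struct toy_folio *f) { f->refcount--; }

/*
 * Stand-in for the common migration code: take our own reference to
 * stabilize the folio, then fail if anyone holds a reference we do
 * not expect.
 */
static bool toy_try_migrate(struct toy_folio *f)
{
	bool ok;

	toy_get(f);				/* migration's own reference */
	ok = (f->refcount == f->known_refs + 1);
	toy_put(f);
	return ok;
}
```

In this model, a scanner that calls toy_get() and still holds that reference when toy_try_migrate() runs makes every attempt fail; once it calls toy_put(), the next attempt succeeds. That is why how long such a reference is held matters.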

--
Thanks,

David / dhildenb