[PATCH v5 16/16] mm,hwpoison: double-check page count in __get_any_page()

From: nao . horiguchi
Date: Fri Jul 31 2020 - 08:22:33 EST


From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>

Soft offlining could fail with EIO due to a race condition with
hugepage migration. This issue became visible after the previous
patch, which makes the soft offline handler take the page refcount
on its own. We have no way to directly pin a zero-refcount page, and
a page considered to have zero refcount could be allocated just
after the first check.

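This patch adds a second check to detect the race and gives us a
chance to handle it more reliably: if page_count() is non-zero at
that point, __get_any_page() returns -EBUSY and get_any_page()
retries once.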
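
For illustration only, the check-then-retry flow can be modelled in
userspace roughly as below. This is a sketch, not kernel code: struct
model_page, its fields, and every model_* function are hypothetical
stand-ins (for struct page, page_count(), is_free_buddy_page() and
the refcount grab), and the race is simulated by a flag instead of a
real concurrent allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel's definition. */
struct model_page {
	int refcount;		/* models page_count() */
	bool free_buddy;	/* models is_free_buddy_page() */
	bool alloc_pending;	/* a racing allocation that has not landed yet */
};

/* Take a reference only if the refcount is already non-zero. */
static bool model_get_unless_zero(struct model_page *p)
{
	if (p->refcount == 0)
		return false;
	p->refcount++;
	return true;
}

/* Simulate the allocator winning the race after the first check. */
static void model_race_window(struct model_page *p)
{
	if (p->alloc_pending) {
		p->alloc_pending = false;
		p->free_buddy = false;
		p->refcount = 1;
	}
}

/* Models __get_any_page(): 1 = pinned, 0 = free page, <0 = error. */
static int model__get_any_page(struct model_page *p)
{
	if (model_get_unless_zero(p))
		return 1;

	model_race_window(p);	/* the page may be allocated right here */

	if (p->free_buddy)
		return 0;
	if (p->refcount)	/* second check: raced with allocation */
		return -EBUSY;
	return -EIO;		/* unknown zero refcount page */
}

/* Models get_any_page(): retry once when the race was detected. */
static int model_get_any_page(struct model_page *p)
{
	int ret = model__get_any_page(p);

	if (ret == -EBUSY)
		ret = model__get_any_page(p);
	return ret;
}

/* Small self-check of the three outcomes the model distinguishes. */
static void model_selftest(void)
{
	struct model_page raced = { 0, true, true };
	struct model_page idle = { 0, true, false };
	struct model_page odd = { 0, false, false };

	assert(model_get_any_page(&raced) == 1);	/* retry pins the page */
	assert(raced.refcount == 2);
	assert(model_get_any_page(&idle) == 0);		/* still a free page */
	assert(model_get_any_page(&odd) == -EIO);	/* no race, unknown type */
}
```

In this model a single model__get_any_page() call on the raced page
reports -EBUSY, and the retry in model_get_any_page() then pins the
page because its refcount is no longer zero. Calling
model_selftest() from a main() exercises all three outcomes.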

Reported-by: Qian Cai <cai@xxxxxx>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
---
mm/memory-failure.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git v5.8-rc7-mmotm-2020-07-27-18-18/mm/memory-failure.c v5.8-rc7-mmotm-2020-07-27-18-18_patched/mm/memory-failure.c
index 6f242a194c64..b2753ce2b85b 100644
--- v5.8-rc7-mmotm-2020-07-27-18-18/mm/memory-failure.c
+++ v5.8-rc7-mmotm-2020-07-27-18-18_patched/mm/memory-failure.c
@@ -1694,6 +1694,9 @@ static int __get_any_page(struct page *p, unsigned long pfn)
 		} else if (is_free_buddy_page(p)) {
 			pr_info("%s: %#lx free buddy page\n", __func__, pfn);
 			ret = 0;
+		} else if (page_count(p)) {
+			/* raced with allocation */
+			ret = -EBUSY;
 		} else {
 			pr_info("%s: %#lx: unknown zero refcount page type %lx\n",
 				__func__, pfn, p->flags);
@@ -1710,6 +1713,9 @@ static int get_any_page(struct page *page, unsigned long pfn)
 {
 	int ret = __get_any_page(page, pfn);
 
+	if (ret == -EBUSY)
+		ret = __get_any_page(page, pfn);
+
 	if (ret == 1 && !PageHuge(page) &&
 	    !PageLRU(page) && !__PageMovable(page)) {
 		/*
--
2.17.1