[PATCH v6 12/12] mm,hwpoison: double-check page count in __get_any_page()

From: nao . horiguchi
Date: Thu Aug 06 2020 - 14:50:21 EST


From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>

Soft offlining could fail with EIO due to the race condition with
hugepage migration. This issuse became visible due to the change by
previous patch that makes soft offline handler take page refcount
by its own. We have no way to directly pin zero refcount page, and
the page considered as a zero refcount page could be allocated just
after the first check.

This patch adds the second check to find the race and gives us
chance to handle it more reliably.

Reported-by: Qian Cai <cai@xxxxxx>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
---
mm/memory-failure.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git v5.8-rc7-mmotm-2020-07-27-18-18/mm/memory-failure.c v5.8-rc7-mmotm-2020-07-27-18-18_patched/mm/memory-failure.c
index bed4b6aac9a0..e6f6559d573e 100644
--- v5.8-rc7-mmotm-2020-07-27-18-18/mm/memory-failure.c
+++ v5.8-rc7-mmotm-2020-07-27-18-18_patched/mm/memory-failure.c
@@ -1700,6 +1700,9 @@ static int __get_any_page(struct page *p, unsigned long pfn, int flags)
} else if (is_free_buddy_page(p)) {
pr_info("%s: %#lx free buddy page\n", __func__, pfn);
ret = 0;
+ } else if (page_count(p)) {
+ /* raced with allocation */
+ ret = -EBUSY;
} else {
pr_info("%s: %#lx: unknown zero refcount page type %lx\n",
__func__, pfn, p->flags);
@@ -1716,6 +1719,9 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
{
int ret = __get_any_page(page, pfn, flags);

+ if (ret == -EBUSY)
+ ret = __get_any_page(page, pfn, flags);
+
if (ret == 1 && !PageHuge(page) &&
!PageLRU(page) && !__PageMovable(page)) {
/*
--
2.17.1