Re: Re:[RFC] a question about reuse hwpoison page in soft_offline_page()

From: Naoya Horiguchi
Date: Mon Jul 09 2018 - 00:19:54 EST


On Mon, Jul 09, 2018 at 10:31:25AM +0800, 裘稀石(稀石) wrote:
> Hi Naoya,
>
> I think the double check can not fix the problem as I said in another email.
> If someone mmap before soft offline, so the page_count(head) is still zero
> in soft offline, then hwpoison flag set and it can not be alloced again in
> dequeue_huge_page_node_exact() during page fault, so page fault return
> no-mem, and someone will be killed (not mce kill).
>
> How about just set hwpoison flag after soft_offline_free_page -
> dissolve_free_huge_page
> successfully? It will fix the both two problems (mce kill and no-mem kill).

Thank you for elaborating, you're right.
So do you like a fix like this?

---
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d34225c1cb5b..3c9ce4c05f1b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1479,22 +1479,20 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
/*
* Dissolve a given free hugepage into free buddy pages. This function does
* nothing for in-use (including surplus) hugepages. Returns -EBUSY if the
- * number of free hugepages would be reduced below the number of reserved
- * hugepages.
+ * dissolution fails because a give page is not a free hugepage, or because
+ * free hugepages are fully reserved.
*/
int dissolve_free_huge_page(struct page *page)
{
- int rc = 0;
+ int rc = -EBUSY;

spin_lock(&hugetlb_lock);
if (PageHuge(page) && !page_count(page)) {
struct page *head = compound_head(page);
struct hstate *h = page_hstate(head);
int nid = page_to_nid(head);
- if (h->free_huge_pages - h->resv_huge_pages == 0) {
- rc = -EBUSY;
+ if (h->free_huge_pages - h->resv_huge_pages == 0)
goto out;
- }
/*
* Move PageHWPoison flag from head page to the raw error page,
* which makes any subpages rather than the error page reusable.
@@ -1508,6 +1506,7 @@ int dissolve_free_huge_page(struct page *page)
h->free_huge_pages_node[nid]--;
h->max_huge_pages--;
update_and_free_page(h, head);
+ rc = 0;
}
out:
spin_unlock(&hugetlb_lock);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9d142b9b86dc..e4c7e3ec7b10 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1715,13 +1715,13 @@ static int soft_offline_in_use_page(struct page *page, int flags)

static void soft_offline_free_page(struct page *page)
{
+ int rc = 0;
struct page *head = compound_head(page);

- if (!TestSetPageHWPoison(head)) {
+ if (PageHuge(head))
+ rc = dissolve_free_huge_page(page);
+ if (!rc && !TestSetPageHWPoison(head))
num_poisoned_pages_inc();
- if (PageHuge(head))
- dissolve_free_huge_page(page);
- }
}

/**