Re: [PATCH] hwpoison: Fix race with changing page during offlining

From: Andrew Morton
Date: Thu Jun 26 2014 - 15:57:03 EST


On Thu, 26 Jun 2014 15:50:36 -0400 Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> wrote:

> > index 90002ea..e277726a 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1143,6 +1143,22 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
> > lock_page(hpage);
> >
> > /*
> > + * The page could have turned into a non LRU page or
> > + * changed compound pages during the locking.
> > + * If this happens just bail out.
> > + */
> > + if (compound_head(p) != hpage) {
> > + action_result(pfn, "different compound page after locking", IGNORED);
> > + res = -EBUSY;
> > + goto out;
> > + }
>
> This is a useful check.
>
> > + if (!PageLRU(hpage)) {
> > + action_result(pfn, "non LRU after locking", IGNORED);
> > + res = -EBUSY;
> > + goto out;
> > + }
>
> I think this makes sense in v3.14, but it may be redundant if the patch "hwpoison:
> fix the handling path of the victimized page frame that belong to non-LRU"
> from Chen Yucong is merged into mainline (it is currently in linux-mmotm).

Andi, can you please check that and test? If the patch is good I'll
bump it into 3.16 with an enhanced changelog.
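
For readers following the quoted hunk, the "re-check after lock_page()" idea can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: struct toy_page, toy_compound_head() and toy_recheck_after_lock() are hypothetical stand-ins invented for illustration, not kernel APIs, and the real code also calls action_result() and jumps to the out label rather than simply returning.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel definition. */
struct toy_page {
	struct toy_page *head;	/* compound head; points to itself if not compound */
	bool lru;		/* stand-in for PageLRU() */
};

/* Hypothetical helper playing the role of compound_head(). */
static struct toy_page *toy_compound_head(struct toy_page *p)
{
	assert(p != NULL);
	return p->head;
}

/*
 * Re-validate state that was sampled before the page lock was taken.
 * hpage is the head page that was looked up earlier and then locked.
 * Returns 0 if the earlier view still holds, -EBUSY if the page
 * changed underneath us.
 */
static int toy_recheck_after_lock(struct toy_page *p, struct toy_page *hpage)
{
	if (toy_compound_head(p) != hpage)
		return -EBUSY;	/* "different compound page after locking" */
	if (!hpage->lru)
		return -EBUSY;	/* "non LRU after locking" */
	return 0;
}
```

The ordering matters in this model: the head comparison comes first, because once the head has changed, flags read from the old hpage no longer describe the page being handled.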


From: Chen Yucong <slaoub@xxxxxxxxx>
Subject: hwpoison: fix the handling path of the victimized page frame that belong to non-LRU

Until now, the kernel has had the same policy for handling victimized page frames
that belong to kernel space (reserved/slab subsystem) or are non-LRU (unknown
page state). In other words, the result of handling either of these
victimized page frames is (IGNORED | FAILED), and the return value of
memory_failure() is -EBUSY.

This patch keeps memory_failure() from returning early when !PageLRU(p) is
true, and it also ensures that action_result() can report more precise
information ("reserved kernel", "kernel slab", and "unknown page state")
instead of "non LRU", especially for memory errors which are detected by
memory scrubbing.

Signed-off-by: Chen Yucong <slaoub@xxxxxxxxx>
Acked-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

mm/memory-failure.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff -puN mm/memory-failure.c~hwpoison-fix-the-handling-path-of-the-victimized-page-frame-that-belong-to-non-lur mm/memory-failure.c
--- a/mm/memory-failure.c~hwpoison-fix-the-handling-path-of-the-victimized-page-frame-that-belong-to-non-lur
+++ a/mm/memory-failure.c
@@ -895,7 +895,7 @@ static int hwpoison_user_mappings(struct
struct page *hpage = *hpagep;
struct page *ppage;

- if (PageReserved(p) || PageSlab(p))
+ if (PageReserved(p) || PageSlab(p) || !PageLRU(p))
return SWAP_SUCCESS;

/*
@@ -1159,9 +1159,6 @@ int memory_failure(unsigned long pfn, in
action_result(pfn, "free buddy, 2nd try", DELAYED);
return 0;
}
- action_result(pfn, "non LRU", IGNORED);
- put_page(p);
- return -EBUSY;
}
}

@@ -1194,6 +1191,9 @@ int memory_failure(unsigned long pfn, in
return 0;
}

+ if (!PageHuge(p) && !PageTransTail(p) && !PageLRU(p))
+ goto identify_page_state;
+
/*
* For error on the tail page, we should set PG_hwpoison
* on the head page to show that the hugepage is hwpoisoned
@@ -1243,6 +1243,7 @@ int memory_failure(unsigned long pfn, in
goto out;
}

+identify_page_state:
res = -EBUSY;
/*
* The first check uses the current page flags which may not have any
_
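
The effect described in the changelog above (a non-LRU page now reaches identify_page_state and gets a more specific label instead of a blanket "non LRU") can be illustrated with a small user-space model. This is a sketch under assumptions: struct toy_flags, enum toy_state and toy_identify() are hypothetical names, the TOY_LRU entry is a placeholder, and the real error_states table in mm/memory-failure.c has more entries and different mechanics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the page flags of interest; NOT the kernel layout. */
struct toy_flags {
	bool reserved;	/* stand-in for PageReserved() */
	bool slab;	/* stand-in for PageSlab() */
	bool lru;	/* stand-in for PageLRU() */
};

enum toy_state { TOY_RESERVED, TOY_SLAB, TOY_UNKNOWN, TOY_LRU };

/* The first three labels are those named in the changelog; the last is a placeholder. */
static const char *const toy_state_name[] = {
	[TOY_RESERVED]	= "reserved kernel",
	[TOY_SLAB]	= "kernel slab",
	[TOY_UNKNOWN]	= "unknown page state",
	[TOY_LRU]	= "LRU (placeholder label)",
};

/* First match wins, so the more specific states are tested before the fallback. */
static enum toy_state toy_identify(const struct toy_flags *p)
{
	assert(p != NULL);
	if (p->reserved)
		return TOY_RESERVED;
	if (p->slab)
		return TOY_SLAB;
	if (!p->lru)
		return TOY_UNKNOWN;
	return TOY_LRU;
}
```

In this model a slab page that is not on the LRU is reported as "kernel slab" rather than being lumped into one generic non-LRU bucket, which is the reporting improvement the changelog describes.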
