Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages

From: Michal Hocko
Date: Fri Apr 28 2017 - 02:31:01 EST


On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > offlining pages") skips the HWPoisoned pages when offlining pages, but
> > > they should be skipped when onlining the pages too.
> > >
> > > Signed-off-by: Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx>
> > > ---
> > > mm/memory_hotplug.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > --- a/mm/memory_hotplug.c
> > > +++ b/mm/memory_hotplug.c
> > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > if (PageReserved(pfn_to_page(start_pfn)))
> > > for (i = 0; i < nr_pages; i++) {
> > > page = pfn_to_page(start_pfn + i);
> > > + if (PageHWPoison(page)) {
> > > + ClearPageReserved(page);
> >
> > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > was never offlined to begin with? Or do you expect this to be set on newly
> > hotplugged memory? Also don't we need to skip the entire pageblock?
>
> If I read correctly, "skip HWPoisoned page" in commit b023f46813cd means
> that we skip the page status check for hwpoisoned pages so that they do
> *not* prevent memory offlining for memblocks with hwpoisoned pages. That
> means that hwpoisoned pages can be offlined.

Is this patch actually correct? I am trying to wrap my head around it,
but it smells like it tries to avoid the problem rather than fix it
properly. I might be wrong here, of course, but to me it sounds like a
poisoned page should simply be offlined and keep its poison state all
the time. If the memory is hot-removed and added again, we have lost the
struct page along with its state, which is the expected behavior. If it
is still broken, we will re-poison it.

Anyway, a patch to skip over poisoned pages during online makes perfect
sense to me. The PageReserved fiddling around, much less so.

Or am I missing something? Let's CC Wen Congyang for clarification
here.
--
Michal Hocko
SUSE Labs