Re: [PATCH] hwpoison: Fix race with changing page during offlining v2

From: Andi Kleen
Date: Tue Jul 01 2014 - 20:02:53 EST


> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1168,6 +1168,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
> > lock_page(hpage);
> >
> > /*
> > + * The page could have changed compound pages during the locking.
> > + * If this happens just bail out.
> > + */
> > + if (compound_head(p) != hpage) {
>
> How can a 4k page change compound pages? The original compound page
> was torn down and then this 4k page became part of a differently-sized
> compound page?

Yes, or it was torn down and the page is now its own page.

>
> > + action_result(pfn, "different compound page after locking", IGNORED);
> > + res = -EBUSY;
> > + goto out;
> > + }
> > +
> > + /*
>
> I don't get it. We just go and fail the poisoning attempt? Shouldn't
> we go back, grab the new hpage and try again?

It should be quite rare, so I thought this was safest. A retry loop
would be more difficult to test and may have more side effects.
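
For illustration only, here is a minimal userspace sketch of the
"recheck after locking, bail out if it changed" pattern discussed above.
It is not the kernel code: struct toy_page, the toy_* helpers, and the
race callback that simulates a concurrent split are all hypothetical
names made up for this example. Only the shape of the check and the
-EBUSY return follow the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: tracks only head and lock state. */
struct toy_page {
	struct toy_page *head;	/* points to itself when not in a compound page */
	int locked;
};

static struct toy_page *toy_compound_head(struct toy_page *p)
{
	return p->head;
}

/*
 * Hypothetical lock helper. The optional race callback runs before the
 * lock is taken, modelling a change that happens while we wait for it.
 */
static void toy_lock_page(struct toy_page *hp, void (*race)(void *), void *arg)
{
	if (race)
		race(arg);
	hp->locked = 1;
}

static void toy_unlock_page(struct toy_page *hp)
{
	assert(hp->locked);
	hp->locked = 0;
}

/* Simulated split: the page leaves its compound page and becomes its own head. */
static void toy_split(void *arg)
{
	struct toy_page *p = arg;

	p->head = p;
}

/* Returns 0 if the page could be handled, -EBUSY if it changed under us. */
static int toy_memory_failure(struct toy_page *p, void (*race)(void *), void *arg)
{
	struct toy_page *hpage = toy_compound_head(p);
	int res = 0;

	toy_lock_page(hpage, race, arg);

	/* The head sampled before locking may be stale: recheck, do not retry. */
	if (toy_compound_head(p) != hpage) {
		res = -EBUSY;
		goto out;
	}

	/* ... the actual error handling would go here ... */
out:
	toy_unlock_page(hpage);
	return res;
}
```

In this sketch, calling toy_memory_failure() with toy_split as the race
callback takes the -EBUSY path and still drops the lock on the old head,
which is the "just don't crash" behaviour rather than a retry.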

The hwpoison code by design only tries to handle cases that are
reasonably common in workloads, as visible in page-flags.

I'm not really that concerned about handling this (likely rare) case,
just about not crashing on it.

-Andi