[PATCH v2] mm, hwpoison: do not lock page again when me_huge_page() successfully recovers

From: Naoya Horiguchi
Date: Fri Mar 05 2021 - 07:53:29 EST


Hello Oscar,

On Fri, Mar 05, 2021 at 08:26:58AM +0100, Oscar Salvador wrote:
> On Thu, Mar 04, 2021 at 03:44:37PM +0900, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>
> Hi Naoya,
>
> good catch!
>
> > Currently me_huge_page() temporarily unlocks the page to perform some
> > actions, then locks it again later. My testcase (which calls hard-offline
> > on some tail page in a hugetlb, then accesses the address of the hugetlb
> > range) showed that the page allocation code detects the page lock on a
> > buddy page and prints out a "BUG: Bad page state" message. PG_hwpoison
> > does not prevent it because the PG_hwpoison flag is set on any subpage of
> > the hugetlb page but the 2nd page lock is on the head page.
>
> I am having difficulty parsing "PG_hwpoison does not prevent it because
> PG_hwpoison flag is set on any subpage of the hugetlb page".
>
> What do you mean by that?

What I had in mind is that check_new_page_bad() does not treat a page
with __PG_HWPOISON set as a bad page, so this flag works as a kind of
filter. That filtering does not help in my case because the "bad page"
(the locked head page) is not the actual hwpoisoned page (the tail page).

Thanks for the nice comment. I've updated the patch below with this description.
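
To make the filtering concrete, here is a rough userspace model of the
check. It is only a sketch and not the kernel code: struct model_page,
the MODEL_PG_* bits and model_reports_bad_page() are names made up for
this illustration, and the real allocator checks more state than one
lock bit.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model, not the kernel's values. */
#define MODEL_PG_LOCKED   (1UL << 0)
#define MODEL_PG_HWPOISON (1UL << 1)

/* Minimal stand-in for struct page: only the flags word matters here. */
struct model_page {
	unsigned long flags;
};

/*
 * Returns true when this model would report "Bad page state" for a page
 * coming out of the allocator.
 */
bool model_reports_bad_page(const struct model_page *page)
{
	/* The filter: do not complain about hwpoisoned pages. */
	if (page->flags & MODEL_PG_HWPOISON)
		return false;

	/* A page that is still locked is unexpected state. */
	return (page->flags & MODEL_PG_LOCKED) != 0;
}
```

In this model a tail page carrying only MODEL_PG_HWPOISON is filtered
out, while a head page carrying only MODEL_PG_LOCKED is reported, which
is the situation my testcase hits.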

>
> >
> > This patch drops the 2nd page lock to fix the issue.
> >
> > Fixes: 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>
> The fix looks fine to me:
>
> Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>

Thank you!

Have a nice weekend.
- Naoya

---