Re: [PATCH v7] mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()

From: HORIGUCHI NAOYA(堀口 直也)
Date: Thu Apr 07 2022 - 21:56:22 EST


On Thu, Apr 07, 2022 at 09:38:26PM +0800, Miaohe Lin wrote:
> On 2022/4/7 19:29, Naoya Horiguchi wrote:
...
> > +int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> > +{
> > + struct page *page = pfn_to_page(pfn);
> > + struct page *head = compound_head(page);
> > + int ret = 2; /* fallback to normal page handling */
> > + bool count_increased = false;
> > +
> > + if (!PageHeadHuge(head))
> > + goto out;
> > +
> > + if (flags & MF_COUNT_INCREASED) {
> > + ret = 1;
> > + count_increased = true;
> > + } else if (HPageFreed(head) || HPageMigratable(head)) {
> > + ret = get_page_unless_zero(head);
> > + if (ret)
> > + count_increased = true;
> > + } else {
> > + ret = -EBUSY;
> > + goto out;
> > + }
> > +
> > + if (hwpoison_filter(page)) {
> > + ret = -EOPNOTSUPP;
> > + goto out;
> > + }
>
> Now hwpoison_filter is done without lock_page + unlock_page. Is this ok or
> lock_page + unlock_page pair is indeed required?

Hmm, we had better call hwpoison_filter in page lock for hugepages.
I'll move this too, thank you.

> > +
> > + if (TestSetPageHWPoison(head)) {
> > + ret = -EHWPOISON;
> > + goto out;
> > + }
>
> Without this patch, page refcnt is not decremented if MF_COUNT_INCREASED is set in flags
> when PageHWPoison is already set. So I think this patch also fixes that issue. Thanks!

Good point, I even didn't notice that. And the issue still seems to exist
for normal page's cases. Maybe encountering "already hwpoisoned" case from
madvise_inject_error() is rare but could happen when the first call failed
to contain the error (which is still accessible from the calling process).

>
> > +
> > + return ret;
> > +out:
> > + if (count_increased)
> > + put_page(head);
> > + return ret;
> > +}
> > +
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +/*
> > + * Taking refcount of hugetlb pages needs extra care about race conditions
> > + * with basic operations like hugepage allocation/free/demotion.
> > + * So all necessary prechecks for hwpoison (like pinning, testing/setting
> > + * PageHWPoison, and hwpoison_filter) are done in single hugetlb_lock range.
> > + */
> > +static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb)
> > {
> > - struct page *p = pfn_to_page(pfn);
> > - struct page *head = compound_head(p);
> > int res;
> > + struct page *p = pfn_to_page(pfn);
> > + struct page *head;
> > unsigned long page_flags;
> > + bool retry = true;
> >
> > - if (TestSetPageHWPoison(head)) {
> > - pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > - pfn);
> > - res = -EHWPOISON;
> > - if (flags & MF_ACTION_REQUIRED)
> > + *hugetlb = 1;
> > +retry:
> > + res = get_huge_page_for_hwpoison(pfn, flags);
> > + if (res == 2) { /* fallback to normal page handling */
> > + *hugetlb = 0;
> > + return 0;
> > + } else if (res == -EOPNOTSUPP) {
> > + return res;
> > + } else if (res == -EHWPOISON) {
> > + pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> > + if (flags & MF_ACTION_REQUIRED) {
> > + head = compound_head(p);
> > res = kill_accessing_process(current, page_to_pfn(head), flags);
> > + }
> > + return res;
> > + } else if (res == -EBUSY) {
> > + if (retry) {
> > + retry = false;
> > + goto retry;
> > + }
> > + action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> > return res;
> > }
> >
> > num_poisoned_pages_inc();
> >
> > - if (!(flags & MF_COUNT_INCREASED)) {
> > - res = get_hwpoison_page(p, flags);
> > - if (!res) {
> > - lock_page(head);
> > - if (hwpoison_filter(p)) {
> > - if (TestClearPageHWPoison(head))
> > - num_poisoned_pages_dec();
> > - unlock_page(head);
> > - return -EOPNOTSUPP;
> > - }
> > - unlock_page(head);
> > - res = MF_FAILED;
> > - if (__page_handle_poison(p)) {
> > - page_ref_inc(p);
> > - res = MF_RECOVERED;
> > - }
> > - action_result(pfn, MF_MSG_FREE_HUGE, res);
> > - return res == MF_RECOVERED ? 0 : -EBUSY;
> > - } else if (res < 0) {
> > - action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> > - return -EBUSY;
> > + /*
> > + * Handling free hugepage. The possible race with hugepage allocation
> > + * or demotion can be prevented by PageHWPoison flag.
> > + */
> > + if (res == 0) {
> > + res = MF_FAILED;
> > + if (__page_handle_poison(p)) {
> > + page_ref_inc(p);
> > + res = MF_RECOVERED;
> > }
> > + action_result(pfn, MF_MSG_FREE_HUGE, res);
> > + return res == MF_RECOVERED ? 0 : -EBUSY;
> > }
> >
> > + head = compound_head(p);
> > lock_page(head);
> >
> > /*
>
> IMHO, the below code could be removed now as we fetch the refcnt under the hugetlb_lock:
>
> /*
> * The page could have changed compound pages due to race window.
> * If this happens just bail out.
> */
> if (!PageHuge(p) || compound_head(p) != head) {
> action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
> res = -EBUSY;
> goto out;
> }
>
> But this might be another patch.

I'll do this.

Thank you for the review and suggestions,

- Naoya Horiguchi