Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

From: Luck, Tony
Date: Wed Mar 03 2021 - 06:41:01 EST


On Fri, Feb 26, 2021 at 10:59:15AM +0800, Aili Yao wrote:
> Hi naoya, tony:
> > >
> > > Idea for what we should do next ... Now that x86 is calling memory_failure()
> > > from user context ... maybe parallel calls for the same page should
> > > be blocked until the first caller completes so we can:
> > > a) know that pages are unmapped (if that happens)
> > > b) all get the same success/fail status
> >
> > One memory_failure() call changes the target page's status and
> > affects all mappings to all affected processes, so I think that
> > (ideally) we don't have to block other threads (letting them
> > early return seems fine). Sometimes memory_failure() fails,
> > but even in such case, PG_hwpoison is set on the page and other
> > threads properly get SIGBUSs with this patch, so I think that
> > we can avoid the worst scenario (like system stall by MCE loop).
> >
> I agree with naoya's point, if we block for this issue, Does this change the result
> that the process should be killed? Or is there something other still need to be considered?

Ok ... no blocking ... I think someone in this thread suggested
scanning the page tables to make sure the poisoned page had been
unmapped.

There's a walk_page_range() function that does all the work for that.
Just need to supply some callback routines that check whether a
mapping includes the bad PFN and clear the PRESENT bit.

RFC patch below against v5.12-rc1

-Tony