Re: [PATCH v1] mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON

From: HORIGUCHI NAOYA(堀口 直也)
Date: Tue Feb 21 2023 - 18:21:12 EST


On Tue, Feb 21, 2023 at 01:35:39PM +0000, Matthew Wilcox wrote:
> On Tue, Feb 21, 2023 at 05:59:05PM +0900, Naoya Horiguchi wrote:
> > After a memory error happens on a clean folio, a process unexpectedly
> > receives SIGBUS when it accesses the error page. This SIGBUS killing
> > is pointless and simply degrades the level of RAS of the system, because
> > the clean folio can be dropped without any data loss during memory error
> > handling, as we do for clean pagecache.
> >
> > When memory_failure() is called on a clean folio, try_to_unmap() is called
> > twice (once from split_huge_page() and once from hwpoison_user_mappings()).
> > The root cause of the issue is that the pte conversion to a hwpoison entry
> > is now done in the first call of try_to_unmap() because PageHWPoison is
> > already set at this point, while it is actually expected to be done in the
> > second call. This disturbs error handling operations such as removing the
> > pagecache, which results in the malfunction described above.
> >
> > So convert TTU_IGNORE_HWPOISON into TTU_HWPOISON and set TTU_HWPOISON only
> > when we really intend to convert the pte to a hwpoison entry. This prevents
> > other callers of try_to_unmap() from accidentally converting to hwpoison
> > entries.
> >
> > Fixes: a42634a6c07d ("readahead: Use a folio in read_pages()")
>
> How did you choose this Fixes tag?

I thought that before this commit, all thps were either anonymous thps or
shmem thps, both of which are treated as dirty thps (with no backup on
storage). The reported problem affects only clean folios, so I thought that
it became visible once we could have clean folios.

But on second thought, the wrong pte conversion could also have happened on
generic thp split before that commit (it just happened to have no visible
effect), so maybe I should have set the Fixes tag to an older commit?
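
To make the point about the split path concrete, here is a rough userspace
model of the decision being discussed. It is only a sketch and not the
kernel code: the model_* helpers, the enum, and the flag bit values are all
hypothetical names invented for illustration. It contrasts the old opt-out
check (TTU_IGNORE_HWPOISON) with the opt-in check (TTU_HWPOISON) that the
patch description proposes.

```c
#include <stdbool.h>

/* Illustrative flag bits only; the values are NOT taken from the kernel. */
enum model_ttu_flags {
	MODEL_TTU_OTHER           = 1 << 0, /* stands in for any unrelated flag */
	MODEL_TTU_IGNORE_HWPOISON = 1 << 1, /* old, opt-out style flag */
	MODEL_TTU_HWPOISON        = 1 << 2, /* new, opt-in style flag */
};

/*
 * Old semantics (model): a poisoned page gets its pte converted to a
 * hwpoison entry unless the caller explicitly opts out.
 */
static bool model_old_converts(bool page_hwpoison, unsigned int flags)
{
	return page_hwpoison && !(flags & MODEL_TTU_IGNORE_HWPOISON);
}

/*
 * New semantics (model): the conversion happens only when the caller
 * explicitly asks for it.
 */
static bool model_new_converts(bool page_hwpoison, unsigned int flags)
{
	return page_hwpoison && (flags & MODEL_TTU_HWPOISON);
}
```

In this model, a caller that passes neither hwpoison flag (as a split-path
caller would) triggers the conversion under the old check as soon as the
page is marked poisoned, but not under the new one. Only a caller that sets
MODEL_TTU_HWPOISON converts the pte.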

Thanks,
Naoya Horiguchi