Re: [PATCH v5 3/3] mm/khugepaged: maintain page cache uptodate flag

From: Hugh Dickins
Date: Thu Mar 23 2023 - 21:56:51 EST


On Thu, 23 Mar 2023, Song Liu wrote:
> On Thu, Mar 23, 2023 at 2:56 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Mar 23, 2023 at 12:07:46PM -0700, Hugh Dickins wrote:
> > > On an earlier audit, for different reasons, I did also run across
> > > lib/buildid.c build_id_parse() using find_get_page() without checking
> > > PageUptodate() - looks as if it might do the wrong thing if it races
> > > with khugepaged collapsing text to huge, and should probably have a
> > > similar fix.
> >
> > That shouldn't be using find_get_page(). It should probably use
> > read_cache_folio() which will actually read in the data if it's not
> > present in the page cache, and return an ERR_PTR if the data couldn't
> > be read.
>
> build_id_parse() can be called from NMI, so I don't think we can let
> read_cache_folio() read-in the data.

Interesting.

This being the same Layering_Violation_ID which is asking for a home in
everyone's struct file? (Okay, I'm being disagreeable, no need to answer!)

I think even the current find_get_page() is unsafe from NMI: imagine that
NMI interrupting a sequence (maybe THP collapse or splitting, maybe page
migration, maybe others) when the page refcount has been frozen to 0:
you'll just have to reboot the machine?

I guess the RCU-safety of find_get_page() implies that its XArray basics
are safe in NMI; but you need a low-level variant (xas_find()?) which
does none of the "goto retry"s, and fails immediately if anything is
wrong - including !PageUptodate.

Hugh