Re: Linux regressions report for mainline [2023-02-11]

From: Linus Torvalds
Date: Sat Feb 11 2023 - 16:39:32 EST


On Sat, Feb 11, 2023 at 9:28 AM Regzbot (on behalf of Thorsten
Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
>
> * A patch to fix a page corruption caused by racy check in __free_pages
> emerged on Thursday. It's caused by a change merged for 5.10-rc1, but
> Vlastimil nevertheless said "That's nasty enough to go into 6.2, IMHO".
> Willy reviewed it, but Andrew didn't pick it up yet.
> https://lore.kernel.org/lkml/BYAPR02MB448855960A9656EEA81141FC94D99@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Ugh. That's indeed nasty.

However, I'm now worried that the problem could be worse.

We do folio_ref_try_add_rcu() in the GUP code too (see try_get_folio()
in mm/gup.c).

I *think* that in that case we're good, because in that case the
underlying page is never really free'd (it's there in the page
tables), so all we can race with is the splitting.

We also do folio_try_get()n in mm/vmscan.c (see isolate_lru_folios()).
There we hold the lry lock and find the folio on the lru list, but if
that is all safe then why could the refcount be zero? I guess it's
because the count is decremented before the lru entry is removed..

So *hopefully* the only case where this can happen is the mm/filemap.c
cases, but this all smells to me.

Maybe we should just bite the bullet and say "page cache pages are
rcu-freed after removing them from the mapping", so that we don't need
the whole folio_try_get_rcu() at all.

I think Andrew (and probably other people too) has always worried
about that magical "tryget" thing forever, but I've always claimed it
was clever and safe. This whole thing clearly shows it was neither of
those things.

But yeah, for 6.2, I think that patch by David Chen is likely the safest thing.

Or even just reverting the original commit e320d3012d25
("mm/page_alloc.c: fix freeing non-compound pages") and say that the
(very rare) memory leak is much less dangerous than that hacky fix
(that was buggy).

Because it's a bit dodgy how commit e320d3012d25 ends up hooking into
__free_pages(), even though that's very much not the only thing that
can actually free memory (ie "put_folio()" itself does the same thing,
just for a different class of allocations).

I dunno.

Linus