Re: Regression caused by commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
From: Matthew Wilcox
Date: Fri Aug 15 2025 - 17:01:25 EST
On Fri, Aug 15, 2025 at 11:43:25AM -0700, Roman Gushchin wrote:
> The commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file
> mappings") causes a regression in our production for containers
> which are running short on memory. In some cases they are getting
> stuck for hours in a vicious reclaim cycle. Reverting this commit
> fixes the problem.
>
> As I understand, the intention of the commit is to allocate large folios
> whenever possible, and the idea is to ignore device-specific readahead
> settings and the mmap_miss logic to achieve that, which makes total
> sense.
>
> However, under heavy memory pressure there must be a mechanism to
> revert to order-0 folios, otherwise the memory pressure is inevitably
> increased. Maybe the mmap_miss heuristics should still be applied? Any
> other ideas on how to fix it?
What's supposed to happen is that we should have logic like:
	if (order > min_order)
		alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
so we try a little bit to free memory if we can't allocate an order-9
folio immediately, but we shouldn't be retrying for hours. Maybe
that got lost somewhere along the line because I don't see it now.
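
To make the intended behaviour concrete, here is a rough userspace sketch
of that flag selection. It is only a model, not the filemap code: the
mock_gfp_t type, the MOCK_GFP_* bit values and the mock_folio_gfp() helper
are all made up for illustration and do not match the kernel's real
definitions.

```c
/* Stand-in flag bits for illustration only; NOT the kernel's real values. */
typedef unsigned int mock_gfp_t;

#define MOCK_GFP_KERNEL		0x1u
#define MOCK_GFP_NORETRY	0x2u	/* models __GFP_NORETRY: give up early */
#define MOCK_GFP_NOWARN		0x4u	/* models __GFP_NOWARN: fail quietly */

/*
 * Hypothetical helper: choose allocation flags for a folio of the given
 * order. Anything larger than the mapping's minimum order is treated as
 * opportunistic, so the allocation may fail quickly and silently and the
 * caller can retry at a lower order. A min_order allocation keeps the
 * base flags unchanged because there is nothing smaller to fall back to.
 */
static mock_gfp_t mock_folio_gfp(mock_gfp_t base, unsigned int order,
				 unsigned int min_order)
{
	mock_gfp_t gfp = base;

	if (order > min_order)
		gfp |= MOCK_GFP_NORETRY | MOCK_GFP_NOWARN;

	return gfp;
}
```

With min_order 0, an order-9 request gets both extra bits in this model,
while an order-0 request is passed through with the base flags untouched.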
> Also, a side question: I wonder if it makes sense to allocate 1-2
> PMD-sized folios if mapping_large_folio_support() is not there?
Um, we don't?
	if (!mapping_large_folio_support(mapping) || ra->size < min_ra_size)
		goto fallback;