Re: [syzbot] [mm?] WARNING in xfs_init_fs_context
From: Vlastimil Babka
Date: Tue Jul 08 2025 - 04:50:45 EST
On 7/8/25 00:10, Dave Chinner wrote:
> On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote:
>> On 7/2/25 3:41 AM, Tetsuo Handa wrote:
>> > By the way, why is xfs_init_fs_context() using __GFP_NOFAIL ?
>> >
>> > mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL);
>> > if (!mp)
>> > return -ENOMEM;
>> >
>> > This looks an allocation attempt which can fail safely.
>
> It's irrelevant - it shouldn't fail regardless of __GFP_NOFAIL being
> specified.
If you mean the "too small to fail" behavior then it's generally true,
except in some corner cases like being an oom victim, in which case the
allocation can fail - the userspace process is doomed anyway. But a (small)
kernel allocation not handling NULL would still need __GFP_NOFAIL to prevent
that corner case.
>> Indeed. Dave Chinner's commit f078d4ea82760 ("xfs: convert kmem_alloc()
>> to kmalloc()") dropped the xfs wrapper. This allocation didn't use
>> KM_MAYFAIL so it got __GFP_NOFAIL. The commit mentions this high-order
>> nofail issue for another allocation site that had to use xlog_kvmalloc().
>
> I don't see how high-order allocation behaviour is relevant here.
>
> Pahole says the struct xfs_mount is 4224 bytes in length. It is an
> order-1 allocation and if we've fragmented memory so badly that slab
> can't allocate an order-1 page then *lots* of other stuff is going
> to be stalling. (e.g. slab pages for inodes are typically order-3,
> same as the kmalloc-8kk slab).
Elsewhere in this thread we figured it out since I wrote the quoted reply.
4224 bytes means kmalloc-8k where the fallback allocation (the one that
passes on the __GFP_NOFAIL) order is 1 normally. But due to KASAN enabled
its metadata means the per-object size goes above 8k and thus the fallback
order will be 2. It's a corner case that wasn't anticipated and existed for
years without known reports. We'll need to deal with it somehow.
> Note that the size of the structure is largely because of the
> embedded cpumask for inodegc:
>
> struct cpumask m_inodegc_cpumask; /* 3104 1024 */
>
> This should probably be pulled out into a dynamically allocated
> inodegc specific structure. Then the struct xfs_mount is only a
> order-0 allocation and should never fail, regardless of
> __GFP_NOFAIL being specified or not.
>
>> I think either this allocation really can fail as the code (return
>> -ENOMEM) suggests and thus can drop __GFP_NOFAIL, or it can use
>> kvmalloc() - I think the wrapper for that can be removed now too after
>> the discussion in [1] resulted in commit 46459154f997 ("mm: kvmalloc:
>> make kmalloc fast path real fast path").
>
> I know about that - I have patches that I'm testing that replace
> xlog_kvmalloc() with kvmalloc calls.
Great, thanks!
> -Dave.