Re: [PATCH] mm: kvmalloc: make kmalloc fast path real fast path
From: Vlastimil Babka
Date: Wed Apr 09 2025 - 05:11:37 EST
On 4/9/25 9:35 AM, Michal Hocko wrote:
> On Thu 03-04-25 21:51:46, Michal Hocko wrote:
>> Add Andrew
>
> Andrew, do you want me to repost the patch or can you take it from this
> email thread?
I'll take it as it's now all in mm/slub.c
>> Also, Dave, do you want me to redirect xlog_cil_kvmalloc to kvmalloc or
>> do you prefer to do that yourself?
>>
>> On Thu 03-04-25 09:43:41, Michal Hocko wrote:
>>> There are users like xfs which need larger allocations with NOFAIL
>>> semantics. They are not using kvmalloc currently because the current
>>> implementation tries too hard to allocate through the kmalloc path
>>> which causes a lot of direct reclaim and compaction and that hurts
>>> performance a lot (see 8dc9384b7d75 ("xfs: reduce kvmalloc overhead for
>>> CIL shadow buffers") for more details).
>>>
>>> kvmalloc does support __GFP_RETRY_MAYFAIL semantics to express that
>>> kmalloc (physically contiguous) allocation is preferred and we should be
>>> more aggressive to make it happen. There is currently no way to express
>>> that kmalloc should be very lightweight and, as has been argued [1],
>>> this mode should be the default to support kvmalloc(NOFAIL) with a
>>> lightweight kmalloc path, which is currently impossible to express as
>>> __GFP_NOFAIL cannot be combined with any other reclaim modifiers.
>>>
>>> This patch makes all kmalloc allocations GFP_NOWAIT unless
>>> __GFP_RETRY_MAYFAIL is provided to kvmalloc. This allows supporting both
>>> fail-fast and retry-hard behavior on physically contiguous memory, with a
>>> vmalloc fallback.
>>>
>>> There is a potential downside that relatively small allocations (smaller
>>> than PAGE_ALLOC_COSTLY_ORDER) could fall back to vmalloc too easily and
>>> cause page block fragmentation. We cannot really rule that out, but the
>>> xlog_cil_kvmalloc usage does not indicate that this is happening.
>>>
>>> [1] https://lore.kernel.org/all/Z-3i1wATGh6vI8x8@xxxxxxxxxxxxxxxxxxx/T/#u
>>> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>>> ---
>>> mm/slub.c | 8 +++++---
>>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index b46f87662e71..2da40c2f6478 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -4972,14 +4972,16 @@ static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
>>> * We want to attempt a large physically contiguous block first because
>>> * it is less likely to fragment multiple larger blocks and therefore
>>> * contribute to a long term fragmentation less than vmalloc fallback.
>>> - * However make sure that larger requests are not too disruptive - no
>>> - * OOM killer and no allocation failure warnings as we have a fallback.
>>> + * However make sure that larger requests are not too disruptive - i.e.
>>> + * do not direct reclaim unless physically contiguous memory is preferred
>>> + * (__GFP_RETRY_MAYFAIL mode). We still kick in kswapd/kcompactd to start
>>> + * working in the background, but the allocation itself does not wait.
>>> */
>>> if (size > PAGE_SIZE) {
>>> flags |= __GFP_NOWARN;
>>>
>>> if (!(flags & __GFP_RETRY_MAYFAIL))
>>> - flags |= __GFP_NORETRY;
>>> + flags &= ~__GFP_DIRECT_RECLAIM;
>>>
>>> /* nofail semantic is implemented by the vmalloc fallback */
>>> flags &= ~__GFP_NOFAIL;
>>> --
>>> 2.49.0
>>>
>>
>> --
>> Michal Hocko
>> SUSE Labs
>
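
For readers following the flag logic in the quoted hunk, here is a rough
user-space sketch of the adjusted behaviour. It is not kernel code: the
MODEL_* names, the bit values, and MODEL_PAGE_SIZE are made up for
illustration and do not match the kernel's gfp definitions. Only the order
of the three flag adjustments is taken from the diff above.

```c
#include <stddef.h>

/*
 * User-space model only. These bit values are invented for illustration
 * and are NOT the kernel's gfp flag values.
 */
typedef unsigned int model_gfp_t;

#define MODEL_DIRECT_RECLAIM  0x01u
#define MODEL_KSWAPD_RECLAIM  0x02u
#define MODEL_IO              0x04u
#define MODEL_FS              0x08u
#define MODEL_NOWARN          0x10u
#define MODEL_RETRY_MAYFAIL   0x20u
#define MODEL_NOFAIL          0x40u

/* Modelled on GFP_KERNEL: both reclaim bits plus IO and FS. */
#define MODEL_GFP_KERNEL \
	(MODEL_DIRECT_RECLAIM | MODEL_KSWAPD_RECLAIM | MODEL_IO | MODEL_FS)

/* Assumed page size for the model. */
#define MODEL_PAGE_SIZE 4096u

/* Same three steps as the patched hunk, applied to the model flags. */
model_gfp_t model_kmalloc_gfp_adjust(model_gfp_t flags, size_t size)
{
	if (size > MODEL_PAGE_SIZE) {
		/* A failed kmalloc attempt is expected; vmalloc is the fallback. */
		flags |= MODEL_NOWARN;

		/* Skip direct reclaim unless the caller asked to try hard. */
		if (!(flags & MODEL_RETRY_MAYFAIL))
			flags &= ~MODEL_DIRECT_RECLAIM;

		/* The nofail guarantee is left to the vmalloc fallback. */
		flags &= ~MODEL_NOFAIL;
	}
	return flags;
}
```

In this model, a request larger than one page with MODEL_GFP_KERNEL loses
MODEL_DIRECT_RECLAIM but keeps MODEL_KSWAPD_RECLAIM, so background reclaim
can still be woken. Adding MODEL_RETRY_MAYFAIL keeps direct reclaim enabled,
and requests of at most one page are returned unchanged.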