Re: [RFC PATCH] mm, slub: change percpu partial accounting from objects to pages

From: Vlastimil Babka
Date: Wed Sep 15 2021 - 04:42:10 EST


On 9/15/21 07:32, David Rientjes wrote:
> On Mon, 13 Sep 2021, Vlastimil Babka wrote:
>
>> While this is no longer a problem in kmemcg context thanks to the accounting
>> rewrite in 5.9, the memory waste is still not ideal and it's questionable
>> whether it makes sense to perform free object count based control when object
>> counts can easily become so inaccurate. So this patch converts the
>> accounting to be based on number of pages only (which is precise) and removes
>> the page->pobjects field completely. This is also ultimately simpler.
>>
>
> Thanks for the very detailed explanation, this is very timely for us.
>
> I'm wondering if we should be concerned about the memory waste even being
> possible, though, now that we have the kmemcg accounting change?
>
> IIUC, because we're accounting objects and not pages, then it *seems* like
> we could have a high number of pages but very few objects charged per
> page so this memory waste could go unconstrained from any kmemcg
> limitation.

The main problem before 5.9 was that each memcg had its own set of kmem
caches, each with its own percpu partial lists, so the memory used scaled
with caches x cpus x memcgs. Now the caches are shared, so it's just
caches x cpus. What you're describing is also true, but it's a much smaller
issue than it was before 5.9.
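
To put rough numbers on that scaling, here is a trivial sketch. The helper
name and the figures used below are made up for illustration only; they are
not measurements and nothing like this exists in the kernel.

```c
/*
 * Hypothetical helper, illustration only: upper bound on the number of
 * percpu partial lists, i.e. the product described above.
 * Pass memcgs = 1 for the shared-cache case (5.9 and later).
 */
static unsigned long partial_lists(unsigned long caches, unsigned long cpus,
				   unsigned long memcgs)
{
	return caches * cpus * memcgs;
}
```

With an assumed 100 caches, 64 cpus and 50 memcgs, partial_lists(100, 64, 50)
gives 320000 lists for the pre-5.9 layout, while partial_lists(100, 64, 1)
gives 6400 once the caches are shared.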

>> To retain the existing set_cpu_partial() heuristic, first calculate the target
>> number of objects as previously, but then convert it to target number of pages
>> by assuming the pages will be half-filled on average. This assumption might
>> obviously also be inaccurate in practice, but cannot degrade to actual number of
>> pages being equal to the target number of objects.
>>
>
> I think that's a fair heuristic.
>
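
For reference, the conversion described above can be sketched in plain
userspace C roughly as follows. This is only an illustration of the
half-filled assumption, not the code from the patch: the function name,
the parameter names and the choice to round up are assumptions made here.

```c
/*
 * Hypothetical userspace sketch, not the kernel code: convert a target
 * number of objects into a target number of pages, assuming each page on
 * the percpu partial list is half-filled on average.
 *
 * objs_per_page must be nonzero.
 */
static unsigned int target_pages(unsigned int nr_objects,
				 unsigned int objs_per_page)
{
	/*
	 * A half-filled page contributes objs_per_page / 2 free objects,
	 * so pages = ceil(2 * nr_objects / objs_per_page). Rounding up is
	 * an assumption of this sketch.
	 */
	return (2 * nr_objects + objs_per_page - 1) / objs_per_page;
}
```

With illustrative inputs, a target of 30 objects and 32 objects per page
gives target_pages(30, 32) == 2, so the page target stays far below the
object target even if the half-filled assumption is off.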
>> We could also skip the intermediate step with target number of objects and
>> rewrite the heuristic in terms of pages. However we still have the sysfs file
>> cpu_partial which uses number of objects and could break existing users if it
>> suddenly becomes number of pages, so this patch doesn't do that.
>>
>> In practice, after this patch the heuristics limit the size of the percpu
>> partial list to at most 2 pages. In case of a reported regression (which
>> would mean some workload has benefited from the previous imprecise object
>> based counting), we can tune the heuristics to get a better compromise
>> within the new scheme, while still avoiding the unexpectedly long percpu
>> partial lists.
>>
>
> Curious if you've tried netperf TCP_RR with this change? This benchmark
> was the most significantly improved benchmark that I recall with the
> introduction of per-cpu partial slabs for SLUB. If there are any
> regressions to be introduced by such an approach, I'm willing to bet that
> it would be surfaced with that benchmark.

I'll try, thanks for the tip.