Re: [PATCH v3 0/5] mm/memcg: Reduce kmemcache memory accounting overhead

From: Waiman Long
Date: Thu Apr 15 2021 - 09:17:49 EST


On 4/14/21 11:26 PM, Masayoshi Mizuma wrote:

Hi Longman,

Thank you for your patches.
I reran the benchmark with your patches, but the reduction seems
small... The total durations of the sendto() and recvfrom() system
calls during the benchmark are as follows.

- sendto
- v5.8 vanilla: 2576.056 msec (100%)
- v5.12-rc7 vanilla: 2988.911 msec (116%)
- v5.12-rc7 with your patches (1-5): 2984.307 msec (115%)

- recvfrom
- v5.8 vanilla: 2113.156 msec (100%)
- v5.12-rc7 vanilla: 2305.810 msec (109%)
- v5.12-rc7 with your patches (1-5): 2287.351 msec (108%)

kmem_cache_alloc()/kmem_cache_free() are called around 1,400,000 times during
the benchmark. I ran a loop in a kernel module, as follows. The duration
is indeed reduced by your patches.

---
/* dummy_cache is created from a small struct dummy (not shown) */
struct kmem_cache *dummy_cache;
void *p;
int i;

dummy_cache = KMEM_CACHE(dummy, SLAB_ACCOUNT);
for (i = 0; i < 1400000; i++) {
	p = kmem_cache_alloc(dummy_cache, GFP_KERNEL);
	kmem_cache_free(dummy_cache, p);
}
---
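
(For completeness, a sketch of how such a loop can be timed inside the
module, using the kernel's ktime helpers; the pr_info() reporting here
is illustrative, not necessarily the exact code used for the numbers
below.)

---
ktime_t start, end;

start = ktime_get();
/* ... the alloc/free loop above ... */
end = ktime_get();
pr_info("loop took %lld msec\n", ktime_ms_delta(end, start));
---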

- v5.12-rc7 vanilla: 110 msec (100%)
- v5.12-rc7 with your patches (1-5): 85 msec (77%)

It seems the reduction for the benchmark is small, though...
Anyway, I can see that your patches reduce the overhead.
Please feel free to add:

Tested-by: Masayoshi Mizuma <m.mizuma@xxxxxxxxxxxxxx>

Thanks!
Masa

Thanks for the testing.

I was focusing on your kernel module benchmark when testing my patches. I will try out your pgbench benchmark to see whether any further tuning can be done.

BTW, how many NUMA nodes does your test machine have? I did my testing on a 2-socket system. The vmstat caching part may be less effective on systems with more NUMA nodes. I will try to find a larger 4-socket system for testing.
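
Roughly, the caching keeps only the most recent (cgroup, node) vmstat
target per cpu, so every node change forces a flush; with more nodes,
consecutive updates land on the same node less often. Below is a
standalone userspace sketch of that idea (the struct and function names
are made up for illustration and are not the actual patch code):

---
#include <stdio.h>
#include <stdlib.h>

struct stock {
	int cached_cg;   /* identity of the last cgroup seen on this cpu */
	int cached_nid;  /* last NUMA node seen on this cpu */
	long pending;    /* batched stat delta */
	long flushes;    /* how often we fell back to a flush */
};

static void mod_state(struct stock *s, int cg, int nid, long delta)
{
	if (s->cached_cg != cg || s->cached_nid != nid) {
		/* in the kernel this would propagate s->pending to vmstat */
		s->pending = 0;
		s->flushes++;
		s->cached_cg = cg;
		s->cached_nid = nid;
	}
	s->pending += delta;  /* cheap local add on a cache hit */
}

int main(void)
{
	struct stock s = { .cached_cg = -1, .cached_nid = -1 };
	int nr_nodes = 4;     /* compare 2 vs 4 */

	for (int i = 0; i < 1000000; i++)
		mod_state(&s, 0, rand() % nr_nodes, 64);
	printf("flushes: %ld\n", s.flushes);
	return 0;
}
---

In this toy model, going from 2 nodes to 4 raises the flush rate from
about 1/2 to about 3/4 of updates, which is the effect I expect to be
more visible on a larger machine.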

Cheers,
Longman