Re: [PATCH v2] cgroup/rstat: change cgroup_base_stat to atomic

From: JP Kobryn
Date: Fri Jun 27 2025 - 12:56:13 EST


On 6/27/25 6:15 AM, Wlodarczyk, Bertrand wrote:
The kernel faces scalability issues when multiple userspace programs
attempt to read cgroup statistics concurrently.

The primary bottleneck is the css_cgroup_lock in cgroup_rstat_flush,
which prevents access and updates to the statistics of the css from
multiple CPUs in parallel.

Given that rstat operates on a per-CPU basis and only aggregates
statistics in the parent cgroup, there is no compelling reason why
these statistics cannot be atomic.
By eliminating the lock during CPU statistics access, each CPU can
traverse its rstat hierarchy independently, without blocking.
Synchronization is achieved during parent propagation through atomic
operations.

This change significantly enhances performance on commit
8dcb0ed834a3ec03 ("memcg: cgroup: call css_rstat_updated irrespective
of in_nmi()") in scenarios where multiple CPUs accessCPU rstat within
a single cgroup hierarchy, yielding a performance improvement of around 40 times.
Notably, performance for memory and I/O rstats remains unchanged, as
the lock remains in place for these usages.

Additionally, this patch addresses a race condition detectable in the
current mainline by KCSAN in __cgroup_account_cputime, which occurs
when attempting to read a single hierarchy from multiple CPUs.

Signed-off-by: Bertrand Wlodarczyk <bertrand.wlodarczyk@xxxxxxxxx>

This patch breaks memory controller as explained in the comments on the previous version.

Ekhm... no? I addressed the issue and v2 has lock back and surrounding the call to dependent submodules?
The behavior is the same as before patching.

In the long term, in my opinion, the atomics should happen also in dependent submodules to eliminate locks
completely.

> Also the response to the tearing issue explained by JP is not satisfying.

In other words, the claim is: "it's better to stall other cpus in spinlock plus disable IRQ every time in order to
serve outdated snapshot instead of providing user to the freshest statistics much, much faster".

But they're not really "outdated" are they? Regardless of the wait, once
the lock is acquired they will get the latest snapshot.