[RFC next v2 0/2] ucounts: turn the atomic rlimit to percpu_counter

From: Chen Ridong
Date: Mon May 19 2025 - 09:25:03 EST


We ran the will-it-scale test case signal1 [1], and the results show that
the signal-sending system call does not scale linearly. To investigate
further, we ran a series of tests, launching varying numbers of Docker
containers and monitoring the throughput of each individual container.
The results are as follows:

| Containers               |      1 |      4 |      8 |     16 |     32 |     64 |
| Throughput per container | 380068 | 353204 | 308948 | 306453 | 180659 | 129152 |

The data shows a clear trend: as the number of containers increases, the
throughput per container steadily declines, falling from 380068 with one
container to 129152 with 64, a drop of roughly 66%. Analysis identified
the root cause of this degradation: the ucounts module tracks rlimit
usage with a large number of atomic operations. When many CPUs perform
atomic operations on the same variable, the cache line holding it bounces
between cores, causing frequent cache misses and remote memory accesses
and ultimately degrading performance.

This patch set addresses scalability issues in the ucounts rlimit by
replacing atomic rlimit counters with percpu_counter, which distributes
counts across CPU cores to reduce cache contention under heavy load.

Patch 1 changes the lifetime rule so that a ucount is freed only once both
its refcount and its rlimit counts have dropped to zero, minimizing
redundant summations. Patch 2 converts the atomic rlimit counters to
percpu_counter, as suggested by Andrew.

[1] https://github.com/antonblanchard/will-it-scale/blob/master/tests/

---
v2: use percpu_counter instead of the rlimit cache used in v1.

v1: https://lore.kernel.org/lkml/20250509072054.148257-1-chenridong@xxxxxxxxxxxxxxx/

Chen Ridong (2):
ucounts: free ucount only count and rlimit are zero
ucounts: turn the atomic rlimit to percpu_counter

include/linux/user_namespace.h | 17 +++-
init/main.c | 1 +
ipc/mqueue.c | 6 +-
kernel/signal.c | 8 +-
kernel/ucount.c | 169 +++++++++++++++++++++++----------
mm/mlock.c | 5 +-
6 files changed, 138 insertions(+), 68 deletions(-)

--
2.34.1