Re: [linus:master] [selftests/cgroup] 954bacce36: kernel-selftests.cgroup.test_cpu.test_cpucg_max.fail

From: Michal Koutný
Date: Mon Aug 11 2025 - 17:33:08 EST


Hello.

On Thu, Aug 07, 2025 at 01:52:31PM +0800, kernel test robot <oliver.sang@xxxxxxxxx> wrote:
> dfe25fbaedfc2a07 954bacce36d976fe472090b5598
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :6 100% 6:6 kernel-selftests.cgroup.test_cpu.test_cpucg_max.fail
> :6 100% 6:6 kernel-selftests.cgroup.test_cpu.test_cpucg_max_nested.fail
>
>
> not sure if there are any necessary env setting? thanks

The selftest commit essentially changed the tolerance margin from
ridiculously large to something that looked statistically appropriate
[1].
However, when I run the test 30 times on 954bacce36, I get:

quantile([D1 D2 D8]) # 1, 2 and 8 vCPUs respectively
ans =

1.3086e+04 1.1559e+04 1.1177e+04 # min
1.5109e+04 1.2936e+04 1.2989e+04
1.5791e+04 1.3938e+04 1.3788e+04 # median
1.6159e+04 1.5385e+04 1.4980e+04
1.8757e+04 1.8699e+04 1.9494e+04 # max

I obtain similar values on v6.15 as well (that kernel + the 954bacce36
selftest). So it's not anything in the throttling implementation
affecting this.

The tests above were with HZ=250; with HZ=1000 I get slightly smaller
results for D2:
1.1753e+04 # min
1.2634e+04
1.3208e+04 # median
1.4010e+04
1.6937e+04 # max

But that is still nowhere near the 20% margin (i.e. values_close(..., 10%));
these values would demand up to 100% (values_close(..., 50%)).
Alternatively, we could add a bias derived from
sched_cfs_bandwidth_slice_us, or increase the tested quota from 1% to 5%,
which would be an improvement:

48882 # min
52450
52941 # median
54284 # 75th percentile
73186 # max (limit would be 60000)

I'm not sure how big an overrun we want to accept as a pass.
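
A minimal sketch of the tolerance arithmetic, to make the numbers above
easier to compare. It assumes the selftest helper is
labs(a - b) <= (a + b) / 100 * err with C integer division, and that the
expected usage is 10000 us at 1% quota and 50000 us at 5% quota (the
latter inferred from the 60000 limit quoted above). min_err() and SAMPLES
are hypothetical names used only for this illustration; the sample values
are the quantiles from this mail.

```python
def values_close(a: int, b: int, err: int) -> bool:
    """Assumed mirror of the selftest helper:
    labs(a - b) <= (a + b) / 100 * err, using integer division as in C."""
    return abs(a - b) <= (a + b) // 100 * err


def min_err(usage: int, expected: int) -> int:
    """Smallest integer err for which values_close() accepts the pair
    (hypothetical helper, not part of the selftest)."""
    err = 0
    while not values_close(usage, expected, err):
        err += 1
    return err


# Five-number summaries quoted in this mail (usage in microseconds),
# paired with the assumed expected usage for each quota.
SAMPLES = {
    "1% quota, 1 vCPU, HZ=250": (10_000, [13086, 15109, 15791, 16159, 18757]),
    "1% quota, 2 vCPUs, HZ=1000": (10_000, [11753, 12634, 13208, 14010, 16937]),
    "5% quota": (50_000, [48882, 52450, 52941, 54284, 73186]),
}

for label, (expected, usages) in SAMPLES.items():
    for usage in usages:
        print(
            f"{label}: usage={usage} overrun={usage - expected:+d} "
            f"pass@10={values_close(usage, expected, 10)} "
            f"min_err={min_err(usage, expected)}"
        )
```

Under that assumed formula, err is applied to the sum a + b, so it
corresponds to roughly twice that percentage of the expected value only
while the two values stay close to each other.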

Michal

[1] lore.kernel.org/r/20250701-kselftest-cgroup-fix-cpu-max-v1-2-049507ad6832@xxxxxxxx
