Re: [PATCH v2] sched: Store restrict_cpus_allowed_ptr() call state

From: Waiman Long
Date: Mon Jan 30 2023 - 12:34:49 EST


On 1/26/23 11:11, Will Deacon wrote:
On Tue, Jan 24, 2023 at 03:24:36PM -0500, Waiman Long wrote:
On 1/24/23 14:48, Will Deacon wrote:
On Fri, Jan 20, 2023 at 09:17:49PM -0500, Waiman Long wrote:
The user_cpus_ptr field was originally added by commit b90ca8badbd1
("sched: Introduce task_struct::user_cpus_ptr to track requested
affinity"). It was used only by the arm64 arch due to its possible
asymmetric CPU setup.

Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask"), task_struct::user_cpus_ptr has been repurposed to store the
user-requested CPU affinity specified in sched_setaffinity().

This results in a performance regression on an arm64 system booted
with "allow_mismatched_32bit_el0" on the command line. The arch code
will (amongst other things) call force_compatible_cpus_allowed_ptr()
and relax_compatible_cpus_allowed_ptr() when exec()'ing a 32-bit or a
64-bit task respectively. Now a call to
relax_compatible_cpus_allowed_ptr() will always result in a
__sched_setaffinity() call regardless of whether there was a previous
force_compatible_cpus_allowed_ptr() call.
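The approach named in the patch subject (storing the
restrict_cpus_allowed_ptr() call state) can be modeled in userspace C.
This is an illustrative sketch, not the kernel code: the struct, masks,
and function bodies are simplified assumptions; only the idea of a flag
that makes relax a no-op without a prior force is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the relevant task_struct fields. */
struct task_model {
	unsigned int cpus_mask;		/* current effective affinity */
	unsigned int user_cpus_mask;	/* user-requested affinity */
	bool cpus_allowed_restricted;	/* set by force, cleared by relax */
};

/* Model of force_compatible_cpus_allowed_ptr(): restrict the task to
 * the compatible CPUs and record that a restriction is in effect. */
static void force_compatible(struct task_model *t, unsigned int compat_mask)
{
	t->cpus_allowed_restricted = true;
	t->cpus_mask &= compat_mask;
}

/* Model of relax_compatible_cpus_allowed_ptr(): with the stored call
 * state, the costly restore path is skipped entirely when no prior
 * restriction was applied. */
static void relax_compatible(struct task_model *t)
{
	if (!t->cpus_allowed_restricted)
		return;		/* no prior force: nothing to undo */
	t->cpus_allowed_restricted = false;
	t->cpus_mask = t->user_cpus_mask;
}
```

With this guard, exec()'ing a 64-bit task that was never restricted
leaves its affinity untouched instead of unconditionally going through
the __sched_setaffinity() path.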
I'd argue it's more than just a performance regression -- the affinity
masks are set incorrectly, which is user-visible
(i.e. sched_getaffinity() gives unexpected values).
Can you elaborate a bit more on what you mean by getting unexpected
sched_getaffinity() results? You mean the result is wrong after a
relax_compatible_cpus_allowed_ptr(), right?
Yes, as in the original report. If, on a 4-CPU system, I do the following
with v6.1 and "allow_mismatched_32bit_el0" on the kernel cmdline:

# for c in `seq 1 3`; do echo 0 > /sys/devices/system/cpu/cpu$c/online; done
# yes > /dev/null &
[1] 334
# taskset -p 334
pid 334's current affinity mask: 1
# for c in `seq 1 3`; do echo 1 > /sys/devices/system/cpu/cpu$c/online; done
# taskset -p 334
pid 334's current affinity mask: f

but with v6.2-rc5 that last taskset invocation gives:

pid 334's current affinity mask: 1

so, yes, the performance definitely regresses, but that's because the
affinity mask is wrong!

Are you using cgroup v1 or v2? Is your process in the root cgroup/cpuset or a child cgroup/cpuset under root?

If you are using cgroup v1 in a child cpuset, cpuset.cpus works more like cpuset.cpus_effective. IOW, with an offline and then online hotplug event, the CPU will be permanently lost from the cpuset, so the above is the expected result. If you are using cgroup v2, the cpuset should be able to recover the lost CPU after the online event. If your process is in the root cpuset, the CPU will not be lost either. Alternatively, if you mount the v1 cpuset with the "cpuset_v2_mode" flag, it will behave more like v2 and recover the lost CPU.
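The v1 behaviour above can be reproduced with a few shell commands
(requires root; "demo" and the CPU numbers are placeholder choices for
a 4-CPU system, and the mount point path is an assumption):

```shell
# cgroup v1 child cpuset: an offline+online cycle drops the CPU
mkdir /sys/fs/cgroup/cpuset/demo
echo 0-3 > /sys/fs/cgroup/cpuset/demo/cpuset.cpus
echo 0   > /sys/fs/cgroup/cpuset/demo/cpuset.mems
echo 0 > /sys/devices/system/cpu/cpu3/online
echo 1 > /sys/devices/system/cpu/cpu3/online
cat /sys/fs/cgroup/cpuset/demo/cpuset.cpus
# per the v1 behaviour described above, cpu3 stays lost (0-2)

# Remounting the v1 cpuset controller with the cpuset_v2_mode option
# makes it behave like v2, i.e. re-onlined CPUs are restored:
mount -t cgroup -o cpuset,cpuset_v2_mode cpuset /sys/fs/cgroup/cpuset
```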

I ran a similar cpu offline/online test with cgroup v1 and v2 and confirm that v1 has the above behavior and v2 is fine.

I believe what you have observed above may not be related to my sched patch, as I can't see how it could affect CPU hotplug at all.

Cheers,
Longman