Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2

From: Peter Zijlstra
Date: Thu May 31 2018 - 06:54:36 EST


On Tue, May 29, 2018 at 09:41:30AM -0400, Waiman Long wrote:

> + cpuset.sched.load_balance
> + A read-write single value file which exists on non-root
> + cpuset-enabled cgroups. It is a binary value flag that accepts
> + either "0" (off) or "1" (on). This flag is set by the parent
> + and is not delegatable. It is on by default in the root cgroup.
> +
> + When it is on, tasks within this cpuset will be load-balanced
> + by the kernel scheduler. Tasks will be moved from CPUs with
> + high load to other CPUs within the same cpuset with less load
> + periodically.
> +
> + When it is off, there will be no load balancing among CPUs on
> + this cgroup. Tasks will stay in the CPUs they are running on
> + and will not be moved to other CPUs.

That is not entirely accurate, I'm afraid (unless the patch makes it so;
I've yet to check). When you disable load-balancing on a cgroup, its
tasks still get whatever balancing is left for the partition they
happen to end up in.

Take for instance the workqueue kthreads: they use kthread_bind_mask()
(IIRC) and thus end up with PF_NO_SETAFFINITY, so cpusets (or any other
cgroups, really) have no effect on them (a long-standing complaint).

So the unbound NUMA-enabled workqueue threads, for example, will land
in whatever partition covers their CPUs and get balanced there.
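
For anyone wanting to see which threads fall into that category, here
is a rough userspace sketch. It assumes PF_NO_SETAFFINITY is 0x04000000
(the value in include/linux/sched.h) and that the task flags are field 9
of /proc/<pid>/stat as described in proc(5). The helper names are made
up for illustration; this is not something from the patch.

```python
import os

# Assumed value, taken from include/linux/sched.h.
PF_NO_SETAFFINITY = 0x04000000


def task_flags(stat_line):
    """Return the flags field (field 9) of a /proc/<pid>/stat line."""
    # comm (field 2) may contain spaces or ')', so split after the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is the state (field 3), hence field 9 is rest[6].
    return int(rest[6])


def ignores_affinity(stat_line):
    """True if the task has PF_NO_SETAFFINITY set."""
    return bool(task_flags(stat_line) & PF_NO_SETAFFINITY)


def scan(proc="/proc"):
    """List (pid, comm) for tasks under proc that have the flag set."""
    found = []
    try:
        entries = os.listdir(proc)
    except OSError:
        return found  # no procfs here
    for name in entries:
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat"), errors="replace") as f:
                line = f.read()
        except OSError:
            continue  # task exited while we were looking
        if ignores_affinity(line):
            comm = line[line.index("(") + 1:line.rindex(")")]
            found.append((int(name), comm))
    return found


if __name__ == "__main__":
    for pid, comm in scan():
        print(pid, comm)
```

On a typical system the list should be dominated by kworker and other
per-CPU kthreads; those are the tasks that stay where they are
regardless of which cpuset they nominally sit in.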