Re: [PATCH v2] sched/numa: Introduce per cgroup numa balance control

From: Michal Koutný
Date: Wed Jun 25 2025 - 08:20:18 EST


On Wed, Jun 25, 2025 at 06:23:37PM +0800, Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
> [Problem Statement]
> Currently, NUMA balancing is configured system-wide.
> However, in some production environments, different
> cgroups may have varying requirements for NUMA balancing.
> Some cgroups are CPU-intensive, while others are
> memory-intensive. Some do not benefit from NUMA balancing
> due to the overhead associated with VMA scanning, while
> others prefer NUMA balancing as it helps improve memory
> locality. In this case, system-wide NUMA balancing is
> usually disabled to avoid causing regressions.
>
> [Proposal]
> Introduce a per-cgroup interface to enable NUMA balancing
> for specific cgroups.

NUMA balancing already works at task granularity, and this new attribute
is not much of a resource to control.
Have you considered a per-task attribute instead (sched_setattr(), prctl()
or similar)? Such an attribute could be inherited, so the respective
cgroups would be seeded with a process carrying the intended value. And
cpuset could be used, as traditionally, to restrict the scope of balancing
for such tasks.
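
Roughly the kind of thing I have in mind, as a sketch only (the
PR_SET_NUMA_BALANCE option and its number are made up for illustration,
no such prctl option exists today):

	/* Hypothetical per-task opt-in for NUMA balancing via prctl(). */
	#include <sys/prctl.h>
	#include <stdio.h>

	#define PR_SET_NUMA_BALANCE	78	/* made-up option number */
	#define PR_NUMA_BALANCE_ENABLE	1

	int main(void)
	{
		/*
		 * A launcher would call this after fork() and before exec();
		 * children inherit the setting, so the whole cgroup ends up
		 * populated with tasks that opted in.
		 */
		if (prctl(PR_SET_NUMA_BALANCE, PR_NUMA_BALANCE_ENABLE, 0, 0, 0))
			perror("prctl");
		return 0;
	}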

WDYT?

> This interface is associated with the CPU subsystem, which
> does not support threaded subtrees, and close to CPU bandwidth
> control.
(??) The cpu controller does support threaded subtrees.

> The system administrator needs to set the NUMA balancing mode to
> NUMA_BALANCING_CGROUP=4 to enable this feature. When the system is in
> NUMA_BALANCING_CGROUP mode, NUMA balancing for all cgroups is disabled
> by default. After the administrator enables this feature for a
> specific cgroup, NUMA balancing for that cgroup is enabled.
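
Just to restate my understanding of the intended workflow (written in C
for concreteness; the cgroup path and the file name cpu.numa_balance are
placeholders for illustration, I did not check the actual name in the
patch):

	/* Sketch: enable the cgroup-gated mode, then opt one cgroup in. */
	#include <stdio.h>

	static int write_str(const char *path, const char *val)
	{
		FILE *f = fopen(path, "w");

		if (!f)
			return -1;
		fputs(val, f);
		return fclose(f);
	}

	int main(void)
	{
		/* NUMA_BALANCING_CGROUP == 4, per the changelog above. */
		write_str("/proc/sys/kernel/numa_balancing", "4");
		/* Balancing stays off except for cgroups explicitly opted in. */
		write_str("/sys/fs/cgroup/mygroup/cpu.numa_balance", "1");
		return 0;
	}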

How dynamic do you expect such changes to be, in relation to a given
cgroup's/process's lifecycle?

Thanks,
Michal
