Re: [External] Re: [PATCH] cpuset: introduce non-blocking cpuset.mems setting option
From: Zhongkun He
Date: Wed Jun 18 2025 - 23:50:47 EST
On Wed, Jun 18, 2025 at 5:05 PM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> On Wed, Jun 18, 2025 at 10:46:02AM +0800, Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
> > It is unnecessary to adjust memory affinity periodically from userspace,
> > as it is a costly operation.
>
> It'd always be costly when there's lots of data to migrate.
>
> > Instead, we need to shrink cpuset.mems to explicitly specify the NUMA
> > node from which newly allocated pages should come, and then migrate
> > the existing pages gradually from userspace or let NUMA balancing
> > adjust them.
>
> IIUC, the issue is that there's no set_mempolicy(2) for 3rd-party
> threads (it only operates on current) OR that the migration path
> should be optimized to avoid those latencies -- do you know what the
> contention point is?
Hi Michal,

In our scenario, when we shrink the allowed cpuset.mems, for example
from nodes 1,2,3 to just nodes 2,3, there may still be a large number
of pages residing on node 1. Currently, modifying cpuset.mems triggers
synchronous memory migration, which results in prolonged and
unacceptable service downtime under cgroup v2. This behavior has
become a major blocker for us in adopting cgroup v2.
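
To make the scenario concrete, here is a minimal sketch (the cgroup
path is a placeholder, not from the patch) of the single write that
triggers the synchronous migration today:

/*
 * Minimal sketch: shrink cpuset.mems for a cgroup v2 group. The
 * path "/sys/fs/cgroup/mygroup" is a placeholder. Under current
 * cgroup v2 behavior this one write also migrates every page off
 * node 1 synchronously, which is where the downtime comes from.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/sys/fs/cgroup/mygroup/cpuset.mems", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Shrink the allowed nodes from 1-3 to 2-3. */
	if (write(fd, "2-3", 3) < 0)
		perror("write");
	close(fd);
	return 0;
}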
Tejun suggested adding an interface to control the migration rate,
and I plan to try that later. However, we believe that an interface
like cgroup v1's cpuset.memory_migrate is also sufficient for our
use case and is easier to work with. :)
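
For reference, a minimal sketch of what the gradual userspace-driven
migration could look like with libnuma's migrate_pages(2) wrapper;
the pid and node numbers are placeholders, not part of the patch:

/*
 * Minimal sketch: drain a task's pages off node 1 onto node 2
 * from userspace, which can be done in our own time (or left to
 * NUMA balancing) instead of stalling inside the cpuset.mems
 * write. Build with -lnuma.
 */
#include <numa.h>
#include <stdio.h>

int migrate_off_node1(int pid)
{
	struct bitmask *from = numa_allocate_nodemask();
	struct bitmask *to = numa_allocate_nodemask();
	int ret;

	numa_bitmask_setbit(from, 1);	/* source: node 1 */
	numa_bitmask_setbit(to, 2);	/* destination: node 2 */

	/*
	 * Returns the number of pages that could not be moved,
	 * or -1 on error.
	 */
	ret = numa_migrate_pages(pid, from, to);
	if (ret < 0)
		perror("numa_migrate_pages");

	numa_free_nodemask(from);
	numa_free_nodemask(to);
	return ret;
}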
Thanks,
Zhongkun
>
> Thanks,
> Michal