Re: [RFC PATCH v2 4/5] sched: UMCG: add a blocked worker list

From: Peter Zijlstra
Date: Mon Jan 17 2022 - 04:20:05 EST


On Thu, Jan 13, 2022 at 03:39:39PM -0800, Peter Oskolkov wrote:
> The original idea of a UMCG server was that it was used as a proxy
> for a CPU, so if a worker associated with the server is RUNNING,
> the server itself was never allowed to be RUNNING as well;
> when umcg_wait() returned for a server, it meant that its worker
> became BLOCKED.
>
> In the new (old?) "per server runqueues" model implemented in
> the previous patch in this patchset, servers are woken when
> a previously blocked worker on their runqueue finishes its blocking
> operation, even if the currently RUNNING worker continues running.
>
> As now a server may run while a worker assigned to it is running,
> the original idea of having at most a single worker RUNNING per
> server, as a means to control the number of running workers, is
> not really enforced, and the server, woken by a worker
> doing BLOCKED=>RUNNABLE transition, may then call sys_umcg_wait()
> with a second/third/etc. worker to run.
>
> Support this scenario by adding a blocked worker list:
> when a worker transitions RUNNING=>BLOCKED, not only its server
> is woken, but the worker is also added to the blocked worker list
> of its server.
>
> This change introduces the following benefits:
> - block detection now behaves similarly to wake detection;
> without this patch worker wakeups added wakees to the list
> and woke the server, while worker blocks only woke the server
> without adding blocked workers to a list, forcing servers
> to explicitly check worker's state;
> - if the blocked worker woke sufficiently quickly, the server
> woken on the block event would observe its worker now as
> RUNNABLE, so the block event had to be inferred rather than
> explicitly signalled by the worker being added to the blocked
> worker list;
> - it is now possible for a single server to control several
> RUNNING workers, which makes writing userspace schedulers
> simpler for smaller processes that do not need to scale beyond
> one "server";
> - if the userspace wants to keep at most a single RUNNING worker
> per server, and have multiple servers with their own runqueues,
> this model is also naturally supported here.
>
> So this change basically decouples block/wake detection from
> M:N threading in the sense that the number of servers now
> does not have to be M or N, but is more driven by the scalability
> needs of the userspace application.
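
For illustration only, here is a minimal userspace model of the bookkeeping
described above. It is not code from the patch: the struct layouts and the
names worker_block(), worker_wake(), server_pop_blocked() and
server_pop_woken() are all hypothetical, and the lists are plain LIFO
singly-linked lists with no locking. It only shows why a worker that blocks
and wakes quickly still leaves an explicit entry on the blocked list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names come from the UMCG patches. */
enum worker_state { WORKER_RUNNABLE, WORKER_RUNNING, WORKER_BLOCKED };

struct worker {
	int id;
	enum worker_state state;
	struct worker *next_blocked;	/* link on the server's blocked list */
	struct worker *next_woken;	/* link on the server's wake list */
	bool on_blocked;		/* guards against double insertion */
	bool on_woken;
};

struct server {
	struct worker *blocked;		/* workers that went RUNNING=>BLOCKED */
	struct worker *woken;		/* workers that went BLOCKED=>RUNNABLE */
	int wakeups;			/* how many times the server was woken */
};

/* RUNNING=>BLOCKED: record the worker on the blocked list, wake the server. */
static void worker_block(struct server *s, struct worker *w)
{
	assert(w->state == WORKER_RUNNING);
	w->state = WORKER_BLOCKED;
	if (!w->on_blocked) {
		w->next_blocked = s->blocked;
		s->blocked = w;
		w->on_blocked = true;
	}
	s->wakeups++;
}

/* BLOCKED=>RUNNABLE: record the worker on the wake list, wake the server. */
static void worker_wake(struct server *s, struct worker *w)
{
	assert(w->state == WORKER_BLOCKED);
	w->state = WORKER_RUNNABLE;
	if (!w->on_woken) {
		w->next_woken = s->woken;
		s->woken = w;
		w->on_woken = true;
	}
	s->wakeups++;
}

/* Remove and return one worker from the blocked list, or NULL if empty. */
static struct worker *server_pop_blocked(struct server *s)
{
	struct worker *w = s->blocked;

	if (w) {
		s->blocked = w->next_blocked;
		w->next_blocked = NULL;
		w->on_blocked = false;
	}
	return w;
}

/* Remove and return one worker from the wake list, or NULL if empty. */
static struct worker *server_pop_woken(struct server *s)
{
	struct worker *w = s->woken;

	if (w) {
		s->woken = w->next_woken;
		w->next_woken = NULL;
		w->on_woken = false;
	}
	return w;
}
```

In this model, a server that runs only after a worker has both blocked and
woken sees the worker's state as RUNNABLE, yet server_pop_blocked() still
returns that worker, so the block event does not have to be inferred from
the state alone.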

So I don't object to having this blocking list, we had that early on in
the discussions.

*However*, combined with WF_CURRENT_CPU this 1:N userspace model doesn't
really make sense, also combined with Proxy-Exec (if we ever get that
sorted) it will fundamentally not work.

More consideration is needed I think...