Re: [PATCH] lib/cpumask: Boot option to disable tasks distribution within cpumask

From: Phil Auld
Date: Thu May 02 2024 - 07:45:53 EST


Hi Peter,

On Thu, May 02, 2024 at 10:43:49AM +0200 Peter Zijlstra wrote:
> On Tue, Apr 30, 2024 at 11:23:07AM -0700, Yury Norov wrote:
> > On Tue, Apr 30, 2024 at 02:34:31PM +0530, Ankit Jain wrote:
> > > commit 46a87b3851f0 ("sched/core: Distribute tasks within affinity masks")
> > > and commit 14e292f8d453 ("sched,rt: Use cpumask_any*_distribute()")
> > > introduced the logic to distribute the tasks within cpumask upon initial
> > > wakeup.
> >
> > So let's add the authors in CC list?
> >
> > > For Telco RAN deployments, isolcpus are a necessity to cater to
> > > the requirement of low latency applications. These isolcpus are generally
> > > tickless so that high priority SCHED_FIFO tasks can execute without any
> > > OS jitter. Since load balancing is disabled on isolcpus, any task
> > > which gets placed on these CPUs can not be migrated on its own.
> > > For RT applications to execute on isolcpus, a guaranteed kubernetes pod
> > > with all isolcpus becomes the requirement and these RT applications are
> > > affine to execute on a specific isolcpu within the kubernetes pod.
> > > However, there may be some non-RT tasks which could also schedule in the
> > > same kubernetes pod without being affine to any specific CPU (inherits the
> > > pod cpuset affinity).
> >
> > OK... It looks like adding scheduler maintainers is also a necessity to
> > cater here...
>
> So 14e292f8d453 is very specifically only using sched_domain_span(), and
> if you're using partitioned CPUs they should not show up there.
>
> As to 46a87b3851f0, if you're explicitly creating tasks with an affinity
> mask that spans your partition then you're getting what you ask for.

I think you are skipping some details. We've also asked for no load
balancing and this spreading is a form of load balancing. So that's
not getting what was asked for.

And the tasks that end up with this affinity are not being created with
it explicitly. It's implicit. Anything exec'd into the container
(== a cgroup with a cpuset configured) gets a CPU affinity spanning the
CPUs in the container. There are layers. It's not someone running
"taskset bash" at a command prompt.
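
To make the "implicit" part concrete, here is a minimal userspace sketch
(Linux only). It is not taken from the patch or from any container
runtime. It only shows that a child inherits its parent's affinity mask
across fork() without asking for it, which is the same mechanism that
carries through execve(). The helper name child_affinity_after_fork is
made up for illustration, and narrowing the parent's mask here merely
stands in for what the cpuset does to everything in the container.

```python
import os


def child_affinity_after_fork(cpus):
    """Hypothetical helper: restrict this process to `cpus`, fork, and
    return the affinity mask the child reports without setting anything."""
    original = os.sched_getaffinity(0)
    r, w = os.pipe()
    # Stand-in for the container's cpuset: narrow the parent's mask.
    os.sched_setaffinity(0, cpus)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: never calls sched_setaffinity, just reports its mask.
            status = 1
            try:
                os.close(r)
                mask = sorted(os.sched_getaffinity(0))
                os.write(w, ",".join(map(str, mask)).encode())
                status = 0
            finally:
                os._exit(status)
        # Parent: collect the child's report.
        os.close(w)
        with os.fdopen(r) as f:
            data = f.read()
        os.waitpid(pid, 0)
    finally:
        # Put the parent's original mask back.
        os.sched_setaffinity(0, original)
    return {int(c) for c in data.split(",")}


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    subset = {min(allowed)}
    print("parent restricted to:", sorted(subset))
    print("child reports:", sorted(child_affinity_after_fork(subset)))
```

Nothing in the child asked for that mask. In the pod case the mask is
every isolcpu in the cpuset, so the initial-wakeup distribution has the
whole set to pick from.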

>
> In fact, I already explained this to you earlier, so why are you
> suggesting horrible hacks again? This behaviour toggle you suggest is
> absolutely unacceptable.
>
> I even explained what the problem was and where to look for solutions.
>
> https://lkml.kernel.org/r/20231011135238.GG6337@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>

I was not aware of the history here. Thanks. I'll look into that. At
first blush it's not obvious how that helps but I've been wrong before...


Cheers,
Phil
