Re: [RESEND PATCH] sched: sd_llc_id initialized

From: Valentin Schneider
Date: Wed Feb 15 2023 - 13:11:54 EST


On 14/02/23 17:54, Sun Shouxin wrote:
> In my test, I use isolcpus to isolate specific CPUs,
> and then I noticed different behaviour when binding a task to cores.
>
> For example, the NUMA topology is as follows,
> NUMA node0 CPU(s): 0-15,32-47
> NUMA node1 CPU(s): 16-31,48-63
>
> and the 'isolcpus' is as follows,
> isolcpus=14,15,30,31,46,47,62,63
>
> A task initially running on a non-isolated core belonging to NUMA0
> was bound to an isolated core on NUMA1. After changing its CPU affinity
> to all cores, I noticed the task can be scheduled back to a
> non-isolated core on NUMA0.
>
> 1. taskset -pc 0-13 3512 (task running on core 1)
> 2. taskset -pc 63 3512 (task running on isolated core 63)
> 3. taskset -pc 0-63 3512 (task running on core 1)
>

This is working as intended, no?

> In another case, a task initially running on a non-isolated core
> belonging to NUMA1 was bound to an isolated core on NUMA1.
> After changing its CPU affinity to all cores,
> the task is never scheduled out and always runs on the isolated core.
>
> 1. taskset -pc 16-29 3512 (task running on core 17)
> 2. taskset -pc 63 3512 (task running on isolated core 63)
> 3. taskset -pc 0-63 3512 (task still running on core 63,
>    never scheduled out)
>

And this is also not wrong, since CPU63 is in the task's affinity mask.

That said, I can see that in this case we'd want the task to use other CPUs
if it makes sense wrt load balance.

However, since CPU63 is attached to a NULL sched_domain, AFAIA your
solution is at the mercy of the @prev and @target CPUs passed to
select_idle_sibling(). So this might only work if the waker is on a
non-isolated CPU.

I don't think your patch is wrong, but I don't think it entirely fixes the
issue either. Unfortunately, due to isolated CPUs being attached to NULL
sched_domains, there isn't a magic solution as the majority of scheduler
decisions are based on these.

A safe bet would be to exclude isolated CPUs from the affinity of your
non-critical tasks. Things like TuneD [1] and/or cpusets could help.
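
If you'd rather do this by hand, something like the below is a rough,
untested sketch of the idea: read the isolated set from sysfs, subtract it
from the online CPUs, and use the result as the task's affinity instead of
"all cores". The helper names (parse_cpulist, housekeeping_cpus,
restrict_to_housekeeping) are made up for illustration. It assumes the
kernel exposes /sys/devices/system/cpu/isolated and
/sys/devices/system/cpu/online in the usual "a-b,c-d" cpulist format.

```python
import os

# sysfs files assumed to be present; both use the "0-13,16-29" cpulist format
ISOLATED_PATH = "/sys/devices/system/cpu/isolated"
ONLINE_PATH = "/sys/devices/system/cpu/online"


def parse_cpulist(text):
    """Expand a cpulist string such as '14-15,30-31' into a set of CPU ids."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file means the list is empty
        lo, _, hi = chunk.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def housekeeping_cpus(online, isolated):
    """Return the CPUs a non-critical task should be allowed to run on."""
    return set(online) - set(isolated)


def restrict_to_housekeeping(pid):
    """Set pid's affinity to every online CPU that is not isolated."""
    with open(ONLINE_PATH) as f:
        online = parse_cpulist(f.read())
    with open(ISOLATED_PATH) as f:
        isolated = parse_cpulist(f.read())
    target = housekeeping_cpus(online, isolated)
    if target:  # never hand the kernel an empty mask
        os.sched_setaffinity(pid, target)
    return target
```

With the topology above, restrict_to_housekeeping(3512) in place of
"taskset -pc 0-63 3512" would leave CPU63 out of the mask, so the task
cannot stay stuck there. Changing another task's affinity needs the usual
permissions (same user or CAP_SYS_NICE).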

[1]: https://github.com/redhat-performance/tuned