Re: [RFC PATCH v3] sched/fair: select idle cpu from idle cpumask for task wakeup

From: Li, Aubrey
Date: Wed Nov 04 2020 - 06:53:29 EST


Hi Valentin,

Thanks for your reply.

On 2020/11/4 3:27, Valentin Schneider wrote:
>
> Hi,
>
> On 21/10/20 16:03, Aubrey Li wrote:
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 6b3b59cc51d6..088d1995594f 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6023,6 +6023,38 @@ void __update_idle_core(struct rq *rq)
>> rcu_read_unlock();
>> }
>>
>> +static DEFINE_PER_CPU(bool, cpu_idle_state);
>
> I would've expected this to be far less compact than a cpumask, but that's
> not the story readelf is telling me. Objdump tells me this is recouping
> some of the padding in .data..percpu, at least with the arm64 defconfig.
>
> In any case this ought to be better wrt cacheline bouncing, which I suppose
> is what we ultimately want here.

Yes, each CPU gets one byte, so the total may not be smaller than a cpumask. I can
probably move it into struct rq; do you have a better suggestion?
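
For a rough sense of the footprint being compared here, below is a minimal
userspace sketch (not kernel code). The helper names flag_bytes() and
mask_bytes() are made up for illustration, and the sketch ignores how the
per-CPU section is actually laid out or padded, which is what the
readelf/objdump observation above is about.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Bits held by one unsigned long on the build machine. */
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: total bytes when every CPU owns one bool flag. */
static size_t flag_bytes(size_t nr_cpus)
{
	return nr_cpus * sizeof(bool);
}

/*
 * Hypothetical helper: bytes used by a bitmap with one bit per CPU,
 * rounded up to a whole number of unsigned longs.
 */
static size_t mask_bytes(size_t nr_cpus)
{
	size_t longs;

	assert(nr_cpus > 0);
	longs = (nr_cpus + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG;
	return longs * sizeof(unsigned long);
}
```

With 256 CPUs, mask_bytes(256) is 32 bytes in total, while flag_bytes(256) is
256 * sizeof(bool) bytes. The flags are spread across the per-CPU areas, one
per CPU, rather than packed into shared words that every CPU writes.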

>
> Also, see rambling about init value below.
>
>> @@ -10070,6 +10107,12 @@ static void nohz_balancer_kick(struct rq *rq)
>> 	if (unlikely(rq->idle_balance))
>> 		return;
>>
>> +	/* The CPU is not in idle, update idle cpumask */
>> +	if (unlikely(sched_idle_cpu(cpu))) {
>> +		/* Allow SCHED_IDLE cpu as a wakeup target */
>> +		update_idle_cpumask(rq, true);
>> +	} else
>> +		update_idle_cpumask(rq, false);
>
> This means that without CONFIG_NO_HZ_COMMON, a CPU going into idle will
> never be accounted as going out of it, right? Eventually the cpumask
> should end up full, which conceptually implements the previous behaviour of
> select_idle_cpu() but in a fairly roundabout way...

Maybe I can move this update to scheduler_tick().

>
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index 9079d865a935..f14a6ef4de57 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -1407,6 +1407,7 @@ sd_init(struct sched_domain_topology_level *tl,
>> 		sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
>> 		atomic_inc(&sd->shared->ref);
>> 		atomic_set(&sd->shared->nr_busy_cpus, sd_weight);
>> +		cpumask_copy(sds_idle_cpus(sd->shared), sched_domain_span(sd));
>
> So at init you would have (single LLC for sake of simplicity):
>
> \all cpu : cpu_idle_state[cpu] == false
> cpumask_full(sds_idle_cpus) == true
>
> IOW it'll require all CPUs to go idle at some point for these two states to
> be properly aligned. Should cpu_idle_state not then be init'd to 1?
>
> This then happens again for hotplug, except that cpu_idle_state[cpu] may be
> either true or false when the sds_idle_cpus mask is reset to 1's.
>

Okay, I will refine this in the next version.
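
To make the init mismatch concrete, here is a small userspace model (not the
patch itself). It assumes that update_idle_cpumask() returns early when the
per-CPU state is unchanged, which is not shown in the hunks quoted above. The
struct and the model_* names are hypothetical, and a single unsigned long
stands in for the sds_idle_cpus() mask of one LLC.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8	/* hypothetical LLC size, for illustration only */

struct llc_model {
	bool cpu_idle_state[MODEL_NR_CPUS];	/* stands in for the per-CPU bool */
	unsigned long idle_cpus;		/* stands in for sds_idle_cpus() */
};

/* init_state is the initial value given to every cpu_idle_state entry. */
static void model_init(struct llc_model *m, bool init_state)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		m->cpu_idle_state[cpu] = init_state;
	/* The mask starts full, mirroring the cpumask_copy() in sd_init(). */
	m->idle_cpus = (1UL << MODEL_NR_CPUS) - 1;
}

/* Assumption: the shared mask is only touched when the state changes. */
static void model_update_idle_cpumask(struct llc_model *m, int cpu, bool idle)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (m->cpu_idle_state[cpu] == idle)
		return;
	m->cpu_idle_state[cpu] = idle;
	if (idle)
		m->idle_cpus |= 1UL << cpu;
	else
		m->idle_cpus &= ~(1UL << cpu);
}

/* True if the model still advertises this CPU as an idle wakeup target. */
static bool model_cpu_in_idle_mask(const struct llc_model *m, int cpu)
{
	return (m->idle_cpus & (1UL << cpu)) != 0;
}
```

In this model, after model_init(&m, false) a busy CPU that reports
idle == false hits the early return and stays in the mask until it has gone
idle once. After model_init(&m, true), the same report clears its bit
immediately.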

Thanks,
-Aubrey