Re: [PATCH v2 1/5] sched/fair: Reorder enqueue/dequeue_task_fair path

From: Dietmar Eggemann
Date: Thu Feb 20 2020 - 08:39:05 EST


On 19/02/2020 17:26, Vincent Guittot wrote:
> On Wed, 19 Feb 2020 at 12:07, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>
>> On 18/02/2020 15:15, Vincent Guittot wrote:
>>> On Tue, 18 Feb 2020 at 14:22, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>>
>>>> On Tue, Feb 18, 2020 at 01:37:37PM +0100, Dietmar Eggemann wrote:
>>>>> On 14/02/2020 16:27, Vincent Guittot wrote:
>>>>>> The walk through the cgroup hierarchy during the enqueue/dequeue of a task
>>>>>> is split in 2 distinct parts for throttled cfs_rq without any added value
>>>>>> but making code less readable.
>>>>>>
>>>>>> Change the code ordering such that everything related to a cfs_rq
>>>>>> (throttled or not) will be done in the same loop.
>>>>>>
>>>>>> In addition, the same steps ordering is used when updating a cfs_rq:
>>>>>> - update_load_avg
>>>>>> - update_cfs_group
>>>>>> - update *h_nr_running
>>>>>
>>>>> Is this code change really necessary? You pay with two extra goto's. We
>>>>> still have the two for_each_sched_entity(se)'s because of 'if
>>>>> (se->on_rq); break;'.
>>>>
>>>> IIRC he relies on the presented ordering in patch #5 -- adding the
>>>> running_avg metric.
>>>
>>> Yes, that's the main reason, updating load_avg before h_nr_running
>>
>> My hunch is you refer to the new function:
>>
>> static inline void se_update_runnable(struct sched_entity *se)
>> {
>> if (!entity_is_task(se))
>> se->runnable_weight = se->my_q->h_nr_running;
>> }
>>
>> I don't see the dependency to the 'update_load_avg -> h_nr_running'
>> order since it operates on se->my_q, not cfs_rq = cfs_rq_of(se), i.e.
>> se->cfs_rq.
>>
>> What do I miss here?
>
> update_load_avg() updates both se and cfs_rq so if you update
> cfs_rq->h_nr_running before calling update_load_avg() like in the 2nd
> for_each_sched_entity, you will update cfs_rq runnable_avg for the
> past time slot with the new h_nr_running value instead of the previous
> value.

Ah, now I see:

update_load_avg()
update_cfs_rq_load_avg()
__update_load_avg_cfs_rq()
___update_load_sum(..., cfs_rq->h_nr_running, ...)

^^^^^^^^^^^^^^^^^^^^

Not really obvious IMHO, since the code is introduced only in 4/5.

Could you add a comment to this patch header?

I see you mentioned this dependency already in v1 discussion

https://lore.kernel.org/r/CAKfTPtAM=kgF7Fz-JKFY+s_k5KFirs-8Bub3s1Eqtq7P0NMa0w@xxxxxxxxxxxxxx

"... But the following patches make PELT using h_nr_running ...".

IMHO it would be helpful to have this explanation in the 1/5 patch
header so people stop wondering why this is necessary.