Re: [PATCH] sched/fair: fix runnable_avg for throttled cfs

From: Dietmar Eggemann
Date: Thu Feb 27 2020 - 06:20:19 EST


On 26.02.20 21:01, Vincent Guittot wrote:
> On Wed, 26 Feb 2020 at 20:04, <bsegall@xxxxxxxxxx> wrote:
>>
>> Vincent Guittot <vincent.guittot@xxxxxxxxxx> writes:
>>
>>> When a cfs_rq is throttled, its group entity is dequeued and its running
>>> tasks are removed. We must update runnable_avg with current h_nr_running
>>> and update group_se->runnable_weight with new h_nr_running at each level

^^^

Shouldn't this be 'current' rather 'new' h_nr_running for
group_se->runnable_weight? IMHO, you want to cache the current value
before you add/subtract task_delta.

>>> of the hierarchy.
>>
>> You'll also need to do this for task enqueue/dequeue inside of a
>> throttled hierarchy, I'm pretty sure.
>
> AFAICT, this is already done with patch "sched/pelt: Add a new
> runnable average signal" when task is enqueued/dequeued inside a
> throttled hierarchy
>
>>
>>>
>>> Fixes: 9f68395333ad ("sched/pelt: Add a new runnable average signal")
>>> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>>> ---
>>> This patch applies on top of tip/sched/core
>>>
>>> kernel/sched/fair.c | 10 ++++++++++
>>> 1 file changed, 10 insertions(+)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index fcc968669aea..6d46974e9be7 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4703,6 +4703,11 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
>>>
>>> if (dequeue)
>>> dequeue_entity(qcfs_rq, se, DEQUEUE_SLEEP);
>>> + else {
>>> + update_load_avg(qcfs_rq, se, 0);
>>> + se_update_runnable(se);
>>> + }
>>> +
>>> qcfs_rq->h_nr_running -= task_delta;
>>> qcfs_rq->idle_h_nr_running -= idle_task_delta;
>>>
>>> @@ -4772,6 +4777,11 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>>> cfs_rq = cfs_rq_of(se);
>>> if (enqueue)
>>> enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
>>> + else {
>>> + update_load_avg(cfs_rq, se, 0);
>>
>>
>>> + se_update_runnable(se);
>>> + }
>>> +
>>> cfs_rq->h_nr_running += task_delta;
>>> cfs_rq->idle_h_nr_running += idle_task_delta;