Re: [RFC PATCH v4 00/19] Core scheduling v4

From: Li, Aubrey
Date: Tue Mar 03 2020 - 09:59:33 EST


On 2020/2/29 7:55, Tim Chen wrote:
> On 2/26/20 1:54 PM, Vineeth Remanan Pillai wrote:
>
>> rq->curr being NULL can mean that the sibling is idle or forced idle.
>> In both the cases, I think it makes sense to migrate a task so that it can
>> compete with the other sibling for a chance to run. This function
>> can_migrate_task actually only says if this task is eligible and
>> later part of the code decides whether it is okay to migrate it
>> based on factors like load and util and capacity. So I think it's
>> fine to declare the task as eligible if the dest core is running
>> idle. Does this thinking make sense?
>>
>> In our testing, it did not show much degradation in performance with
>> this change. I am reworking the fix by removing the check for
>> task_est_util. It doesn't seem to be valid to check for util to migrate
>> the task.
>>
>
> In Aaron's test case, there is a great imbalance in the load on one core
> where all the grp A tasks are vs the other cores where the grp B tasks are
> spread around. Normally, the load balancer will move the grp A tasks.
>
> Aubrey's can_migrate_task patch prevented the load balancer from migrating
> tasks if the core cookie on the target queue doesn't match. The thought was
> that migrating a task to it would induce force idle and reduce cpu utilization.
> That kept all the grp A tasks from getting migrated and kept the imbalance
> indefinitely in Aaron's test case.
>
> Perhaps we should also look at the load imbalance between the src rq and
> target rq. If the imbalance is big (say two full cpu bound tasks worth
> of load), we should migrate anyway despite the cookie mismatch. We are willing
> to pay a bit for the force idle by balancing the load out more.
> I think Aubrey's patch on can_migrate_task should be more friendly to
> Aaron's test scenario if such logic is incorporated.
>
> In Vineeth's fix, we only look at the currently running task's weight in
> src and dst rq. Perhaps the load on the src and dst rq needs to be considered
> to prevent too great an imbalance between the run queues?

Since we are trying to migrate a task, can we just use cfs.h_nr_running? This
signal is also used to find the busiest run queue.
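
To make the idea concrete, here is a rough userspace sketch, not a patch
against any tree. The struct, the function name and the threshold of two
tasks (borrowed from Tim's "two full cpu bound tasks" example) are all
made up for illustration; only cfs.h_nr_running and the core cookie come
from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a run queue; NOT the kernel's struct rq. */
struct model_rq {
	unsigned int h_nr_running;	/* stands in for rq->cfs.h_nr_running */
	unsigned long core_cookie;	/* cookie the destination core is running */
};

/* Assumed threshold: an imbalance of two runnable tasks. */
#define MODEL_IMBALANCE_THRESHOLD 2U

/*
 * Decide whether a task with @task_cookie is eligible to move from @src
 * to @dst. A matching cookie is always eligible. On a mismatch, the task
 * is still eligible when src has at least MODEL_IMBALANCE_THRESHOLD more
 * runnable tasks than dst, accepting some force idle to fix the imbalance.
 */
static bool model_can_migrate(const struct model_rq *src,
			      const struct model_rq *dst,
			      unsigned long task_cookie)
{
	assert(src && dst);

	if (dst->core_cookie == task_cookie)
		return true;

	/* Guard against unsigned underflow before subtracting. */
	if (src->h_nr_running <= dst->h_nr_running)
		return false;

	return src->h_nr_running - dst->h_nr_running >=
	       MODEL_IMBALANCE_THRESHOLD;
}
```

With something like this, a task would still be refused on a cookie mismatch
when the two run queues are nearly even, but Aaron's case (all grp A tasks
piled on one core) would get past the check.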

Thanks,
-Aubrey