Re: [RFC PATCH v4 00/19] Core scheduling v4

From: Aubrey Li
Date: Wed Feb 26 2020 - 23:45:27 EST


On Thu, Feb 27, 2020 at 5:54 AM Vineeth Remanan Pillai
<vpillai@xxxxxxxxxxxxxxxx> wrote:
>
> Hi Aubrey,
>>
>>
>> Thanks for the quick fix. I guess this patch should work for Aaron's case
>> because it almost removed the cookie checking here. Some comments
>> below:
>>
>> + if (rq->core->core_forceidle)
>> + return true;
>>
>> We check cookie match during load balance to avoid two different
>> cookie tasks on the same core. If one cpu is forced idle, its sibling should
>> be running a non-idle task, not sure why we can return true directly here.
>> And if we need to check cookie, with this forced idle case, why it's special
>> to be picked up?
>
> The idea behind this was: If a core is forced idle, then we have a
> scenario where one task is running without a companion and one task
> is forced to be idle. So if this source task is of the same cookie
> as the forced idle task, then migrating this source task will offer a
> chance for the force idle task to run together with task in subsequent
> schedules. We don't have a framework to check the cookie of forced
> idle tasks and hence this if block is only partial fix.

To mitigate force idle, this helper is trying to prevent unmatched case, not
catch the matched case. If the task's cookie is matched with forced idle
cpu's sibling, it's able to pass the check later. Comparing to my original
intention, this fix increases unmatched case as it returns immediately and
we don't check cookie later.

>>
>>
>> + if (task_h_load(p) > task_h_load(rq->curr))
>> + return true;
>> + if (task_util_est(p) > task_util_est(rq->curr))
>> + return true;
>>
>> These two need to exclude rq->curr == rq->idle case?
>> otherwise idle balance always return true?
>
> rq->curr being NULL can mean that the sibling is idle or forced idle.
> In both the cases, I think it makes sense to migrate a task so that it can
> compete with the other sibling for a chance to run. This function
> can_migrate_task actually only says if this task is eligible and
> later part of the code decides whether it is okay to migrate it
> based on factors like load and util and capacity. So I think its
> fine to declare the task as eligible if the dest core is running
> idle. Does this thinking make sense?
>
The intention here is stopping migration not allowing migration.
task_h_load(idle) is zero, but idle_task's sibling CPU is running
a task with cookieA, the fix returns immediately when
task_h_load(task cookieB) > 0, this puts another unmatched case in.

> On our testing, it did not show much degradation in performance with
> this change. I am reworking the fix by removing the check for
> task_est_util. It doesn't seem to be valid to check for util to migrate
> the task.

I'm not sure the check for task_h_load is exactly wanted here either.
>From my understanding, the problem is because we stopped high
weight task migration because its cookie does not match with the target's,
can we check high load here instead of high weight?

Thanks,
-Aubrey