Re: [PATCH v2 6/6] sched/deadline: Implement fallback mechanism for !fit case

From: Dietmar Eggemann
Date: Fri May 01 2020 - 12:12:19 EST


On 30/04/2020 13:00, Pavan Kondeti wrote:
> On Wed, Apr 29, 2020 at 07:39:50PM +0200, Dietmar Eggemann wrote:
>> On 27/04/2020 16:17, luca abeni wrote:

[...]

>>> On Mon, 27 Apr 2020 15:34:38 +0200
>>> Juri Lelli <juri.lelli@xxxxxxxxxx> wrote:

[...]

>>>> On 27/04/20 10:37, Dietmar Eggemann wrote:
>>>>> From: Luca Abeni <luca.abeni@xxxxxxxxxxxxxxx>

[...]

>>>>> -	if (!cpumask_empty(later_mask))
>>>>> -		return 1;
>>>>> +	if (cpumask_empty(later_mask))
>>>>> +		cpumask_set_cpu(max_cpu, later_mask);
>>>>
>>>> Think we touched upon this during v1 review, but I'm (still?)
>>>> wondering if we can do a little better, still considering only free
>>>> cpus.
>>>>
>>>> Can't we get into a situation that some of the (once free) big cpus
>>>> have been occupied by small tasks and now a big task enters the
>>>> system and it only finds small cpus available, where it could have fit
>>>> into bigs if small tasks were put onto small cpus?
>>>>
>>>> I.e., shouldn't we always try to best fit among free cpus?
>>>
>>> Yes; there was an additional patch that tried to schedule each task on the
>>> slowest core where it can fit, to address this issue.
>>> But I think it will go in a second round of patches.
>>
>> Yes, we can run into this situation in DL, but also in CFS or RT.
>>
> In CFS case, the misfit task handling in load balancer should help pulling
> the BIG task running on the little CPUs. I get your point that we can run
> into the same scenario with other scheduling class tasks.

Yes, the CPU stopper (i.e. CFS's active load balance) can help here.
IMHO, using the CPU stopper in RT/DL for moving the running task (next
to using best fit rather than just fit CPU) is considered future work.
AFAICS, push/pull is not designed for migration of running tasks.
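
To illustrate the difference between "fit" and "best fit" among free
CPUs, here is a minimal userspace sketch. It is not kernel code: the
toy_cpu and toy_dl_task types, pick_best_fit_cpu() and the capacity
values are hypothetical and chosen only for illustration. The fit test
assumes a task fits when its deadline, scaled by the CPU capacity, still
covers its runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024u	/* capacity of the biggest CPU (assumed scale) */

/* Hypothetical model of a CPU: its capacity and whether it is free. */
struct toy_cpu {
	uint32_t capacity;	/* 0..CAPACITY_SCALE */
	bool free;
};

/* Hypothetical model of a deadline task. */
struct toy_dl_task {
	uint64_t runtime_ns;
	uint64_t deadline_ns;
};

/* Assumed fit rule: the capacity-scaled deadline must cover the runtime. */
static bool task_fits(const struct toy_dl_task *t, uint32_t capacity)
{
	return t->deadline_ns * capacity / CAPACITY_SCALE >= t->runtime_ns;
}

/*
 * Best fit: among free CPUs on which the task fits, return the index of
 * the one with the smallest capacity, or -1 if no free CPU fits.
 */
static int pick_best_fit_cpu(const struct toy_cpu *cpus, int ncpus,
			     const struct toy_dl_task *t)
{
	int best = -1;

	assert(ncpus >= 0);

	for (int i = 0; i < ncpus; i++) {
		if (!cpus[i].free || !task_fits(t, cpus[i].capacity))
			continue;
		if (best < 0 || cpus[i].capacity < cpus[best].capacity)
			best = i;
	}

	return best;
}
```

With one big (1024) and one little (446) CPU free, a small task is
placed on the little CPU, so a big task arriving later still finds the
big CPU free. A plain "first CPU that fits" policy could have put the
small task on the big CPU instead.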

[...]

>> You did spot the rt-app 'delay' for the small tasks in the test case ;-)
>
> Thanks for the hint. It was not clear to me why 1 msec delay is given for
> the small tasks in the rt-app json description in the cover letter.
> I get it now :-)

So far, capacity awareness in RT/DL means that as long as there are CPUs
available which fit the task, use one of them. This is already
beneficial for a lot of use cases on CPU asymmetric systems since it
offers more predictable behavior.
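
As a rough illustration of "use a CPU that fits, otherwise fall back",
the following userspace sketch models the selection. It is a simplified
model and not the kernel implementation: toy_cpu, toy_dl_task and
pick_cpu() are hypothetical names, the capacities are example values,
and taking the first fitting CPU is an assumption made to keep the
sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024u	/* capacity of the biggest CPU (assumed scale) */

/* Hypothetical model of a CPU: its capacity and whether it is free. */
struct toy_cpu {
	uint32_t capacity;	/* 0..CAPACITY_SCALE */
	bool free;
};

/* Hypothetical model of a deadline task. */
struct toy_dl_task {
	uint64_t runtime_ns;
	uint64_t deadline_ns;
};

/* Assumed fit rule: the capacity-scaled deadline must cover the runtime. */
static bool task_fits(const struct toy_dl_task *t, uint32_t capacity)
{
	return t->deadline_ns * capacity / CAPACITY_SCALE >= t->runtime_ns;
}

/*
 * Return the first free CPU on which the task fits. If no free CPU
 * fits (the !fit case), fall back to the free CPU with the largest
 * capacity. Return -1 only if no CPU is free at all.
 */
static int pick_cpu(const struct toy_cpu *cpus, int ncpus,
		    const struct toy_dl_task *t)
{
	int max_cpu = -1;
	uint32_t max_cap = 0;

	assert(ncpus >= 0);

	for (int i = 0; i < ncpus; i++) {
		if (!cpus[i].free)
			continue;
		if (task_fits(t, cpus[i].capacity))
			return i;
		/* Remember the biggest free CPU for the fallback. */
		if (max_cpu < 0 || cpus[i].capacity > max_cap) {
			max_cap = cpus[i].capacity;
			max_cpu = i;
		}
	}

	return max_cpu;
}
```

In this model, a task needing 8 ms every 10 ms does not fit a CPU of
capacity 446, so it lands on the big CPU while that one is free and on
the largest remaining free CPU once it is taken.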

I'll add a note to the cover letter in the next version about the reason
for the rt-app 'delay'.