Re: [PATCH] sched/fair: Fix per-CPU kthread and wakee stacking for asym CPU capacity

From: Valentin Schneider
Date: Wed Nov 24 2021 - 13:06:51 EST


On 24/11/21 17:58, Vincent Donnefort wrote:
> On Wed, Nov 24, 2021 at 05:11:32PM +0000, Valentin Schneider wrote:
>> On 24/11/21 14:14, Vincent Donnefort wrote:
>> > A shortcut has been introduced in select_idle_sibling() to return prev_cpu
>> > if the wakee is woken up by a per-CPU kthread. This is an issue for
>> > asymmetric CPU capacity systems where the wakee might not fit prev_cpu
>> > anymore. Evaluate asym_fits_capacity() for prev_cpu before using that
>> > shortcut.
>> >
>> > Fixes: 52262ee567ad ("sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression")
>>
>> Shouldn't that rather be
>>
>> b4c9c9f15649 ("sched/fair: Prefer prev cpu in asymmetric wakeup path")
>
> Yes definitely, my bad!
>
>>
>> ? This is a later commit than the one you point to, and before then
>> asymmetric CPU systems wouldn't use any of the sis() heuristics.
>>
>> I reportedly reviewed said commit back then, and don't recall anything
>> specific about that conditional... The cover-letter for v2 states:
>>
>> https://lore.kernel.org/lkml/20201028174412.680-1-vincent.guittot@xxxxxxxxxx/
>> """
>> don't check capacity for the per-cpu kthread UC because the assumption is
>> that the wakee queued work for the per-cpu kthread that is now complete and
>> the task was already on this cpu.
>> """
>>
>> So the assumption here is that current is gonna sleep right after waking up
>> p, so current's utilization doesn't matter, and p was already on prev, so
>> it should fit there...
>
> I don't think the assumption that "p was already on prev, so it should fit"
> is correct if we take into account uclamp min. That value can change from one
> activation to the other and make that task artificially too big for prev_cpu...
>

Humph, good point, hadn't thought of that.
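
To make the uclamp min scenario concrete, here is a minimal userspace sketch,
not the kernel code. The toy_* names and the struct are hypothetical, and the
capacity of 446 used in the note below is just an example value for a little
CPU. Only the 1280/1024 margin is meant to mirror fits_capacity(), and the
clamping step assumes the fit check looks at the uclamp-clamped utilization
rather than the raw estimate.

```c
#include <stdbool.h>

/* Same ~20% headroom as the kernel's fits_capacity() margin. */
#define TOY_FITS_CAPACITY(util, cap) ((util) * 1280 < (cap) * 1024)

/* Hypothetical stand-in for a task: only the fields this sketch needs. */
struct toy_task {
	unsigned long util_est;   /* estimated utilization */
	unsigned long uclamp_min; /* requested minimum utilization */
	unsigned long uclamp_max; /* requested maximum utilization */
};

/* Clamp the estimated utilization between uclamp min and uclamp max. */
static unsigned long toy_clamped_util(const struct toy_task *p)
{
	unsigned long util = p->util_est;

	if (util < p->uclamp_min)
		util = p->uclamp_min;
	if (util > p->uclamp_max)
		util = p->uclamp_max;

	return util;
}

/* Does the clamped utilization fit a CPU of the given capacity? */
static bool toy_fits_cpu(const struct toy_task *p, unsigned long cpu_capacity)
{
	return TOY_FITS_CAPACITY(toy_clamped_util(p), cpu_capacity);
}
```

With util_est = 100 and uclamp_min = 0, the task fits a CPU of capacity 446.
Bump uclamp_min to 512 between two activations and the very same task, with
unchanged util_est, no longer fits that CPU, even though it ran there last
time.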

>>
>> I'm thinking things should actually be OK with your other patch that
>> excludes 'current == swapper' from this condition.
>
> ...But indeed if we add [1] to the equation, this patch here would only
> protect against that specific corner case.
>
> (And probably also against the fact that this same task could have a value
> that doesn't fit this CPU anymore but didn't trigger misfit during its previous
> activation?)

That would imply crossing the misfit threshold right at the dequeue signal
update, but that can happen.