Re: [PATCH -v2 15/17] sched: Fix migrate_disable() vs rt/dl balancing

From: Peter Zijlstra
Date: Tue Oct 06 2020 - 09:49:08 EST


On Tue, Oct 06, 2020 at 12:20:43PM +0100, Valentin Schneider wrote:
>
> On 05/10/20 15:57, Peter Zijlstra wrote:
> > In order to minimize the interference of migrate_disable() on lower
> > priority tasks, which can be deprived of runtime due to being stuck
> > below a higher priority task. Teach the RT/DL balancers to push away
> > these higher priority tasks when a lower priority task gets selected
> > to run on a freshly demoted CPU (pull).
> >
> > This adds migration interference to the higher priority task, but
> > restores bandwidth to system that would otherwise be irrevocably lost.
> > Without this it would be possible to have all tasks on the system
> > stuck on a single CPU, each task preempted in a migrate_disable()
> > section with a single high priority task running.
> >
> > This way we can still approximate running the M highest priority tasks
> > on the system.
> >
>
> Ah, so IIUC that's the important bit that makes it we can't just say go
> through the pushable_tasks list and skip migrate_disable() tasks.
>
> Once the highest-prio task exits its migrate_disable() region, your patch
> pushes it away. If we ended up with a single busy CPU, it'll spread the
> tasks around one migrate_enable() at a time.
>
> That time where the top task is migrate_disable() is still a crappy time,
> and as you pointed out earlier today if it is a genuine pcpu task then the
> whole thing is -EBORKED...
>
> An alternative I could see would be to prevent those piles from forming
> altogether, say by issuing a similar push_cpu_stop() on migrate_disable()
> if the next pushable task is already migrate_disable(); but that's a
> proactive approach whereas yours is reactive, so I'm pretty sure that's
> bound to perform worse.

I think it is always possible to form pileups. Just start enough tasks
such that newer, higher priority, tasks have to preempt existing tasks.

Also, we might not be able to place the task elsewhere. Suppose we have
all our M CPUs filled with an RT task; then, while the lowest priority
task is inside a migrate_disable() section, wake a new highest priority task.

Per the SMP invariant, this new highest priority task must preempt the
lowest priority task currently running, otherwise we would not be
running the M highest prio tasks.
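
To make the scenario concrete, here is a toy userspace model, not kernel
code. The struct and the toy_*() helpers are hypothetical names invented for
this illustration, and the model assumes exactly one running RT task per CPU
and a lower 'prio' value meaning higher priority. It only shows that the
wakeup must land on the CPU running the lowest priority task, and that a
pileup forms if that task is migrate-disabled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical, not kernel code): one running RT task per CPU. */
struct toy_task {
	int prio;              /* assumption: lower value = higher priority */
	bool migrate_disabled; /* task is inside a migrate_disable() section */
};

/*
 * Find the CPU whose running task has the lowest priority (largest prio
 * value) among those the woken task beats. Returns -1 when the woken task
 * does not outrank any running task, i.e. no preemption is required.
 */
static int toy_find_victim_cpu(const struct toy_task *running, size_t ncpus,
			       int woken_prio)
{
	int victim = -1;
	int worst = woken_prio;

	assert(running != NULL);

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		if (running[cpu].prio > worst) {
			worst = running[cpu].prio;
			victim = (int)cpu;
		}
	}
	return victim;
}

/*
 * A pileup forms when the task that must be preempted is migrate-disabled:
 * it cannot be pushed to another CPU and stays stuck below the woken task.
 */
static bool toy_wakeup_forms_pileup(const struct toy_task *running,
				    size_t ncpus, int woken_prio)
{
	int victim = toy_find_victim_cpu(running, ncpus, woken_prio);

	return victim >= 0 && running[victim].migrate_disabled;
}
```

With four CPUs running tasks at prio 10, 20, 30 and 40, where the prio 40
task is migrate-disabled, waking a prio 5 task picks CPU 3 as the victim and
toy_wakeup_forms_pileup() reports true.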

That's not to say it might not still be beneficial to try to avoid
them, but we must assume a pileup will occur; therefore my focus was on
dealing with them as best we can first.