Re: [PATCH V2] sched: Prevent balance_push() on remote runqueues

From: Tao Zhou
Date: Sat Aug 28 2021 - 13:17:05 EST


Hi Thomas,

On Sat, Aug 28, 2021 at 03:55:52PM +0200, Thomas Gleixner wrote:

> sched_setscheduler() and rt_mutex_setprio() invoke the run-queue balance
> callback after changing priorities or the scheduling class of a task. The
> run-queue for which the callback is invoked can be local or remote.

Let me say(always with my wrong). When the priority or policy changed, it is
the opportunity to call RT/DL/core_sched callback if they queued.

The corresponding callback is: pull/push_rt_task(), pull/push_dl_task() and
sched_core_balance(). The rq's callback list can be queued from local or
other cpu's callback_head. They defined as per-cpu.

sched_core_balance() do not use stop machine. pull_dl_task(), push/pull_rt_task
use stop machine(push_cpu_stop()) to migrate tasks. SCA is another place to
use stop machine(push_cpu_stop()).

> That's not a problem for the regular rq::push_work which is serialized with
> a busy flag in the run-queue struct, but for the balance_push() work which

So, they use rq::push_busy to serialize as said above.

balance_push() is another call back func that may queue on rq's callback list.
But, it is different from the above ones. It is *global* and more import than
others. If it is already queued on, every other callback will not be queued.
And the points where calling __balance_callbacks() will do the cpu outgoing
things.

sched_setscheduler() and rt_mutex_setprio() are the two points to call
__balance_callbacks(), and the __schedule() also have two points. one
is finish_lock_switch() and another is when prev==next case.

Task switched implys priority or policy changed. More important is for
stop machine(__balance_push_cpu_stop()) to migrate the tasks..

> is only valid to be invoked on the outgoing CPU that's wrong. It not only

When deactive cpu, set balance_push() call back. The opportunity that
call this func may from other cpus(now see, so the saying warning).

> triggers the debug warning, but also leaves the per CPU variable push_work
> unprotected, which can result in double enqueues on the stop machine list.

Auh, only just use the outgoing cpu to do the stop->task->stop.. migrate tasks..
And, no need to use what to protect the push_work.

Yeah.. my ugly words.. But.

> Remove the warning and validate that the function is invoked on the
> outgoing CPU.
>
> Fixes: ae7927023243 ("sched: Optimize finish_lock_switch()")
> Reported-by: Sebastian Siewior <bigeasy@xxxxxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
> Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Ben Segall <bsegall@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
> Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> ---
> V2: Use the correct check for the outgoing CPU
> ---
> kernel/sched/core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8435,7 +8435,6 @@ static void balance_push(struct rq *rq)
> struct task_struct *push_task = rq->curr;
>
> lockdep_assert_rq_held(rq);
> - SCHED_WARN_ON(rq->cpu != smp_processor_id());
>
> /*
> * Ensure the thing is persistent until balance_push_set(.on = false);
> @@ -8443,9 +8442,10 @@ static void balance_push(struct rq *rq)
> rq->balance_callback = &balance_push_callback;
>
> /*
> - * Only active while going offline.
> + * Only active while going offline and when invoked on the outgoing
> + * CPU.
> */
> - if (!cpu_dying(rq->cpu))
> + if (!cpu_dying(rq->cpu) || rq != this_rq())
> return;
>
> /*



Thanks,
Tao