Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback

From: Peter Zijlstra
Date: Mon Apr 12 2021 - 08:04:10 EST


On Thu, Mar 11, 2021 at 03:13:04PM +0000, Valentin Schneider wrote:
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> > @@ -7883,14 +7889,6 @@ int sched_cpu_deactivate(unsigned int cp
> > set_cpu_active(cpu, false);
> >
> > /*
> > - * From this point forward, this CPU will refuse to run any task that
> > - * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> > - * push those tasks away until this gets cleared, see
> > - * sched_cpu_dying().
> > - */
> > - balance_push_set(cpu, true);
> > -
> > - /*
> > * We've cleared cpu_active_mask / set balance_push, wait for all
> > * preempt-disabled and RCU users of this state to go away such that
> > * all new such users will observe it.
> > @@ -7910,6 +7908,14 @@ int sched_cpu_deactivate(unsigned int cp
> > }
> > rq_unlock_irqrestore(rq, &rf);
> >
> > + /*
> > + * From this point forward, this CPU will refuse to run any task that
> > + * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> > + * push those tasks away until this gets cleared, see
> > + * sched_cpu_dying().
> > + */
> > + balance_push_set(cpu, true);
> > +
>
> AIUI with cpu_dying_mask being flipped before even entering
> sched_cpu_deactivate(), we don't need this to be before the
> synchronize_rcu() anymore; is there more than that to why you're punting it
> back this side of it?

I think it does does need to be like this, we need to clearly separate
the active=true and balance_push_set(). If we were to somehow observe
both balance_push_set() and active==false, we'd be in trouble.

> > #ifdef CONFIG_SCHED_SMT
> > /*
> > * When going down, decrement the number of cores with SMT present.
>
> > @@ -8206,7 +8212,7 @@ void __init sched_init(void)
> > rq->sd = NULL;
> > rq->rd = NULL;
> > rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
> > - rq->balance_callback = NULL;
> > + rq->balance_callback = &balance_push_callback;
> > rq->active_balance = 0;
> > rq->next_balance = jiffies;
> > rq->push_cpu = 0;
> > @@ -8253,6 +8259,7 @@ void __init sched_init(void)
> >
> > #ifdef CONFIG_SMP
> > idle_thread_set_boot_cpu();
> > + balance_push_set(smp_processor_id(), false);
> > #endif
> > init_sched_fair_class();
> >
>
> I don't get what these two changes do - the end result is the same as
> before, no?

Not quite; we have to make sure the state of the offline CPUs matches
that of a CPU that's been offlined. For consistency if nothing else, but
it's actually important for a point below.

> Also, AIUI this patch covers the cpu_dying -> !cpu_dying rollback case
> since balance_push gets numbed down by !cpu_dying. What about the other way
> around (hot-plug failure + rollback)? We may have allowed !pcpu tasks on the
> now-dying CPU, and we'd need to re-install the balance_push callback.

This is in fact handled. Note how the previous point initialized the
offline CPU to have the push_callback installed.

Also note how balance_push() re-instates itself unconditionally.

So the thing is, we install the push callback on deactivate() and leave
it in place until activate, it is always there, regardless what way
we're moving.

However, it is only effective whild going down, see the early exit.