Re: [RFC PATCH] sched/core: Fix premature p->migration_pending completion

From: Valentin Schneider
Date: Thu Jan 28 2021 - 14:02:40 EST


On 29/01/21 01:02, Tao Zhou wrote:
> Hi,
>
> On Wed, Jan 27, 2021 at 07:30:35PM +0000, Valentin Schneider wrote:
>
>> Fiddling some more with a TLA+ model of set_cpus_allowed_ptr() & friends
>> unearthed one more outstanding issue. This doesn't even involve
>> migrate_disable(), but rather affinity changes and execution of the stopper
>> racing with each other.
>>
>> My own interpretation of the (lengthy) TLA+ splat (note the potential for
>> errors at each level) is:
>>
>> Initial conditions:
>> victim.cpus_mask = {CPU0, CPU1}
>>
>> CPU0 CPU1 CPU<don't care>
>>
>> switch_to(victim)
>> set_cpus_allowed(victim, {CPU1})
>> kick CPU0 migration_cpu_stop({.dest_cpu = CPU1})
>> switch_to(stopper/0)
>> // e.g. CFS load balance
>> move_queued_task(CPU0, victim, CPU1);
> ^^^^^^^^^^^^^^^^
>
> Why is move_queued_task() not attach_task()/detach_task() for CFS load..
>

Heh I expected that one; it is indeed detach_task()/attach_task() for CFS
LB. I didn't want to make this any longer than it needed to, and I figured
that move_queued_task() being a composition of detach_task(), attach_task() and
rq_locks, this would get the point across.

This does raise an "interesting" point that ATM I think this issue cannot
actually involve move_queued_task(), since all current move_queued_task()
callsites are issued either from a stopper or from set_cpus_allowed_ptr().

CFS' detach_task() + attach_task() could do it, though.

>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 06b449942adf..b57326b0a742 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1923,20 +1923,28 @@ static int migration_cpu_stop(void *data)
>> complete = true;
>> }
>>
>> - /* migrate_enable() -- we must not race against SCA */
>> - if (dest_cpu < 0) {
>> - /*
>> - * When this was migrate_enable() but we no longer
>> - * have a @pending, a concurrent SCA 'fixed' things
>> - * and we should be valid again. Nothing to do.
>> - */
>> - if (!pending) {
>> - WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
>> - goto out;
>> - }
>> + /*
>> + * When this was migrate_enable() but we no longer
>> + * have a @pending, a concurrent SCA 'fixed' things
>> + * and we should be valid again.
>> + *
>> + * This can also be a stopper invocation that was 'fixed' by an
>> + * earlier one.
>> + *
>> + * Nothing to do.
>> + */
>> + if ((dest_cpu < 0 || dest_cpu == cpu_of(rq)) && !pending) {
>
> When the condition 'dest_cpu == cpu_of(rq)' is true, pending is not NULL.
> The condition may be like this:
>
> if ((dest_cpu < 0 && !pending) || dest_cpu == cpu_of(rq))
>
> We want to choose one cpu in the new(currently modified) cpu_mask and
> complete all.
>

Consider the execution of migration_cpu_stop() in above trace with
migrate_task_to(). We do have:
- dest_cpu == cpu_of(rq)
- p->migration_pending

but we do *not* want to bail out at this condition, because we need to fix
up dest_cpu.