Re: [PATCH] sched: fix migration to invalid cpu in __set_cpus_allowed_ptr

From: Dietmar Eggemann
Date: Tue Sep 24 2019 - 10:10:07 EST


On 9/23/19 6:06 PM, Valentin Schneider wrote:
> On 23/09/2019 16:43, Dietmar Eggemann wrote:
>> I'm not sure that CONFIG_DEBUG_PER_CPU_MAPS=y will help you here.
>>
>> __set_cpus_allowed_ptr(...)
>> {
>> ...
>> dest_cpu = cpumask_any_and(...)
>> ...
>> }
>>
>> With:
>>
>> #define cpumask_any_and(mask1, mask2) cpumask_first_and((mask1), (mask2))
>> #define cpumask_first_and(src1p, src2p) \
>> 	cpumask_next_and(-1, (src1p), (src2p))
>>
>> cpumask_next_and() is called with n = -1 and in this case does not
>> invoke cpumask_check().
>>
>
> It won't warn here because it's still a valid return value, but it should
> warn in the cpumask_test_cpu() that follows (in is_cpu_allowed()) because
> it would be passed a value >= nr_cpu_ids. So at the very least this config
> does catch cpumask_any*() return values being blindly passed to
> cpumask_test_cpu().

OK, I see and agree.

But IMHO, we still don't call cpumask_test_cpu(dest_cpu, ...), right?

What the patch fixes is that it closes the window between two reads of
cpu_active_mask in which cpuhp can potentially punch a hole into the
cpu_active_mask.
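
To make the window concrete, here is a minimal userspace sketch, not the
kernel code. It assumes the first read is an intersection check and the
second read picks dest_cpu; NR_CPU_IDS, active_mask, any_and(),
hotplug_clear() and both pick_*() helpers are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPU_IDS 4	/* hypothetical: a 4-CPU system */

/* Stand-in for cpu_active_mask; bit n set means CPU n is active. */
static uint32_t active_mask = 0x3;	/* CPUs 0 and 1 active */

/* Model of cpumask_any_and(): first common bit, NR_CPU_IDS if none. */
static int any_and(uint32_t a, uint32_t b)
{
	uint32_t both = a & b;

	for (int cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (1u << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/* Stand-in for cpuhp clearing a CPU from the active mask. */
static void hotplug_clear(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPU_IDS);
	active_mask &= ~(1u << cpu);
}

/* Two reads: validate the mask first, pick dest_cpu later. */
static int pick_two_reads(uint32_t new_mask, int cpu_going_down)
{
	if (!(active_mask & new_mask))		/* first read */
		return -1;			/* models -EINVAL */
	hotplug_clear(cpu_going_down);		/* the window */
	return any_and(active_mask, new_mask);	/* second read */
}

/* One read: pick dest_cpu once and validate that very value. */
static int pick_one_read(uint32_t new_mask, int cpu_going_down)
{
	int dest_cpu = any_and(active_mask, new_mask);	/* only read */

	if (dest_cpu >= NR_CPU_IDS)
		return -1;			/* models -EINVAL */
	hotplug_clear(cpu_going_down);		/* too late to matter */
	return dest_cpu;
}
```

With new_mask = 0x2 and CPU 1 going down inside the window,
pick_two_reads() can hand back NR_CPU_IDS, whereas pick_one_read() either
fails up front or returns an index below NR_CPU_IDS.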

If p is neither running nor queued and its state is not TASK_WAKING,
a 'dest_cpu == nr_cpu_ids' goes unnoticed. Otherwise we see an 'unable
to handle kernel paging request' or 'unable to handle page fault for
address' bug in migration_cpu_stop() or move_queued_task().
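
The case split above can be written down as a small decision function.
This is only a sketch: struct task_model, enum outcome and
use_of_dest_cpu() are hypothetical names, and the mapping of
running/TASK_WAKING to migration_cpu_stop() and of queued to
move_queued_task() is my assumption about how the tail of
__set_cpus_allowed_ptr() consumes dest_cpu.

```c
#include <stdbool.h>

/* Hypothetical reduction of the task state that matters here. */
struct task_model {
	bool running;	/* p is currently running on a CPU */
	bool queued;	/* p is on a runqueue */
	bool waking;	/* p's state is TASK_WAKING */
};

enum outcome {
	UNNOTICED,		/* dest_cpu is never consumed */
	VIA_MIGRATION_CPU_STOP,	/* dest_cpu reaches migration_cpu_stop() */
	VIA_MOVE_QUEUED_TASK,	/* dest_cpu reaches move_queued_task() */
};

/* Where does a bogus 'dest_cpu == nr_cpu_ids' end up? */
static enum outcome use_of_dest_cpu(const struct task_model *p)
{
	if (p->running || p->waking)
		return VIA_MIGRATION_CPU_STOP;
	if (p->queued)
		return VIA_MOVE_QUEUED_TASK;
	return UNNOTICED;
}
```

Only the UNNOTICED case lets the invalid value slip through silently; in
the other two the value is used as a CPU index.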

Am I missing something?

[...]