Re: [PATCH] sched: fix migration to invalid cpu in __set_cpus_allowed_ptr

From: Valentin Schneider
Date: Sun Sep 15 2019 - 13:01:04 EST


On 15/09/2019 04:07, shikemeng wrote:
> From: <shikemeng@xxxxxxxxxx>
>
> reason: migration to invalid cpu in __set_cpus_allowed_ptr
> archive path: patches/euleros/sched
>
> An Oops occurs when running qemu on arm64:
> Unable to handle kernel paging request at virtual address ffff000008effe40
> Internal error: Oops: 96000007 [#1] SMP
> Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
> pstate: 20000085 (nzCv daIf -PAN -UAO)
> pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
> lr : move_queued_task.isra.21+0x124/0x298
> ...
> Call trace:
> __ll_sc___cmpxchg_case_acq_4+0x4/0x20
> __migrate_task+0xc8/0xe0
> migration_cpu_stop+0x170/0x180
> cpu_stopper_thread+0xec/0x178
> smpboot_thread_fn+0x1ac/0x1e8
> kthread+0x134/0x138
> ret_from_fork+0x10/0x18
>
> __set_cpus_allowed_ptr will choose an active dest_cpu in the affinity mask to migrate the process to if the process is not
> currently running on any of the CPUs specified in the affinity mask. __set_cpus_allowed_ptr will choose an invalid
> dest_cpu (>= nr_cpu_ids, 1024 in my virtual machine) if the CPUs in the affinity mask are deactivated by cpu_down after the
> cpumask_intersects check. The subsequent cpumask_test_cpu of dest_cpu then reads past the end of the mask and may pass if the
> corresponding bit is coincidentally set. As a consequence, the kernel will access an invalid rq address associated with the invalid cpu in
> migration_cpu_stop->__migrate_task->move_queued_task and the Oops occurs. The following sequence may trigger the Oops:
> 1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling sched_setaffinity
> 2) A shell script repeatedly runs "echo 0 > /sys/devices/system/cpu/cpu1/online" and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn
> 3) The Oops appears if the bit for the invalid cpu happens to be set in the memory beyond the tested cpumask.
>
> Change-Id: I9c2f95aecd3da568991b7408397215f26c990e40
> Signed-off-by: <shikemeng@xxxxxxxxxx>
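
For anyone reading along without the diff, the failure mode described in
the log can be sketched in user space roughly as below. This is not the
kernel code or the patch itself: NR_CPU_IDS, mask_any_and() and
pick_dest_cpu() are hypothetical stand-ins, and the only kernel behaviour
assumed is that the cpumask search helpers return a value >= nr_cpu_ids
when no bit matches.

```c
#include <stdio.h>

/* Hypothetical stand-in for nr_cpu_ids; the log reports 1024 in the VM. */
#define NR_CPU_IDS 4u

/*
 * Return the first CPU set in both masks, or NR_CPU_IDS if there is none.
 * This models the "no match" return convention assumed above.
 */
static unsigned int mask_any_and(unsigned long a, unsigned long b)
{
	unsigned long both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (1UL << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/*
 * Pick a destination CPU and validate the value that will actually be
 * used, instead of relying on an earlier intersection test that a
 * concurrent cpu_down may have invalidated. Returns 0 on success and
 * -1 if no usable CPU is left (the kernel would use an errno here).
 */
static int pick_dest_cpu(unsigned long active, unsigned long new_mask,
			 unsigned int *dest)
{
	unsigned int cpu = mask_any_and(active, new_mask);

	if (cpu >= NR_CPU_IDS)
		return -1;
	*dest = cpu;
	return 0;
}
```

With only cpu0 active (0x1) and an affinity mask of cpu1 (0x2),
pick_dest_cpu() fails instead of handing back an out-of-range CPU number,
which is the case the Oops above comes from.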

The log still isn't wrapped to 75 chars, and the change-id still hasn't been
removed.

The subject should also mention that this is v2 of the patch (i.e. "[PATCH
v2]"); again, this is all in the process documentation.

The fix itself looks fine though, so once the log respects the rules:
Reviewed-by: Valentin Schneider <valentin.schneider@xxxxxxx>