Re: [PATCH] sched/core: Test online status in available_idle_cpu()

From: Valentin Schneider
Date: Thu May 02 2024 - 11:58:06 EST


On 29/04/24 07:54, Sven Schnelle wrote:
> The current implementation of available_idle_cpu() doesn't test
> whether a possible cpu is offline. On s390 this dereferences a
> NULL pointer in arch_vcpu_is_preempted() because lowcore is not
> allocated for offline cpus. On x86, tracing also shows calls to
> available_idle_cpu() after a cpu is disabled, but it looks like
> this isn't causing any (obvious) issue. Nevertheless, add a check
> and return early if the cpu isn't online.
>
> Signed-off-by: Sven Schnelle <svens@xxxxxxxxxxxxx>


So most of the uses of that function are in wakeup task placement.
o find_idlest_cpu() works on the sched_domain spans, so shouldn't deal with
offline CPUs.
o select_idle_sibling() may issue an available_idle_cpu(prev) with an
offline previous CPU, which would trigger your issue.

Currently, even if select_idle_sibling() picks an offline CPU, this will
get corrected by select_fallback_rq() at the end of
select_task_rq(). However, it would make sense to realize @prev isn't a
suitable pick before making it to the fallback machinery, in which case
your patch makes sense beyond just fixing s390.
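
To illustrate the point, here is a rough userspace model, not kernel code.
The cpu_model struct, the model_*() helpers and the CPU table are all
hypothetical stand-ins, and the position of the online check relative to the
other tests is an assumption rather than a copy of the patch. It only shows
that rejecting an offline CPU early both avoids touching per-CPU data that
doesn't exist and stops @prev from being picked in the first place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-in for per-CPU data that only exists while online. */
struct lowcore {
	bool vcpu_preempted;
};

/* Hypothetical, much simplified per-CPU state. */
struct cpu_model {
	bool online;
	bool idle;
	struct lowcore *lowcore;	/* NULL while the CPU is offline */
};

static struct lowcore lc[3] = {
	{ .vcpu_preempted = false },
	{ .vcpu_preempted = true },
	{ .vcpu_preempted = false },
};

static struct cpu_model cpus[NR_CPUS] = {
	{ .online = true,  .idle = true,  .lowcore = &lc[0] },
	{ .online = true,  .idle = true,  .lowcore = &lc[1] },
	{ .online = true,  .idle = false, .lowcore = &lc[2] },
	{ .online = false, .idle = true,  .lowcore = NULL },
};

/* Models an arch hook that dereferences per-CPU data unconditionally. */
bool model_vcpu_is_preempted(int cpu)
{
	return cpus[cpu].lowcore->vcpu_preempted;
}

/* Models available_idle_cpu() with the online test done first. */
bool model_available_idle_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (!cpus[cpu].online)
		return false;	/* never reaches the NULL lowcore */
	if (!cpus[cpu].idle)
		return false;
	if (model_vcpu_is_preempted(cpu))
		return false;
	return true;
}

/*
 * Grossly simplified placement: take @prev if usable, otherwise scan for
 * any usable CPU. Returns -1 when nothing qualifies; the real code would
 * hand over to the fallback machinery at that point.
 */
int model_select_cpu(int prev)
{
	if (model_available_idle_cpu(prev))
		return prev;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (model_available_idle_cpu(cpu))
			return cpu;
	}
	return -1;
}
```

With this table, model_select_cpu(3) rejects the offline @prev up front and
settles on CPU 0 without ever calling model_vcpu_is_preempted(3).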

Reviewed-by: Valentin Schneider <vschneid@xxxxxxxxxx>