Re: [PATCH] sched/fair: Don't balance migration disabled tasks

From: Yicong Yang
Date: Thu Mar 16 2023 - 05:14:41 EST


Hi Valentin,

On 2023/3/15 23:34, Valentin Schneider wrote:
> On 13/03/23 14:57, Yicong Yang wrote:
>> kernel/sched/fair.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 7a1b1f855b96..8fe767362d22 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8433,6 +8433,10 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>> if (kthread_is_per_cpu(p))
>> return 0;
>>
>> + /* Migration disabled tasks need to be kept on their running CPU. */
>> + if (is_migration_disabled(p))
>> + return 0;
>> +
>> if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
>> int cpu;
>
> That cpumask check should cover migration_disabled tasks, unless they
> haven't gone through migrate_disable_switch() yet
> (p->migration_disabled == 1, but the cpus_ptr hasn't been touched yet).
>
> Now, if that's the case, the task has to be src_rq's current (since it
> hasn't switched out), which means can_migrate_task() should exit via:
>
> if (task_on_cpu(env->src_rq, p)) {
> schedstat_inc(p->stats.nr_failed_migrations_running);
> return 0;
> }
>
> and thus not try to detach_task(). With that in mind, I don't get how your
> splat can happen, nor how the change change can help (a remote task p could
> execute migrate_disable() concurrently with can_migrate_task(p)).
>

I see, for migrate disabled tasks, if !p->on_cpu the migration can be avoid by
the cpus_ptr check and if p->on_cpu migration can be avoid by the task_on_cpu()
check. So this patch won't help.

> I'm a bit confused here, detach_tasks() happens entirely with src_rq
> rq_lock held, so there shouldn't be any surprises.
>

Since it's a arm64 machine, could the WARN_ON_ONCE() test be false positive?
I mean the update of p->migration_disabled is not observed by the balance
CPU and trigger this warning incorrectly.

> Can you share any extra context? E.g. exact HEAD of your tree, maybe the
> migrate_disable task in question if you have that info.
>

We met this on our internal version based on 6.1-rc4, the scheduler is not
patched. We also saw this in previous version like 6.0. This patch is applied
since 6.2 so we haven't seen this recently, but as you point out this patch doesn't
solve the problem. The questioned tasks are some systemd services(udevd, etc)
and I assume the migration disable is caused by seccomp:

seccomp()
...
bpf_prog_run_pin_on_cpu()
migrate_disable()
...
migrate_enable()

Thanks,
Yicong