Re: [PATCH 0/9] sched: Migrate disable support

From: Dietmar Eggemann
Date: Tue Sep 29 2020 - 05:15:20 EST


On 25/09/2020 19:49, Valentin Schneider wrote:
>
> On 25/09/20 13:19, Valentin Schneider wrote:
>> On 25/09/20 12:58, Dietmar Eggemann wrote:
>>> With Valentin's print_rq() inspired test snippet I always see one of the
>>> RT user tasks as the second guy? BTW, it has to be RT tasks, never
>>> triggered with CFS tasks.
>>>
>>> [ 57.849268] CPU2 nr_running=2
>>> [ 57.852241] p=migration/2
>>> [ 57.854967] p=task0-0
>>
>> I can also trigger the BUG_ON() using the built-in locktorture module
>> (+enabling hotplug torture), and it happens very early on. I can't trigger
>> it under qemu sadly :/ Also, in my case it's always a kworker:
>>
>> [ 0.830462] CPU3 nr_running=2
>> [ 0.833443] p=migration/3
>> [ 0.836150] p=kworker/3:0
>>
>> I'm looking into what workqueue.c is doing about hotplug...
>
> So with
> - The pending migration fixup (20200925095615.GA2651@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
> - The workqueue set_cpus_allowed_ptr() change (from IRC)
> - The set_rq_offline() move + DL/RT pull && rq->online (also from IRC)
>
> my Juno survives rtmutex + hotplug locktorture, where it would previously
> explode < 1s after boot (mostly due to the workqueue thing).
>
> I stared a bit more at the rq_offline() + DL/RT bits and they look fine to
> me.
>
> The one thing I'm not entirely sure about is while you plugged the
> class->balance() hole, AIUI we might still get RT (DL?) pull callbacks
> enqueued - say if we just unthrottled an RT RQ and something changes the
> priority of one of the freshly-released tasks (user or rtmutex
> interaction), I don't see any stopgap preventing a pull from happening.
>
> I slapped the following on top of my kernel and it didn't die, although I'm
> not sure I'm correctly stressing this path. Perhaps we could limit that to
> the pull paths, since technically we're okay with pushing out of an !online
> RQ.
>
> ---
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 50aac5b6db26..00d1a7b85e97 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1403,7 +1403,7 @@ queue_balance_callback(struct rq *rq,
> {
> lockdep_assert_held(&rq->lock);
>
> - if (unlikely(head->next))
> + if (unlikely(head->next || !rq->online))
> return;
>
> head->func = (void (*)(struct callback_head *))func;
> ---
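
(Side note on the reproducer: locktorture's hotplug torture is driven
by module parameters, which with the module built in can go straight
on the kernel command line. Valentin's exact settings weren't posted,
so the values below are only an illustrative guess:

  locktorture.torture_type=rtmutex_lock locktorture.onoff_holdoff=10 locktorture.onoff_interval=3

torture_type selects the rtmutex flavour, onoff_holdoff (seconds)
delays the hotplug torture until after boot, and a non-zero
onoff_interval (seconds) makes the torture framework offline/online a
random CPU periodically.)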

When I use the original patch-set (i.e. without the pending migration
fixup and the two changes from IRC), it looks like the RT task is
already on the rq before the rq_offline_rt() -> __disable_runtime() call.

pr_crit("CPU%d X: %d %d %lu %lu %d %d %d %llu %llu\n",
cpu_of(rq), rq->nr_running,
rt_rq->rt_nr_running, rt_rq->rt_nr_migratory,
rt_rq->rt_nr_total, rt_rq->overloaded,
rt_rq->rt_queued, rt_rq->rt_throttled,
rt_rq->rt_time, rt_rq->rt_runtime);

X = 1 : in rq_offline_rt()   before __disable_runtime()
    2 : in rq_offline_rt()   after  __disable_runtime()
    3 : in rq_online_rt()    before __enable_runtime()
    4 : in rq_online_rt()    after  __enable_runtime()
    5 : in sched_cpu_dying() if (rq->nr_running > 1)
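
A minimal sketch of how this instrumentation can be hooked in;
dump_rt_rq() is a made-up helper name (the actual debug diff wasn't
posted) and the elided parts of the functions are unchanged:

---
/* debug only: dump rq/rt_rq state, tagged with a call-site id x */
static void dump_rt_rq(struct rq *rq, int x)
{
	struct rt_rq *rt_rq = &rq->rt;

	pr_crit("CPU%d %d: %d %d %lu %lu %d %d %d %llu %llu\n",
		cpu_of(rq), x, rq->nr_running,
		rt_rq->rt_nr_running, rt_rq->rt_nr_migratory,
		rt_rq->rt_nr_total, rt_rq->overloaded,
		rt_rq->rt_queued, rt_rq->rt_throttled,
		rt_rq->rt_time, rt_rq->rt_runtime);
}

static void rq_offline_rt(struct rq *rq)
{
	dump_rt_rq(rq, 1);		/* X = 1 */
	__disable_runtime(rq);
	dump_rt_rq(rq, 2);		/* X = 2 */
	/* ... */
}

static void rq_online_rt(struct rq *rq)
{
	/* ... */
	dump_rt_rq(rq, 3);		/* X = 3 */
	__enable_runtime(rq);
	dump_rt_rq(rq, 4);		/* X = 4 */
	/* ... */
}
---

Note that rt_runtime == 18446744073709551615 in the traces below is
RUNTIME_INF ((u64)~0ULL), which __disable_runtime() leaves behind;
__enable_runtime() resets rt_time to 0 and rt_runtime to the default
950000000 ns.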

*[ 70.369719] CPU0 1: 1 1 1 1 0 1 0 36093160 950000000
*[ 70.374689] CPU0 2: 1 1 1 1 0 1 0 36093160 18446744073709551615
*[ 70.380615] CPU0 3: 1 1 1 1 0 1 0 36093160 18446744073709551615
*[ 70.386540] CPU0 4: 1 1 1 1 0 1 0 0 950000000
[ 70.395637] CPU1 1: 1 0 0 0 0 0 0 31033300 950000000
[ 70.400606] CPU1 2: 1 0 0 0 0 0 0 31033300 18446744073709551615
[ 70.406532] CPU1 3: 1 0 0 0 0 0 0 31033300 18446744073709551615
[ 70.412457] CPU1 4: 1 0 0 0 0 0 0 0 950000000
[ 70.421609] CPU4 1: 0 0 0 0 0 0 0 19397300 950000000
[ 70.426577] CPU4 2: 0 0 0 0 0 0 0 19397300 18446744073709551615
[ 70.432503] CPU4 3: 0 0 0 0 0 0 0 19397300 18446744073709551615
[ 70.438428] CPU4 4: 0 0 0 0 0 0 0 0 950000000
[ 70.484133] CPU3 3: 2 0 0 0 0 0 0 3907020 18446744073709551615
[ 70.489984] CPU3 4: 2 0 0 0 0 0 0 0 950000000
[ 70.540112] CPU2 3: 1 0 0 0 0 0 0 3605180 18446744073709551615
[ 70.545953] CPU2 4: 1 0 0 0 0 0 0 0 950000000
*[ 70.647548] CPU0 1: 2 1 1 1 0 1 0 5150760 950000000
*[ 70.652441] CPU0 2: 2 1 1 1 0 1 0 5150760 18446744073709551615
*[ 70.658281] CPU0 nr_running=2
[ 70.661255] p=migration/0
[ 70.664022] p=task0-4
*[ 70.666384] CPU0 5: 2 1 1 1 0 1 0 5150760 18446744073709551615
[ 70.672230] ------------[ cut here ]------------
[ 70.676850] kernel BUG at kernel/sched/core.c:7346!
[ 70.681733] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 70.687223] Modules linked in:
[ 70.690284] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-rc1-00134-g7104613975b6-dirty #173
[ 70.699168] Hardware name: ARM Juno development board (r0) (DT)
[ 70.705107] Stopper: multi_cpu_stop+0x0/0x170 <- 0x0
[ 70.710078] pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
[ 70.715661] pc : sched_cpu_dying+0x210/0x250
[ 70.719936] lr : sched_cpu_dying+0x204/0x250
[ 70.724207] sp : ffff800011e7bc60
[ 70.727521] x29: ffff800011e7bc80 x28: 0000000000000002
[ 70.732840] x27: 0000000000000000 x26: ffff800011ab30c0
[ 70.738159] x25: ffff8000112d37e0 x24: ffff800011ab30c0
[ 70.743477] x23: ffff800011ab3440 x22: ffff000975e40790
[ 70.748796] x21: 0000000000000080 x20: 0000000000000000
[ 70.754115] x19: ffff00097ef591c0 x18: 0000000000000010
[ 70.759433] x17: 0000000000000000 x16: 0000000000000000
[ 70.764752] x15: ffff000975cf2108 x14: ffffffffffffffff
[ 70.770070] x13: ffff800091e7b9e7 x12: ffff800011e7b9ef
[ 70.775388] x11: ffff800011ac2000 x10: ffff800011ce86d0
[ 70.780707] x9 : 0000000000000001 x8 : ffff800011ce9000
[ 70.786026] x7 : ffff8000106edad8 x6 : 000000000000131c
[ 70.791344] x5 : ffff00097ef4f230 x4 : 0000000000000000
[ 70.796662] x3 : 0000000000000027 x2 : 414431aad459c700
[ 70.801981] x1 : 0000000000000000 x0 : 0000000000000002
[ 70.807299] Call trace:
[ 70.809747] sched_cpu_dying+0x210/0x250
[ 70.813676] cpuhp_invoke_callback+0x88/0x210
[ 70.818038] take_cpu_down+0x7c/0xd8
[ 70.821617] multi_cpu_stop+0xac/0x170
[ 70.825369] cpu_stopper_thread+0x98/0x130
[ 70.829469] smpboot_thread_fn+0x1c4/0x280
[ 70.833570] kthread+0x140/0x160
[ 70.836801] ret_from_fork+0x10/0x34
[ 70.840384] Code: 94011fd8 b9400660 7100041f 54000040 (d4210000)
[ 70.846487] ---[ end trace 7eb0e0efe803dcfe ]---
[ 70.851109] note: migration/0[11] exited with preempt_count 3
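
(For reference, the BUG being hit at kernel/sched/core.c:7346 is
presumably the final sanity check in sched_cpu_dying(), which in
v5.9-rc1 reads

	BUG_ON(rq->nr_running != 1);

with the line number shifted by the debug patch. It fires because the
stranded RT task is still on the dying CPU's rq next to migration/0,
matching the nr_running=2 dump above.)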