Re: [PATCH] sched: fix rq lock recursion issue

From: Satya Durga Srinivasu Prabhala
Date: Tue Jul 05 2022 - 23:42:37 EST



On 7/4/22 1:32 AM, Dietmar Eggemann wrote:
On 01/07/2022 13:48, Qais Yousef wrote:
On 07/01/22 10:33, Peter Zijlstra wrote:
On Thu, Jun 30, 2022 at 10:53:10PM +0100, Qais Yousef wrote:
Hi Satya

On 06/24/22 00:42, Satya Durga Srinivasu Prabhala wrote:
The recursion below is observed in a rare scenario where __schedule()
takes the rq lock and, at around the same time, the task's affinity is
being changed. The bpf function tracing sched_switch calls
migrate_enable(), which checks for an affinity change
(cpus_ptr != cpus_mask) and lands in __set_cpus_allowed_ptr(), which
tries to acquire the rq lock again, causing the recursion bug.

Fix the issue by switching to preempt_disable()/preempt_enable() for
non-RT kernels.
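
In call-path terms, the cycle being described comes down to
migrate_enable() retaking the lock __schedule() already holds. A rough
sketch of the relevant check (paraphrased for an SMP build; not a
verbatim quote of the 5.15/5.19 kernel/sched/core.c):

    void migrate_enable(void)
    {
        struct task_struct *p = current;

        if (p->migration_disabled > 1) {
            p->migration_disabled--;    /* still nested, nothing else to do */
            return;
        }

        /*
         * Affinity changed while migration was disabled?  Then put
         * cpus_ptr back.  This takes task_rq_lock(), i.e. the same
         * rq lock __schedule() holds around the sched_switch
         * tracepoint -> the reported spinlock recursion.
         */
        if (p->cpus_ptr != &p->cpus_mask)
            __set_cpus_allowed_ptr(p, &p->cpus_mask, SCA_MIGRATE_ENABLE);

        p->migration_disabled = 0;
    }

(The preempt_disable()/nr_pinned handling is dropped here for brevity.)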
Interesting bug. Thanks for the report. Unfortunately I can't see this being
a fix as it just limits the bug visibility to PREEMPT_RT kernels, but won't fix
anything, no? ie: Kernels compiled with PREEMPT_RT will still hit this failure.
Worse, there's !RT stuff that grew to rely on the preemptible migrate
disable stuff, so this actively breaks things.

I'm curious how the race with set affinity is happening. I would have thought
user space would get blocked as __schedule() will hold the rq lock.

Do you have more details on that?
Yeah, I'm not seeing how this works either: in order for
migrate_enable() to actually call __set_cpus_allowed_ptr(), it needs to
have done migrate_disable() *before* schedule(), schedule() will then
have to call migrate_disable_switch(), and *then* migrate_enable() does this.

However, if things are nicely balanced (as they should be), then
trace_call_bpf() using migrate_disable()/migrate_enable() should never
hit this path.

If, OTOH, migrate_disable() was called prior to schedule() and we did do
migrate_disable_switch(), then it should be impossible for the
tracepoint/bpf stuff to reach p->migration_disabled == 0.
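
Put differently, a sketch of the disable side (again simplified from
what kernel/sched/core.c does on SMP, not verbatim) shows why a
balanced pair around the BPF program can't normally get there:

    void migrate_disable(void)
    {
        struct task_struct *p = current;

        if (p->migration_disabled) {
            p->migration_disabled++;    /* nested: just bump the count */
            return;
        }

        preempt_disable();
        this_rq()->nr_pinned++;
        p->migration_disabled = 1;
        preempt_enable();
    }

__schedule() only goes through migrate_disable_switch() (which is what
makes cpus_ptr differ from cpus_mask) when the task was already
migrate-disabled when it scheduled. In that case the tracepoint's own
migrate_enable() sees a count > 1 and merely decrements; if the task
wasn't migrate-disabled, cpus_ptr was never switched. Either way the
__set_cpus_allowed_ptr() path shouldn't be reachable from
trace_call_bpf(), which is what makes the report puzzling.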
I think it's worth confirming which kernel Satya is on too. If it's GKI, then
it's worth checking first whether this is actually reproducible on/applicable
to mainline.
Satya, do you still have these lines from your spin_dump() output showing
current, the kernel version and the hardware? Or a way to recreate this?
I haven't been able to provoke it so far.

...
[ 212.196452] BUG: spinlock recursion on CPU#4, bpftrace/1662
^^^^^^^^^^^^^
[ 212.196473] lock: 0xffff00097ef7f500, .magic: dead4ead, .owner: bpftrace/1662, .owner_cpu: 4
[ 212.196500] CPU: 4 PID: 1662 Comm: bpftrace Not tainted 5.19.0-rc2-00018-gb7ce5b6b4622-dirty #96
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[ 212.196513] Hardware name: ARM Juno development board (r0) (DT)
^^^^^^^^^^^^^^^^^^^^^^^^^^
[ 212.196520] Call trace:
...
Thanks for cross-checking. Below are the output lines from spin_dump(). I'm on the 5.15 GKI kernel.

[ 7447.326924] BUG: spinlock recursion on CPU#6, kworker/6:9/738
[ 7447.333615] lock: 0xffffff89321d8600, .magic: dead4ead, .owner: kworker/6:9/738, .owner_cpu: 6

I'm trying to get the issue reproduced with some additional debug logging,
but haven't found an easy way so far. Still cross-checking.
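
For the reproduction side, one illustrative userspace half would be to
hammer the affinity-change path on a victim task while a bpftrace/BPF
program stays attached to the sched_switch tracepoint (as in the splat
above); whether this actually tickles the race is unknown. The file
name, masks and CPU numbers below are made up for illustration:

    /* affinity_flip.c - keep flipping a task's affinity so that
     * cpus_ptr/cpus_mask updates race with its context switches.
     * Purely a sketch of a reproduction attempt, no guarantee it
     * triggers the reported recursion.
     *
     * Build: gcc -O2 -o affinity_flip affinity_flip.c
     * Usage: ./affinity_flip <pid>
     */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }

        pid_t pid = atoi(argv[1]);
        cpu_set_t narrow, wide;

        CPU_ZERO(&narrow);
        CPU_SET(0, &narrow);                /* mask A: CPU0 only */
        CPU_ZERO(&wide);
        CPU_SET(0, &wide);
        CPU_SET(1, &wide);                  /* mask B: CPU0-1 */

        for (;;) {
            if (sched_setaffinity(pid, sizeof(narrow), &narrow) ||
                sched_setaffinity(pid, sizeof(wide), &wide)) {
                perror("sched_setaffinity");
                return 1;
            }
        }
    }

Pairing this with a bpftrace program attached to the sched_switch
tracepoint on the same machine would mimic the setup in the splat above.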