Re: [PATCH] sched: fix rq lock recursion issue

From: Satya Durga Srinivasu Prabhala
Date: Tue Jul 05 2022 - 23:45:31 EST



On 7/1/22 4:48 AM, Qais Yousef wrote:
On 07/01/22 10:33, Peter Zijlstra wrote:
On Thu, Jun 30, 2022 at 10:53:10PM +0100, Qais Yousef wrote:
Hi Satya

On 06/24/22 00:42, Satya Durga Srinivasu Prabhala wrote:
Below recursion is observed in a rare scenario where __schedule()
takes rq lock, at around same time task's affinity is being changed,
bpf function for tracing sched_switch calls migrate_enabled(),
checks for affinity change (cpus_ptr != cpus_mask) lands into
__set_cpus_allowed_ptr which tries acquire rq lock and causing the
recursion bug.

Fix the issue by switching to preempt_enable/disable() for non-RT
Kernels.
Interesting bug. Thanks for the report. Unfortunately I can't see this being
a fix as it just limits the bug visibility to PREEMPT_RT kernels, but won't fix
anything, no? ie: Kernels compiled with PREEMPT_RT will still hit this failure.
Worse, there's !RT stuff that grew to rely on the preemptible migrate
disable stuff, so this actively breaks things.

I'm curious how the race with set affinity is happening. I would have thought
user space would get blocked as __schedule() will hold the rq lock.

Do you have more details on that?
Yeah, I'm not seeing how this works either, in order for
migrate_enable() to actually call __set_cpus_allowed_ptr(), it needs to
have done migrate_disable() *before* schedule, schedule() will then have
to call migrate_disable_swich(), and *then* migrate_enable() does this.

However, if things are nicely balanced (as they should be), then
trace_call_bpf() using migrate_disable()/migrate_enable() should never
hit this path.

If, OTOH, migrate_disable() was called prior to schedule() and we did do
migrate_disable_switch(), then it should be impossible for the
tracepoint/bpf stuff to reach p->migration_disabled == 0.
I think it's worth to confirm which kernel Satya is on too. If it's GKI, then
worth checking first this is actually reproducible on/applicable to mainline.
We are seeing the issue on 5.15 GKI Kernel. On older Kernels, like 5.10 Kernel
migrate_disable/enable() end-up calling preempt_disable/enable(). I will cross
check further on comments and inputs I received so far.


Cheers

--
Qais Yousef