Re: sched_core_balance() releasing interrupts with pi_lock held

From: Dietmar Eggemann
Date: Tue Apr 05 2022 - 19:18:48 EST


On 05/04/2022 09:48, Peter Zijlstra wrote:
> On Mon, Apr 04, 2022 at 04:17:54PM -0400, T.J. Alumbaugh wrote:
>>
>> On 3/29/22 17:22, Steven Rostedt wrote:
>>> On Mon, 21 Mar 2022 13:30:37 -0400
>>> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>>>
>>>> On Wed, 16 Mar 2022 22:03:41 +0100
>>>> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>>
>>>>> Does something like the below (untested in the extreme) help?
>>>> Hi Peter,
>>>>
>>>> This has been tested extensively by the ChromeOS team and said that it does
>>>> appear to fix the problem.
>>>>
>>>> Could you get this into mainline, and tag it for stable so that it can be
>>>> backported to the appropriate stable releases?
>>>>
>>>> Thanks for the fix!
>>>>
>>> Hi Peter,
>>>
>>> I just don't want you to forget about this :-)
>>>
>>> -- Steve
>>>
>> Hi Peter,
>>
>> Just a note that if/when you send this out as a patch, feel free to add:
>>
>> Tested-by: T.J. Alumbaugh <talumbau@xxxxxxxxxxxx>
>
> https://lkml.kernel.org/r/20220330160535.GN8939@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

I still wonder if this issue happened on a system w/o:

565790d28b1e ("sched: Fix balance_callback()")

Maybe chromeos-5.10 or earlier? In this case applying 565790d28b1e could
fix it as well.

The reason why I think the original issue happened on a system w/o
565790d28b1e is the call-stack in:

https://lkml.kernel.org/r/20220315174606.02959816@xxxxxxxxxxxxxxxxxx

[56064.673346] Call Trace:
[56064.676066] dump_stack+0xb9/0x117
[56064.679861] ? print_usage_bug+0x2af/0x2c2
[56064.684434] mark_lock_irq+0x25e/0x27d
[56064.688618] mark_lock+0x11a/0x16c
[56064.692412] mark_held_locks+0x57/0x87
[56064.696595] ? _raw_spin_unlock_irq+0x2c/0x40
[56064.701460] lockdep_hardirqs_on+0xb1/0x19d
[56064.706130] _raw_spin_unlock_irq+0x2c/0x40
[56064.710799] sched_core_balance+0x8a/0x4af
[56064.715369] ? __balance_callback+0x1f/0x9a <--- !!!
[56064.720030] __balance_callback+0x4f/0x9a
[56064.724506] rt_mutex_setprio+0x43a/0x48b
[56064.728982] task_blocks_on_rt_mutex+0x14d/0x1d5

has __balance_callback().

565790d28b1e changes __balance_callback() to __balance_callbacks()
                                                               ^
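
For anyone who wants to run the same check on other reports, here is a
rough sketch of it. The helper names (reliable_symbols, predates_rename)
are made up for illustration, and the result is only a hint based on the
rename above: it assumes the singular symbol name is enough to tell the
two cases apart, which may not hold for a tree with partial backports.

```python
import re

# Matches a frame such as "[56064.720030] __balance_callback+0x4f/0x9a".
# Group 1 captures the optional "? " prefix that marks an unreliable frame,
# group 2 captures the symbol name.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_symbols(trace):
    """Return the symbols of all frames that are not prefixed with '?'."""
    symbols = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m and m.group(1) is None:
            symbols.append(m.group(2))
    return symbols


def predates_rename(trace):
    """Hint only: True if a reliable frame uses the singular, pre-rename name."""
    # Exact list membership, so "__balance_callbacks" does not match.
    return "__balance_callback" in reliable_symbols(trace)


# Excerpt of the trace quoted above (annotation stripped).
TRACE = """\
[56064.710799] sched_core_balance+0x8a/0x4af
[56064.715369] ? __balance_callback+0x1f/0x9a
[56064.720030] __balance_callback+0x4f/0x9a
[56064.724506] rt_mutex_setprio+0x43a/0x48b
"""

print(predates_rename(TRACE))
```

On the excerpt this prints True, which matches the reading above that the
report came from a kernel without 565790d28b1e.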