Re: [PATCH V4] kdb: Fix the deadlock issue in KDB debugging.

From: Liuye
Date: Fri Mar 22 2024 - 03:52:38 EST


>On 21. 03. 24, 12:50, liu.yec@xxxxxxx wrote:
>> From: LiuYe <liu.yeC@xxxxxxx>
>>
>> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will attempt
>> to use schedule_work() to provoke a keyboard reset when transitioning
>> out of the debugger and back to normal operation.
>> This can cause deadlock because schedule_work() is not NMI-safe.
>>
>> The stack trace below shows an example of the problem. In this case
>> the master cpu is not running from NMI but it has parked the slave
>> CPUs using an NMI and the parked CPUs are holding spinlocks needed by
>> schedule_work().
>
>I am missing here an explanation (perhaps because I cannot find any docs for irq_work) why irq_work works in this case.

The schedule_work() call only needs to be postponed until the slave CPUs have left NMI context. After that there is no deadlock.
irq_work does not run its callback, which is what calls schedule_work(), until the master CPU has exited the current interrupt context.
By the time the master CPU exits that context, the other CPUs have also left NMI context, so no parked CPU can still be holding the spinlocks that schedule_work() needs.

>And why you need to schedule another work in the irq_work and not do the job directly.

kgdboc_restore_input_helper() uses mutex_lock() for protection, and a mutex cannot be taken in interrupt context.
My understanding is that the helper therefore has to run in process context, so the irq_work callback calls schedule_work() instead of doing the job directly. This also keeps the original flow unchanged.
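
To make the ordering concrete, here is a toy userspace model in C of the two-stage deferral described above. It is only a sketch, not kernel code: every name in it (the model_* functions, the ctx enum, the flags) is hypothetical and merely stands in for irq_work, schedule_work() and the mutex-protected helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model. All names are hypothetical, none is a kernel API. */
enum ctx { CTX_DEBUGGER, CTX_IRQ_EXIT, CTX_PROCESS };

static enum ctx current_ctx = CTX_DEBUGGER;
static bool irq_work_pending;  /* stage 1: set while leaving the debugger */
static bool work_pending;      /* stage 2: queued after interrupt exit */
static int restore_count;      /* number of times the helper has run */

/* Stage 1: only sets a flag, so it needs no locks and is safe to call
 * while other CPUs are still parked. */
static void model_irq_work_queue(void)
{
    irq_work_pending = true;
}

/* Models the master CPU leaving interrupt context. The parked CPUs have
 * resumed by now, so queueing ordinary work can no longer deadlock. */
static void model_irq_exit(void)
{
    current_ctx = CTX_IRQ_EXIT;
    if (irq_work_pending) {
        irq_work_pending = false;
        work_pending = true;   /* stands in for schedule_work() */
    }
}

/* Models the mutex-protected helper: it may only run in process context. */
static void model_restore_input_helper(void)
{
    assert(current_ctx == CTX_PROCESS);
    restore_count++;
}

/* Stage 2: a worker running in process context drains the queued work. */
static void model_run_worker(void)
{
    current_ctx = CTX_PROCESS;
    if (work_pending) {
        work_pending = false;
        model_restore_input_helper();
    }
}
```

Calling model_irq_work_queue(), then model_irq_exit(), then model_run_worker() runs the helper exactly once, and only in the last step. Calling the helper directly from the second step would trip the assertion, which mirrors why the job is not done directly in the irq_work callback.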