[RFC PATCH] panic: fix deadlock in panic()

From: Cheng Jian
Date: Wed Jun 03 2020 - 10:19:30 EST


A deadlock caused by logbuf_lock occurs when panic:

a) Panic CPU is running in non-NMI context
b) Panic CPU sends out shutdown IPI via NMI vector
c) One of the CPUs that we bring down via NMI vector holded logbuf_lock
d) Panic CPU try to hold logbuf_lock, then deadlock occurs.

we try to re-init the logbuf_lock in printk_safe_flush_on_panic()
to avoid deadlock, but it does not work here, because :

Firstly, it is inappropriate to check num_online_cpus() here.
When the CPU bring down via NMI vector, the panic CPU willn't
wait too long for other cores to stop, so when this problem
occurs, num_online_cpus() may be greater than 1.

Secondly, printk_safe_flush_on_panic() is called after panic
notifier callback, so if printk() is called in panic notifier
callback, deadlock will still occurs. Eg, if ftrace_dump_on_oops
is set, we print some debug information, it will try to hold the
logbuf_lock.

To avoid this deadlock, drop the num_online_cpus() check and call
the printk_safe_flush_on_panic() before panic_notifier_list callback,
attempt to re-init logbuf_lock from panic CPU.

Signed-off-by: Cheng Jian <cj.chengjian@xxxxxxxxxx>
---
kernel/panic.c | 3 +++
kernel/printk/printk_safe.c | 3 ---
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index b69ee9e76cb2..8dbcb2227b60 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -255,6 +255,9 @@ void panic(const char *fmt, ...)
crash_smp_send_stop();
}

+ /* Call flush even twice. It tries harder with a single online CPU */
+ printk_safe_flush_on_panic();
+
/*
* Run any panic handlers, including those that might need to
* add information to the kmsg dump output.
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index d9a659a686f3..9ebc1723e1a4 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -269,9 +269,6 @@ void printk_safe_flush_on_panic(void)
* Do not risk a double release when more CPUs are up.
*/
if (raw_spin_is_locked(&logbuf_lock)) {
- if (num_online_cpus() > 1)
- return;
-
debug_locks_off();
raw_spin_lock_init(&logbuf_lock);
}
--
2.17.1