Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

From: Konstantin Khlebnikov
Date: Thu Feb 13 2020 - 08:05:23 EST




On 12/02/2020 17.54, Petr Mladek wrote:
On Tue 2020-02-11 15:36:02, Konstantin Khlebnikov wrote:
On 11/02/2020 11.14, Kirill Tkhai wrote:
Hi, Konstantin,

On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
In NMI context printk() could save messages into per-cpu buffers and
schedule flush by irq_work when IRQ are unblocked. This means message
about hardlockup appears in kernel log only when/if lockup is gone.

Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
thus printk() cannot schedule flush work to another cpu.

This patch adds simple atomic counter of detected hardlockups and
flushes all per-cpu printk buffers in context softlockup watchdog
at any other cpu when it sees changes of this counter.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
---
include/linux/nmi.h | 1 +
kernel/watchdog.c | 22 ++++++++++++++++++++++
kernel/watchdog_hld.c | 1 +
3 files changed, 24 insertions(+)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 9003e29cde46..8406df72ae5a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
#if defined(CONFIG_HARDLOCKUP_DETECTOR)
extern void hardlockup_detector_disable(void);
extern unsigned int hardlockup_panic;
+extern atomic_t hardlockup_detected;
#else
static inline void hardlockup_detector_disable(void) {}
#endif
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b6b1f54a7837..9f5c68fababe 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
}
__setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
# endif /* CONFIG_SMP */
+
+atomic_t hardlockup_detected = ATOMIC_INIT(0);
+
+static inline void flush_hardlockup_messages(void)
+{
+ static atomic_t flushed = ATOMIC_INIT(0);
+
+ /* flush messages from hard lockup detector */
+ if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
+ atomic_set(&flushed, atomic_read(&hardlockup_detected));
+ printk_safe_flush();
+ }
+}

Do we really need two variables here? They may come into two different
cache lines, and there will be double cache pollution just because of
this simple check. Why not the below?

I don't think anybody could notice read-only access to second variable.
This executes once in several seconds.

Watchdogs already use same pattern (monotonic counter + snapshot) in
couple places. So code looks more clean in this way.

It is not only about speed. It is also about code complexity
and correctness. Using two variables is more complex.
Calling atomic_read(&hardlockup_detected) twice does > not look like a correct pattern.

It guarantees "at least once" which is enough for this case.


The single variable patter is used for similar things there
as well, for example, see hard_watchdog_warn,

hardlockup_allcpu_dumped.

Ouch. This works only once and there is no way to rearm it back.
Now I see why this thing never worked for me recent years =)

Maybe it's better reset sysctl_hardlockup_all_cpu_backtrace
to let user set sysctl back to 1.
Or rearm it back in softlockup watchdog after timeout.


I would call the variable "hardlockup_dump_flush" and
use the same logic as for hardlockup_allcpu_dumped.

Note that simple READ_ONCE(), WRITE_ONCE() are not enough
because they do not provide smp barriers.

It's hard to imagine arch which actually needs barrires here.
Softlockup timers will eventually see the change.


Best Regards,
Petr