Re: [Bug 199003] console stalled, cause Hard LOCKUP.

From: Sergey Senozhatsky
Date: Wed Mar 21 2018 - 03:29:06 EST


On (03/20/18 09:34), bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
[..]
> Thanks very much.
> commit e480af09c49736848f749a43dff2c902104f6691 avoided the NMI watchdog
> trigger.

Hm, okay... But "touch_nmi_watchdog() everywhere printk/console-related"
is not exactly where I wanted us to be.

By the way e480af09c49736848f749a43dff2c902104f6691 is from 2006.
Are you sure you meant exactly that commit? What kernel do you use?


Are you saying that none of Steven's patches helped on your setups?


> And this patch may avoid long time blocking:
> https://lkml.org/lkml/2018/3/8/584
>
> We've tested it for several days.

Hm, printk_deferred is a bit dangerous; it moves console_unlock() to
IRQ context. So you can still have the problem of stuck CPUs, it's just
that now you shut up the watchdog. Did you test Steven's patches?


A tricky part about printk_deferred() is that it does not use the hand off
mechanism. And even more... What we have with the "printk vs printk"
scenario

        CPU0                    CPU1            ...     CPUN

        printk                  printk
        console_unlock          hand off                printk
                                console_unlock          hand off
                                                        console_unlock

turns into a good old "one CPU prints it all" when we have the "printk vs
printk_deferred" case. Because printk_deferred just log_store()-s messages
and then _maybe_ grabs the console_sem from IRQ and invokes
console_unlock().

So it's something like this

        CPU0                    CPU1            ...     CPUN

        printk                  printk_deferred
        console_unlock                                  printk_deferred
        console_unlock
        console_unlock
        ...                     ...                     ...
                                printk_deferred         printk_deferred
        console_unlock
        console_unlock
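
To make the difference between the two pictures concrete, here is a toy
user-space model. It is not kernel code and it is only a sketch: the names
simulate_console(), struct printed_by_cpu, MAX_CPUS and NO_WAITER are made up
for illustration, and the timing (every CPU logs one message per round, the
console owner prints one line per logged message) is an assumption. It only
counts how many lines each CPU ends up pushing to the console.

```c
#include <assert.h>

#define MAX_CPUS  8
#define NO_WAITER (-1)

/* lines[i] = number of log lines CPU i had to push to the console. */
struct printed_by_cpu {
	unsigned lines[MAX_CPUS];
};

/*
 * Every round, each CPU stores one message, CPU 0 first. After each
 * stored message the current console owner prints exactly one line.
 *
 * handoff != 0: a CPU that logs while another CPU owns the console
 *   becomes the waiter, and the owner passes the console to it after
 *   the line it is printing (loosely "printk vs printk").
 * handoff == 0: everyone except the owner only stores the message, so
 *   the owner never changes (loosely "printk vs printk_deferred").
 */
struct printed_by_cpu simulate_console(unsigned ncpus, unsigned rounds,
				       int handoff)
{
	struct printed_by_cpu res = { {0} };
	unsigned owner = 0;
	int waiter = NO_WAITER;

	assert(ncpus >= 1 && ncpus <= MAX_CPUS);

	for (unsigned r = 0; r < rounds; r++) {
		for (unsigned cpu = 0; cpu < ncpus; cpu++) {
			/* cpu stores one message; it may ask for the console */
			if (handoff && cpu != owner)
				waiter = (int)cpu;

			/* the current owner prints one line */
			res.lines[owner]++;

			/* the owner hands the console to the waiter, if any */
			if (waiter != NO_WAITER) {
				owner = (unsigned)waiter;
				waiter = NO_WAITER;
			}
		}
	}
	return res;
}
```

With 4 CPUs and 3 rounds, simulate_console(4, 3, 0) charges all 12 lines to
CPU 0, while simulate_console(4, 3, 1) spreads them across the CPUs.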


// offtopic "I can has printk_kthread?"



You now touch_nmi_watchdog() from the console driver [well... at least this
is what e480af09c4973 is doing, but I'm not sure I see how come you didn't
have it applied], so that's why you don't see hard lockups on that CPU0. But
your printing CPU can still get stuck, which will defer RCUs on that CPU, etc.
etc. etc. So I'd say that those two approaches

printk_deferred + touch_nmi_watchdog

combined can do quite some harm. One thing is for sure: they don't really fix
any problems.
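
A toy illustration of the "shut up the watchdog" point, again in user space
and again only a sketch under assumptions: print_loop(), struct lockup_report
and the tick-based timing are hypothetical, and resetting a counter merely
stands in for what touch_nmi_watchdog() does. The point is that touching the
watchdog changes what gets reported, not how long the CPU stays in the loop.

```c
#include <assert.h>
#include <stdbool.h>

struct lockup_report {
	bool watchdog_fired;	/* would the lockup detector have complained? */
	unsigned stuck_ticks;	/* ticks spent in the print loop regardless */
};

/*
 * One CPU prints `lines` lines, `ticks_per_line` ticks each, and does
 * nothing else meanwhile. `threshold` is the watchdog timeout in ticks.
 * If `touch` is true, the loop resets the watchdog counter after every
 * line, the toy equivalent of touching the NMI watchdog from the
 * console driver.
 */
struct lockup_report print_loop(unsigned lines, unsigned ticks_per_line,
				unsigned threshold, bool touch)
{
	struct lockup_report rep = { false, 0 };
	unsigned since_touch = 0;

	assert(threshold > 0);

	for (unsigned i = 0; i < lines; i++) {
		for (unsigned t = 0; t < ticks_per_line; t++) {
			rep.stuck_ticks++;
			if (++since_touch > threshold)
				rep.watchdog_fired = true;
		}
		if (touch)
			since_touch = 0;
	}
	return rep;
}
```

For 100 lines at 5 ticks each and a threshold of 50 ticks, the watchdog fires
only in the run without touching, but stuck_ticks is 500 in both runs.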

-ss