Re: [Bug 199003] console stalled, cause Hard LOCKUP.

From: Sergey Senozhatsky
Date: Wed Mar 21 2018 - 22:37:59 EST


On (03/22/18 11:14), Sergey Senozhatsky wrote:
[..]
> Looking at
> printk()->call_console_drivers()->serial8250_console_putchar()->wait_for_xmitr()
>
> ... wait_for_xmitr() can spin for over 1 second waiting for the UART_MSR_CTS
> bit.

[..]

> a 1+ second long busy loop in the console driver is quite close to
> "problems guaranteed". But, wait, there is even more. This wait_for_xmitr()
> busy wait is happening after every character we print on the console. So
> printk("foo") will generate 5 * wait_for_xmitr() busy loops [foo + \r + \n].
> They punch&touch watchdog a lot, so at the least the system won't get killed
> by the hardlockup detector. But at the same time, it's still potentially a
> 1+ second busy loop in the console driver * strlen(message).

One does not even need to have concurrent printk()-s in this case. A
single CPU doing several direct printks under spin_lock is already
enough:

	CPUA					CPUB ~ CPUZ
	spin_lock(&lock)
	 printk->wait_for_xmitr			spin_lock(&lock)
	 printk->wait_for_xmitr
	 ...
	 printk->wait_for_xmitr			<< lockups >>
	 printk->wait_for_xmitr
	spin_unlock(&lock)
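
To put rough numbers on the scenario above, here is a small userspace
sketch of the worst-case arithmetic. The helper names console_chars()
and worst_case_hold_sec() are made up for illustration, and the 1 second
per character figure is only the upper bound quoted above, not a
measured value.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: worst-case wait_for_xmitr() spin per character, in seconds. */
#define WORST_WAIT_PER_CHAR_SEC 1UL

/* Characters sent to the console for one message: the payload plus \r and \n. */
static size_t console_chars(const char *msg)
{
	return strlen(msg) + 2;
}

/*
 * Worst-case time (seconds) the lock is held while one CPU does
 * nr_printks direct printk()-s of msg under it. Hypothetical helper.
 */
static unsigned long worst_case_hold_sec(const char *msg, unsigned long nr_printks)
{
	return nr_printks * console_chars(msg) * WORST_WAIT_PER_CHAR_SEC;
}
```

With four printk("foo") calls under the lock this gives 4 * 5 * 1 = 20
seconds during which CPUB ~ CPUZ may spin on spin_lock(&lock).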

> Sometimes I really wish we had detached consoles. Direct printk()->console
> is nice and cool, but... we can't have it.

And this is, basically, what they do with printk_deferred(). We usually
use it to avoid deadlocks, but in this particular case it's used because
direct printk() is way too painful, so they are detaching the printout
and moving it to another control path. Quite an interesting idea,
I must say.
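
The general shape of "detach the printout" can be sketched in userspace
C as below. This is only an illustration of the idea, not how
printk_deferred() is implemented: struct deferred_log, log_deferred()
and flush_deferred() are hypothetical names, and a real implementation
would also need locking or lockless tricks around the buffer.

```c
#include <stdio.h>

#define LOG_SLOTS	8	/* power of two, so index wraparound stays consistent */
#define LOG_MSG_MAX	64

/* Hypothetical fixed-size message ring. */
struct deferred_log {
	char msgs[LOG_SLOTS][LOG_MSG_MAX];
	unsigned int head;	/* next slot to write */
	unsigned int tail;	/* next slot to flush */
};

/*
 * Fast path, safe to call with the lock held: only copies the message
 * into memory and never touches the slow console.
 * Returns 0 on success, -1 if the ring is full (message dropped).
 */
static int log_deferred(struct deferred_log *log, const char *msg)
{
	if (log->head - log->tail == LOG_SLOTS)
		return -1;
	snprintf(log->msgs[log->head % LOG_SLOTS], LOG_MSG_MAX, "%s", msg);
	log->head++;
	return 0;
}

/*
 * Slow path, meant to run later from another context, after the lock
 * has been dropped. Returns the number of messages written out.
 */
static unsigned int flush_deferred(struct deferred_log *log, FILE *out)
{
	unsigned int n = 0;

	while (log->tail != log->head) {
		fputs(log->msgs[log->tail % LOG_SLOTS], out);
		log->tail++;
		n++;
	}
	return n;
}
```

The point of the split is that the time spent under spin_lock(&lock) no
longer depends on how long the console takes per character; the cost is
that messages can show up late, or get dropped when the buffer fills up.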

-ss