Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

From: Petr Mladek
Date: Fri Feb 02 2018 - 07:17:22 EST


On Thu 2018-02-01 11:46:47, Sergey Senozhatsky wrote:
> On (01/30/18 13:23), Petr Mladek wrote:
> [..]
> > > If the system is in "big troubles" then what makes irq_work more
> > > possible? Local IRQs can stay disabled, just like preemption. I
> > > guess when the troubles are really big our strategy is the same
> > > for both wq and irq_work solutions - we keep the printk_safe buffer
> > > and wait for panic()->flush.
> >
> > But the patch still uses irq work because queue_work_on() could not
> > be safely called from printk_safe(). By other words, it requires
> > both irq_work and workqueues to be functional.
>
> Right, that's all true. The reason it's done this way is because buffers can
> be big and we still flush under console_sem in console_unlock() loop, which
> can in theory be problematic. In other words, I wanted to remove the root
> cause - irq flush of printk_safe while we are still in printing
> loop.

Good point! We know that we would eventually push non-trivial amount
of messages and it would make sense to do it from non-atomic context.

On the other hand, this does not solve the same problem with printk
NMI buffer. And I guess that we do not want to risk offloading to
workqueues for NMI messages.


> > > I guess I'm OK with the wq dependency after all, but I may be mistaken.
> > > printk_safe was never about "immediately flush the buffer", it was about
> > > "avoid deadlocks", which was extended to "flush from any context which
> > > will let us to avoid deadlock". It just happened that it inherited
> > > irq_work dependency from printk_nmi.
> >
> > I see the point. But if I remember correctly, it was also designed
> > before we started to be concerned about a sudden death and "get
> > printks out ASAP" mantra.
>
> Can you elaborate a bit?

The pull request with printk_safe was sent on February 22, 2017, see
https://lkml.kernel.org/r/20170222114705.GA30336@xxxxxxxxxx

The printk softlockup was still being solved by an immediate offload
from vprintk_emit() on March 29, 2017, see
https://lkml.kernel.org/r/20170329092511.3958-3-sergey.senozhatsky@xxxxxxxxx

I believe that it was the mail from Pavel Machek that made us
thinking about the sudden death. It was sent on April 7, 2017,
see https://lkml.kernel.org/r/20170407120642.GB4756@amd

The first version with the offload from console_unlock was
sent on May 9, 2017, see
https://lkml.kernel.org/r/20170509082859.854-3-sergey.senozhatsky@xxxxxxxxx

I am not exactly sure when the "get printks out ASAP" mantra started
but I cannot forget the mail from June 30, 2017, see
https://lkml.kernel.org/r/20170630070131.GA474@xxxxxxxxxxxxxxxxxxxxxxxx

Best Regards,
Petr