Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

From: Steven Rostedt
Date: Fri Dec 15 2017 - 10:40:17 EST


On Fri, 15 Dec 2017 15:52:05 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@xxxxxxxxx> wrote:

> On (12/15/17 14:06), Sergey Senozhatsky wrote:
> [..]
> > > Where do we do the above? And has this been proven to be an issue?
> >
> > um... hundreds of cases.
> >
> > deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> > happening at the same moment + NMI backtraces from all the CPUs (more
> > than 3 cpus) that follows the lockups, over not-so-fast serial console.
> > exactly the bug report I received two days ago. so which one of the CPUs
> > here is a good candidate to successfully emit all of the pending logbuf
> > entries? none. all of them either have local IRQs disabled, or dump_stack()
> > from either backtrace IPI or backtrace NMI (depending on the configuration).
>
>
> and, Steven, one more thing. wondering what's your opinion.
>
>
> suppose we have consoe_owner hand off enabled, 1 non-atomic CPU doing
> printk-s and several atomic CPUs doing printk-s. Is proposed hand off
> scheme really useful in this case? CPUs will now
>
> a) print their lines (a potentially slow call_console_drivers())

My question is, is this an actual problem. Because we haven't fully
evaluated if my patch is enough to make this a non issue.

>
> and
>
> b) spin in vprintk_emit on console_owner with local IRQs disabled
> waiting for either non-atomic printk CPU or another atomic CPU
> to finish printing its line (call_console_drivers()) and to hand
> off printing. so current CPU, after busy-waiting for foreign CPU's
> call_console_drivers(), will go and do his own call_console_drivers().
> which, time-wise, simply doubles (roughly) the amount of time that
> CPU spends in printk()->console_unlock(). agreed?

Worse case is doubling, if the two printks happen at the same time.

Today, it is unbounded, which means my patch is much better than what
we do today, and hence we still don't know if it is "good enough". This
is where I say you are trying to solve another problem than what we are
encountering today. Because we don't know if the problem is due to the
unbounded nature of printk, or just a slight longer (but bounded) time.

>
> if we previously could have a case when non-atomic printk CPU would
> grab the console_sem and print all atomic printk CPUs messages first,
> and then its own messages, thus atomic printk CPUs would have just
> log_store(), now we will have CPUs to call_console_driver() and to
> spin on console_sem owner waiting for call_console_driver() on a foreign
> CPU [not all of them: it's one CPU doing the print out and one CPU
> spinning console_owner. but overall I think all CPUs will experience
> that spin on console_sem waiting for call_console_driver() and then do
> its own call_console_driver()].
>
>
> even two CPUs case is not so simple anymore. see below.
>
> - first, assume one CPU is atomic and one is non-atomic.
> - second, assume that both CPUs are atomic CPUs, and go thought it again.
>
>
> CPU0 CPU1
>
> printk() printk()
> log_store()
> log_store()
> console_unlock()
> set console_owner
> sees console_owner
> sets console_waiter
> spin
> call_console_drivers()
> sees console_waiter
> break
>
> printk()
> log_store()
> console_unlock()
> set console_owner
> sees console_owner
> sets console_waiter
> spin
> call_console_drivers()
> sees console_waiter
> break
>
> printk()
> log_store()
> console_unlock()
> set console_owner
> sees console_owner
> sets console_waiter
> spin
> call_console_drivers()
> sees console_waiter
> break
>
> printk()
> log_store()
> console_unlock()
> set console_owner
> sees console_owner
> sets console_waiter
> spin
>
> ....
>
>
> that "wait for call_console_drivers() on another CPU and then do
> its own call_console_drivers()" pattern does look dangerous. the
> benefit of hand-off is really fragile sometimes, isn't it?

Actually, I see your scenario as a perfect example of my patch working
well. You have two CPUs spamming the console with printks, and instead
of one CPU stuck doing nothing but outputting both CPUs work, the two
share the load.

Again, I'm not convinced that all the issues you have encountered so
far can not be solved with my patch. I would like to see a real example
of where it fails. Lets not make this a theoretical approach, lets do
it incrementally and solve one problem at a time.

I know for a fact that the one CPU outputting all other CPUs printks is
a problem. That has been shown time and time again (and I believe is
the cause for the problems you stated).

The cause of the above issue is the unbounded nature of printk. The
fact that there's no guarantee that printk will ever finish. My patch
makes it have to finish and makes it bounded. In RT, that's what we
always work on, and that usually is good enough. Priority inheritance
can take a long time too, but it causes everything to have a guaranteed
max latency. My patch makes printk guaranteed to stop, and I want to
see if this is good enough for all current issues we've come across
before we go into any more elaborate algorithms.

-- Steve