Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

From: Sergey Senozhatsky
Date: Fri Dec 15 2017 - 03:42:47 EST


On (12/15/17 09:31), Petr Mladek wrote:
> > On (12/14/17 22:18), Steven Rostedt wrote:
> > > > Steven, your approach works ONLY when we have the following preconditions:
> > > >
> > > > a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > > > etc) context
> > > >
> > > > what does guarantee that? what happens if there is NO non-atomic
> > > > CPU or that non-atomic simplky missses the console_owner != false
> > > > point? we are going to conclude
> > > >
> > > > "if printk() doesn't work for you, it's because you are holding it wrong"?
> > > >
> > > >
> > > > what if that non-atomic CPU does not call printk(), but instead
> > > > it does console_lock()/console_unlock()? why there is no handoff?
> > > >
> > > > CPU0 CPU1 ~ CPU10
> > > > in atomic contexts [!]. ping-ponging console_sem
> > > > ownership to each other. while what they really
> > > > need to do is to simply up() and let CPU0 to
> > > > handle it.
> > > > printk
> > > > console_lock()
> > > > schedule()
> > > > ...
> > > > printk
> > > > printk
> > > > ...
> > > > printk
> > > > printk
> > > >
> > > > up()
> > > >
> > > > // woken up
> > > > console_unlock()
> > > >
> > > > why do we make an emphasis on fixing vprintk_printk()?
>
> Is the above scenario really dangerous? console_lock() owner is
> able to sleep. Therefore there is no risk of a softlockup.
>
> Sure, many messages will get stacked in the meantime and the console
> owner my get then passed to another owner in atomic context. But
> do you really see this in the real life?

console_sem is locked by atomic printk CPU1~CPU10. non-atomic CPU is just
sleeping waiting for the console_sem. while atomic printk CPUs just hand
off console_sem ownership to each other without ever up()-ing the console_sem.
what's the point of hand off here? how is that going to work?

what we need to do is to offload printing from atomic contexts to a
non-atomic one - which is CPU0. and that non-atomic CPU is sleeping
on the console_sem, ready to continue printing. but it never gets its
chance to do so, because CPU0 ~ CPU10 just passing console_sem ownership
around, resulting in the same "we print from atomic context" thing.

> Of course, there is a chance that it will pass the work from
> a safe context to atomic one. But there was the same chance that
> the work already started in the atomic context. Therefore statistically
> this should not make things worse.

which is not a justification. we are not looking for a solution that
does not make the things worse. we are looking for a solution that
does improve the things.

-ss