Re: [RFC PATCH v1 08/25] printk: add ring buffer and kthread

From: Petr Mladek
Date: Tue Mar 12 2019 - 06:30:22 EST

Next message: Jiri Olsa: "Re: [PATCH v1 04/10] perf, tools, record: Clarify help for --switch-output"
Previous message: Chuanhong Guo: "Re: [PATCH] arm64: dts: meson-gxl-s905d-phicomm-n1: add status LED"
In reply to: Sergey Senozhatsky: "Re: [RFC PATCH v1 08/25] printk: add ring buffer and kthread"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon 2019-03-11 11:51:49, John Ogness wrote:
> On 2019-03-07, Sergey Senozhatsky <sergey.senozhatsky.work@xxxxxxxxx> wrote:
> > I don't really understand the role of loglevel anymore.
>
> "what the kernel considers" is a configuration option of the
> administrator. The administrator can increase the verbosity of the
> console (loglevel) without having negative effects on the system
> itself.

Where do you get the confidence that atomic consoles will not
slow down the system? Have you tried them on a real-life workload
while debugging a real-life bug?

Some benchmarks might help. They would need to trigger
some messages so that we can see how the different
approaches affect the overall system performance.

> Also, if the system were to suddenly crash, those crash messages
> shouldn't be in jeopardy just because the verbosity of the console was
> turned up.

This assumes that the error messages alone will be enough to
discover and fix the problem.

> You (and Petr) talk about that _all_ console printing is for
> emergencies. That if an administrator sets the loglevel to 7 it is
> because the pr_info messages are just as important as the pr_emerg.

It might be true when the more critical messages alone
are not enough to understand the situation.

> And if that is indeed the intention of console printing and loglevel, then
> why are asynchronous printk calls for console messages even allowed
> today? IMO that isn't taking the importance of the message very
> seriously.

Because it worked pretty well in the past. The number of messages
is still growing (code complexity, more CPUs, more devices, ...).
Our customers started reporting softlockups "only" 7 years ago
or so.

We currently have two-level handling of messages:

+ all messages can be seen from userspace
+ messages with a level below console_loglevel can be seen
  on the console
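
A minimal user-space sketch of these two levels, not the kernel's
actual code. The names sketch_printk(), log_count and console_count are
hypothetical; the level constants only mirror the kernel's numbering,
where a lower number means a more critical message.

```c
#include <assert.h>
#include <stdio.h>

/* Severity values follow the kernel's numbering: lower is more critical. */
enum { LVL_EMERG = 0, LVL_ERR = 3, LVL_WARNING = 4, LVL_INFO = 6, LVL_DEBUG = 7 };

static int console_loglevel = 7;  /* hypothetical stand-in for the admin setting */
static int log_count;             /* messages stored for userspace readers */
static int console_count;         /* messages that also reached the console */

/*
 * Level 1: every message is stored.
 * Level 2: only messages whose level is numerically below
 * console_loglevel are also shown on the console.
 */
static void sketch_printk(int level, const char *msg)
{
	assert(level >= LVL_EMERG && level <= LVL_DEBUG);

	log_count++;                    /* always visible from userspace */
	if (level < console_loglevel) {
		console_count++;
		fprintf(stderr, "<%d>%s\n", level, msg);
	}
}
```

With console_loglevel set to 7, a LVL_INFO message is stored and shown,
while a LVL_DEBUG message is only stored.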

You are introducing one more level of handling:

+ critical messages are printed on the console directly,
  even before the queued less critical ones
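
A rough sketch of what this extra level means for the ordering on the
console, under simplifying assumptions. It is not taken from the patch
set: EMERG_THRESHOLD, sketch_emit() and sketch_flush() are hypothetical
names, and the "console" is just an array that records output order.

```c
#include <assert.h>

#define MAX_MSGS 16
#define EMERG_THRESHOLD 2  /* hypothetical: levels 0..2 take the direct path */

static const char *queued[MAX_MSGS];      /* waiting for the printing thread */
static int nr_queued;
static const char *console_out[MAX_MSGS]; /* order in which the console saw them */
static int nr_out;

/* Critical messages go straight to the console; the rest are queued. */
static void sketch_emit(int level, const char *msg)
{
	if (level <= EMERG_THRESHOLD) {
		assert(nr_out < MAX_MSGS);
		console_out[nr_out++] = msg;  /* direct, bypassing the queue */
	} else {
		assert(nr_queued < MAX_MSGS);
		queued[nr_queued++] = msg;
	}
}

/* Stand-in for the printing thread: drain the queue in order. */
static void sketch_flush(void)
{
	for (int i = 0; i < nr_queued; i++) {
		assert(nr_out < MAX_MSGS);
		console_out[nr_out++] = queued[i];
	}
	nr_queued = 0;
}
```

Emitting an info message, then a level 0 message, then flushing puts the
level 0 message first on the console even though it was generated later.
The console output is no longer in the order the messages were created.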

The third level would be acceptable when:

+ atomic consoles are reliable enough
+ the code complexity is worth the gain

IMHO, we mix too many things here:

+ log buffer implementation
+ console offload
+ direct console handling using atomic consoles

I see the potential in all areas:

+ lockless ring buffer helps to avoid deadlocks
  and extra log buffers

+ console offload prevents overly long stalls (softlockups)

+ direct console handling might help to avoid deadlocks
  and might make the output more reliable

I think that we are on the same page here.

But we must use an incremental approach. It is not acceptable
to replace everything in a single patch. And it is not acceptable
to break important functionality and implement an alternative
solution several patches later.

Also, no solution is as ideal as it is sometimes presented
in this thread.

Best Regards,
Petr
