Re: printk: Add process name information to printk() output.

From: Petr Mladek
Date: Fri Sep 04 2020 - 08:45:47 EST


On Fri 2020-09-04 11:53:42, John Ogness wrote:
> On 2020-09-04, Changki Kim <changki.kim@xxxxxxxxxxx> wrote:
> > Printk() meesages are the most basic and useful debug method.
> > However, additional information needs in multi-processor.
> > If we add messages with processor id and process name, we can find
> > a problem only with messages when the problem occurs with H/W IP or CPU.
> > This is very useful in narrowing down the scope of the problems.
>
> [...]
>
> > diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
> > index e6302da041f9..fcefe9516606 100644
> > --- a/kernel/printk/printk_ringbuffer.h
> > +++ b/kernel/printk/printk_ringbuffer.h
> > @@ -21,6 +22,12 @@ struct printk_info {
> > u8 flags:5; /* internal record flags */
> > u8 level:3; /* syslog level */
> > u32 caller_id; /* thread id or processor id */
> > +#ifdef CONFIG_PRINTK_PROCESS
> > + int pid; /* process id */
> > + u8 cpu_id; /* processor id */
> > + u8 in_interrupt; /* interrupt conext */
> > + char process[TASK_COMM_LEN]; /* process name */
> > +#endif
> > };
>
> I can understand the desire to have more information with messages. But
> IMHO adding it to the ringbuffer descriptor is the wrong place for
> it. The descriptor should really be limited to data that the printk
> subsystem needs for _itself_.

I fully agree.

> With respect to LOG_CONT, I think we can agree that @caller_id is
> not enough. But there has been discussions [0]
> of having @caller_id provide a better context representation.

I am not sure if I read the above correctly. Anyway, @caller_id is
needed by printk() to handle continuous lines. It would be great
to store more precise information about the irq context. But
more details, like process name, have to be stored somewhere else.


> If we want to support adding more meta information to messages, I would
> prefer that the information is either prepended directly to the message
> text string or appended to the dictionary text string.

I have two problems with dict ring:

+ does not guarantee enough space to always store the metadata
+ waste of space by repeating the same prefixes again and again


I am currently playing with support for all three timestamps based
on https://lore.kernel.org/lkml/20200814101933.574326079@xxxxxxxxxxxxx/

And I got the following idea:


1. Storing side:

Create one more ring/array for storing the optional metadata.
It might eventually replace dict ring, see below.

struct struct printk_ext_info {
u64 ts_boot; /* timestamp from boot clock */
u64 ts_real; /* timestamp from real clock */
char process[TASK_COMM_LEN]; /* process name */
};

It must be in a separate array so that struct prb_desc stay stable
and crashdump tools do not need to be updated so often.

But the number of these structures must be the same as descriptors.
So it might be:

struct prb_desc_ring {
unsigned int count_bits;
struct prb_desc *descs;
struct printk_ext_info *ext_info
atomic_long_t head_id;
atomic_long_t tail_id;
};

One huge advantage is that these extra information would not block
pushing lockless printk buffer upstream.

It might be even possible to get rid of dict ring and just
add two more elements into struct printk_ext_info:

char subsystem[16]; /* for SUBSYSTEM= dict value */
char device[48]; /* for DEVICE= dict value */

Pros:

+ the information will always get stored

Cons:

+ the names might get shrinked

Space usage:

+ No need to store "SUBSYSTEM=" and "DEVICE=" repeatedly.
+ Space used only for dev_printk()


Well, I think that it might work in practice.



2. Reading side:

+ /dev/kmsg is easy. It makes perfect sense to pass all extra
information in the dictionary format:

+ less effective but /dev/kmsg is fast
+ does not break userspace
+ might be shown even by the current journalctl


+ syslog syscall and /proc/syslog are hard to extend.
I would ignore them. Users need to use /dev/kmsg to
get the dictionary metadata already now.


+ console is a sensitive beast

The output format would need to be configurable using
build/boot/runtime options. It is already the case
for the timestamp, see printk_time parameter.

And people would probably want to have this configurable
per-console to disable this on the slow ones, sigh.


Best Regards,
Petr