Re: [PATCH][RFC] printk: make pr_cont buffer per-cpu

From: Petr Mladek
Date: Thu Aug 25 2016 - 18:14:37 EST


On Wed 2016-08-24 23:27:29, Sergey Senozhatsky wrote:
> On (08/24/16 10:19), Petr Mladek wrote:
> > > On (08/23/16 13:47), Petr Mladek wrote:
> > > [..]
> > > > > if (!(lflags & LOG_NEWLINE)) {
> > > > > + if (!this_cpu_read(cont_printing)) {
> > > > > + if (system_state == SYSTEM_RUNNING) {
> > > > > + this_cpu_write(cont_printing, true);
> > > > > + preempt_disable();
> > > > > + }
> > > > > + }
> > > >
> > > > I am afraid that this is not acceptable. It means that printk() will have
> > > > an unexpected side effect. The missing "\n" at the end of a printed
> > > > string would disable preemption. See below for more.
> > >
> > > missing '\n' must WARN about "sched while atomic" eventually, so it
> > > shouldn't go unnoticed or stay hidden.
> >
> > Well, it will still force people to rebuilt a test kernel because they
> > forget to use '\n" and the test kernel is unusable.
>
> you are right. misusage of printk() will now force user to go and fix
> it. the kernel most likely will be rebuilt anyway - there is a missing
> \n after all.

It is not a big problem for the final build. But it is a problem when adding
temporary debugging messages. They will get removed anyway. But people
will hate that some debugging builds are unusable just because of
a missing "\n".

> and rebuild here is pretty much incremental because basically
> only one line is getting updated (missing `\n' in printk() message). so
> it's like 15 seconds, perhaps.

This is not completely true. Sometimes the developer is not able to
test it hemself and need to send a patched kernel with some debug
messages to the customer. The turnaround might be hours or even days.

Also we have a build service for building packages here at SUSE.
It uses rather powerful machines. If we need to do testing on
some less powerfull machine, it is sometimes faster to build
the package by the build service. Sometimes it is the only
possibility if the test machine does not have enought disk
or memory. The point is that the build service always does
clean build and the turnaround takes minutes.

Not to say that the missing newline might be in an error path
that is hard to test and might appear even in the released build.


> > IMHO, the connection between '\n' and preemption is not
> > intuitive and hard to spot. We should do our best to avoid it.
>
> yes. what can be done is a simple hint -- we are in preempt disabled
> path when we report `sched while atomic', so it's safe to check if
> this_cpu(cont).len != 0 and if it is then additionally report:
> "incomplete cont line".
>
> in normal life no one _probably_ would see the change. well, I saw broken
> backtraces on my laptop before, but didn't consider it to be important.
> for the last two days the problem is not theoretical anymore on my side,
> as now I'm looking at actual pr_cont()-related bug report [internal].
> I'm truing to move the whole thing into 'don't use the cont lines' direction;
> there is a mix of numerous pr_cont() calls & missing new-lines [internal code].
> the kernel, unfortunately, IMHO, was too nice in the latter case, flushing
> incomplete cont buffer with LOG_NEWLINE. so many of those `accidental' cont
> lines went unnoticed.
>
> this "don't use the cont lines" is actually a bit tricky and doesn't
> look so solid. there is a need for cont lines, and the kernel has cont
> printk... but it's sort of broken, so people have to kinda work-around
> it. not the best example, but still (arch/arm64/kernel/traps.c)
>
> static void __dump_instr(const char *lvl, struct pt_regs *regs)
> {
> unsigned long addr = instruction_pointer(regs);
> char str[sizeof("00000000 ") * 5 + 2 + 1], *p = str;
> int i;
>
> for (i = -4; i < 1; i++) {
> unsigned int val, bad;
>
> bad = __get_user(val, &((u32 *)addr)[i]);
>
> if (!bad)
> p += sprintf(p, i == 0 ? "(%08x) " : "%08x ", val);
> else {
> p += sprintf(p, "bad PC value");
> break;
> }
> }
> printk("%sCode: %s\n", lvl, str);
> }
>
> which is fine, but getting a bit boring once you have more than,
> let's say, 20 function that want to do cont output.

Well, there are only two extra lines. The buffer definition and
the final printk().

Of course, it would be great to fix it transparently. But if there must
be a burden, I would prefer to keep it on the "corner" case users
rather than to push it on everyday users.

Best Regards,
Petr