Re: [PATCH] printk: inject caller information into the body of message

From: Fengguang Wu
Date: Wed Jun 20 2018 - 07:37:44 EST


On Wed, Jun 20, 2018 at 11:30:05AM +0200, Dmitry Vyukov wrote:
On Wed, Jun 20, 2018 at 11:06 AM, Sergey Senozhatsky
<sergey.senozhatsky.work@xxxxxxxxx> wrote:
Hi Dmitry,

On (06/20/18 10:45), Dmitry Vyukov wrote:
Hi Sergey,

What are the visible differences between this patch and Tetsuo's
patch?

I guess none, and looking at your requirements below I tend to agree
that Tetsuo's approach is probably what you need at the end of the day.

The only thing that will matter for syzkaller parsing in the
end is the resulting text format as it appears on console. But you say
"I'm not pushing for this particular message format", so what exactly
do you want me to provide feedback on?
I guess we need to handle pr_cont properly whatever approach we take.

Mostly, was wondering about if:
a) you need pr_cont() handling
b) you need printk_safe() handling

The reasons I left those things behind:

a) pr_cont() is officially hated. It was never supposed to be used
on SMP systems. So I wasn't sure if we need all that effort and
add tricky code to handle pr_cont(). Given that syzkaller is
probably the only user of that functionality.

Well, if I put my syzkaller hat on, then I don't care what exactly
happens in the kernel, the only thing I care is well-formed output on
console that can be parsed unambiguously in all cases.

+1 for 0day kernel testing.

I admit that goal may never be 100% achievable -- at least some serial
console logs can sometimes become messy. So we'll have to write dmesg
parsing code in defensive ways.

But some unnecessary pr_cont() broken-up messages can obviously be
avoided. For example,

arch/x86/mm/fault.c:

printk(KERN_ALERT "BUG: unable to handle kernel ");
if (address < PAGE_SIZE)
printk(KERN_CONT "NULL pointer dereference");
else
printk(KERN_CONT "paging request");

I've actually proposed to remove the above KERN_CONT, unfortunately the
patch was silently ignored.

From this point of view I guess pr_cont is actually syzkaller's worst
enemy. If pr_const is officially hated, and it causes corrupted crash
reports, then we can resolve it by just getting rid of more pr_cont's.
So potentially we do not need any support for pr_cont in this patch.
However, we also need to be practical and if there are tons of
pr_cont's then we need some intermediate support of them, just because
we won't be able to get rid of all of them overnight.

But even if we attach context to pr_cont, it still causes problems for
crash parsing, because today we see:

BUG: unable to handle
... 10 lines ...
kernel
... 10 lines ...
paging request
... 10 lines ...
at ADDR

Which is not too friendly for parsing regardless of contexts.

We met exactly the same issue and ended up with special handling in
https://github.com/intel/lkp-tests/blob/master/lib/dmesg.rb:

/(BUG: unable to handle kernel)/,
/(BUG: unable to handle kernel) NULL pointer dereference/,
/(BUG: unable to handle kernel) paging request/,

So I am leaning towards to getting rid of pr_cont's as the solution to
the problem.

+1 for reducing unnecessary pr_cont() uses.

Thanks,
Fengguang