Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

From: Sergey Senozhatsky
Date: Fri Feb 28 2020 - 06:49:34 EST


On (20/02/28 11:04), Petr Mladek wrote:
> > On (20/02/27 14:08), Lech Perczak wrote:
> > > W dniu 27.02.2020 o 13:39, Lech Perczak pisze:
> > > > W dniu 27.02.2020 o 13:36, Greg Kroah-Hartman pisze:
> > > >> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> > > >>> Hello,
> > > >>>
> > > >>> After upgrading the kernel on our boards from v4.19.105 to v4.19.106, we found that syslog stops receiving new messages after the ones read initially when /proc/kmsg is opened right after booting.
> > > >>> I also found that 'dmesg --follow' does not react to new printks either - to read new messages, /proc/kmsg or /dev/kmsg had to be reopened.
> > > >>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> > > >> That is really really odd.
> > > > Very odd it is indeed.
> > > >>> My test scenario for bisecting was:
> > > >>> 1. run 'dmesg --follow' as root
> > > >>> 2. run 'echo t > /proc/sysrq-trigger'
> > > >>> 3. If the trace appears in the dmesg output -> good; otherwise -> bad. If the trace doesn't appear in the output of 'dmesg --follow', re-running it will show the trace.
> > > >>>
>
> I have reproduced the problem with a kernel based on v4.19.106
> and I see the following in the log:
>
> [ 0.028250] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> [ 0.028263] random: get_random_bytes called from start_kernel+0x9e/0x4f6 with crng_init=0
> [ 0.028268] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
> [ 0.028407] percpu: Embedded 44 pages/cpu s142216 r8192 d29816 u524288
> [ 0.028411] pcpu-alloc: s142216 r8192 d29816 u524288 alloc=1*2097152
> [ 0.028412] pcpu-alloc: [0] 0 1 2 3
>
> Note that percpu stuff is initialized after printk_deferred(). And the
> deferred console is scheduled by:
>
> void defer_console_output(void)
> {
> 	preempt_disable();
> 	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> 	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> 	preempt_enable();
> }

Thanks.
I thought about "per-CPU printk() stuff happening too early", but
couldn't figure out why it would cause any problems later, once we
have everything set up and working.

Note that we test printk_safe_irq_ready only before irq_work_queue();
everywhere else we access the per-CPU printk_context without such a check.
Theoretically we can also touch the per-CPU printk_context, or even printk()
to the per-CPU buffer, "too early".
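
For reference, the guard in printk_safe.c covers only the irq_work side;
the per-CPU printk_context counter itself is used without any such check.
Roughly (a from-memory sketch of the v4.19 code, not a verbatim quote):

	/* Set from printk_safe_init(), once per-CPU areas and irq_work are usable. */
	static void queue_flush_work(struct printk_safe_seq_buf *s)
	{
		if (printk_safe_irq_ready)
			irq_work_queue(&s->work);
	}

	void __printk_safe_enter(void)
	{
		/* no printk_safe_irq_ready check on this path */
		this_cpu_inc(printk_context);
	}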

> I am afraid that the patch creates some mess via the non-initialized
> per-cpu variable.

We have `printk_safe_irq_ready' for printk()-related irq_work. Maybe
we can use it in printk.c as well. We never know when printk_deferred()
may trigger, so this type of problem can happen again.
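
Something along these lines, perhaps (just a sketch, completely untested,
and it assumes printk_safe_irq_ready would be made visible to printk.c,
e.g. via kernel/printk/internal.h):

	void defer_console_output(void)
	{
		/*
		 * Bail out until printk_safe_init() has set printk_safe_irq_ready,
		 * so that an early printk_deferred() does not queue irq_work on a
		 * not-yet-initialized per-CPU wake_up_klogd_work. The message is
		 * already in the log buffer; only the klogd wakeup is skipped.
		 */
		if (!printk_safe_irq_ready)
			return;

		preempt_disable();
		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
		preempt_enable();
	}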

-ss