Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

From: Petr Mladek
Date: Fri Feb 28 2020 - 08:02:26 EST


On Fri 2020-02-28 12:32:14, Petr Mladek wrote:
> On Fri 2020-02-28 11:58:36, Greg Kroah-Hartman wrote:
> > On Fri, Feb 28, 2020 at 11:04:16AM +0100, Petr Mladek wrote:
> > > On Fri 2020-02-28 12:13:06, Sergey Senozhatsky wrote:
> > > > Cc-ing Petr, Steven, John
> > > >
> > > > https://lore.kernel.org/lkml/e9358218-98c9-2866-8f40-5955d093dc1b@xxxxxxxxxxxxxxxxxxxxxx
> > > >
> > > > On (20/02/27 14:08), Lech Perczak wrote:
> > > > > W dniu 27.02.2020 o 13:39, Lech Perczak pisze:
> > > > > > W dniu 27.02.2020 o 13:36, Greg Kroah-Hartman pisze:
> > > > > >> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> > > > > >>> Hello,
> > > > > >>>
> > > > > >>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
> > > > > >>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
> > > > > >>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> > > > > >> That is really really odd.
> > > > > > Very odd it is indeed.
> > > > > >>> My test scenario for bisecting was:
> > > > > >>> 1. run 'dmesg --follow' as root
> > > > > >>> 2. run 'echo t > /proc/sysrq-trigger'
> > > > > >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> > > > > >>>
> > >
> > > I have reproduced the problem with a kernel based on v4.19.106
> > > and I see the following in the log:
> > >
> > > [ 0.028250] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> > > [ 0.028263] random: get_random_bytes called from start_kernel+0x9e/0x4f6 with crng_init=0
> > > [ 0.028268] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
> > > [ 0.028407] percpu: Embedded 44 pages/cpu s142216 r8192 d29816 u524288
> > > [ 0.028411] pcpu-alloc: s142216 r8192 d29816 u524288 alloc=1*2097152
> > > [ 0.028412] pcpu-alloc: [0] 0 1 2 3
> > >
> > > Note that percpu stuff is initialized after printk_deferred(). And the
> > > deferred console is scheduled by:
> > >
> > > void defer_console_output(void)
> > > {
> > > preempt_disable();
> > > __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > > preempt_enable();
> > > }
> > >
> > > I am afraid that the patch creates some mess via the non-initialized
> > > per-cpu variable.
> > >
> > > I see that x86 has some support for EARLY_PER_CPU stuff but it seems
> > > to be arch-specific.
> > >
> > > I do not see a reliable way to detect when per-cpu variables are
> > > initialized. Adding Tejun and PeterZ into CC if they have any
> > > idea.
> > >
> > > I suggest to revert the patch until we have some easy and safe solution.
> >
> > Ok, I'll do so, but why is this not an issue in 5.4.y and newer kernels?
>
> Good question. Well, there have been many changes in the random number
> subsystem initialization recently. My bet is that it is much harder to
> hit the warning there.

OK, the problem has gone after I cherry-picked the commit
d55535232c3dbde9a523 ("random: move rand_initialize() earlier").
I still see the warning but it is printed much later:

[ 0.051846] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.054070] NR_IRQS: 524544, nr_irqs: 456, preallocated irqs: 16
[ 0.054281] random: get_random_bytes called from start_kernel+0x308/0x4fe with crng_init=0
[ 0.054430] Console: colour dummy device 80x25

But I am not sure if it is safe to backport this patch into the old
stable kernel.

Anyway, this fix would not be enough. The commit message mentions:

Note that this warning may still remain for machines that do not have
UEFI RNG support (which initializes the RNG pools during setup_arch()),
or for x86 machines without RDRAND (or booting without "random.trust=on"
or CONFIG_RANDOM_TRUST_CPU=y).


So, I would still prefer to _revert_ the commit 15341b1dd409749f
("char/random: silence a lockdep splat with printk()"). It calmed
down lockdep report. The real life danger is dubious. The warning
is printed early when the system is running on single CPU where
it could not race.

Best Regards,
Petr