Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

From: Sergey Senozhatsky
Date: Fri Feb 28 2020 - 07:02:44 EST


On (20/02/28 10:11), John Ogness wrote:
[..]
> >> >>> My test scenario for bisecting was:
> >> >>> 1. run 'dmesg --follow' as root
> >> >>> 2. run 'echo t > /proc/sysrq-trigger'
> >> >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> >> >>>
> >> >>> I ran my tests on Debian 10.3 with configuration based directly on one from 4.19.0-8-amd64 (4.19.98-1) in Qemu.
> >> >>> I could reproduce the same issue on several boards with x86 and ARMv7 CPUs alike, with 100% reproducibility.
> >
> > This is very-very odd... Hmm.
> > Just out of curiosity, what happens if you comment out that
> > printk() entirely?
> >
> > printk_deferred() should not affect the PRINTK_PENDING_WAKEUP path.
>
> It is the printk_deferred() causing the issue. This is relatively early,
> so perhaps something is not yet properly initialized.
>
> > Either we never queue wakeup irq_work(), e.g. because
> > waitqueue_active() never lets us do so or because `(curr_log_seq !=
> > log_next_seq)' is always zero
>
> wake_up_klogd() is called and the waitqueue (@log_wait) is
> active. irq_work_queue() is called, but the work function,
> wake_up_klogd_work_func(), is never called.
>
> Perhaps @wake_up_klogd_work gets broken somehow. I'm looking into it.

Thanks.

The interesting part here is that @wake_up_klogd_work is per-CPU. So
while I can imagine that, for instance, the boot CPU would get busted,
I'm not sure I see why all CPUs would experience problems. Maybe we hit
that randomness warning on every CPU during bring-up? Then maybe some
more randomness-related patches need to be backported to 4.19?
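
To illustrate the per-CPU point, here is a toy userspace model in Python.
It is only a sketch and not the kernel implementation: ToyPrintk, Cpu,
IrqWork, the `broken' flag and wakeups_with_broken() are made-up names,
and it assumes that a broken work item simply never runs its work
function.

```python
from dataclasses import dataclass, field

# Same name as the kernel flag; the value here is only illustrative.
PRINTK_PENDING_WAKEUP = 0x01


@dataclass
class IrqWork:
    """Toy stand-in for one CPU's wake_up_klogd_work item."""
    broken: bool = False   # hypothetical: the work function never runs
    queued: bool = False


@dataclass
class Cpu:
    pending: int = 0                               # per-CPU pending flags
    work: IrqWork = field(default_factory=IrqWork)  # per-CPU work item


class ToyPrintk:
    def __init__(self, nr_cpus):
        self.cpus = [Cpu() for _ in range(nr_cpus)]
        self.waitqueue_active = True   # a reader (e.g. dmesg --follow) sleeps
        self.reader_wakeups = 0

    def wake_up_klogd(self, cpu):
        """Mark a wakeup as pending and queue this CPU's own work item."""
        c = self.cpus[cpu]
        if self.waitqueue_active:
            c.pending |= PRINTK_PENDING_WAKEUP
            c.work.queued = True

    def irq_work_run(self, cpu):
        """Run the queued work on this CPU, unless its item is broken."""
        c = self.cpus[cpu]
        if not c.work.queued or c.work.broken:
            return
        c.work.queued = False
        if c.pending & PRINTK_PENDING_WAKEUP:
            c.pending &= ~PRINTK_PENDING_WAKEUP
            self.reader_wakeups += 1   # models waking the log_wait sleepers


def wakeups_with_broken(nr_cpus, broken_cpus):
    """Issue one wakeup per CPU; return how many reached the reader."""
    p = ToyPrintk(nr_cpus)
    for cpu in broken_cpus:
        p.cpus[cpu].work.broken = True
    for cpu in range(nr_cpus):
        p.wake_up_klogd(cpu)
        p.irq_work_run(cpu)
    return p.reader_wakeups
```

In this model, wakeups_with_broken(4, [0]) still delivers wakeups from
CPUs 1-3, and readers are starved only when every CPU's item is broken,
e.g. wakeups_with_broken(4, [0, 1, 2, 3]). That is why a 100% reproducible
failure looks odd to me if only the boot CPU's item were affected.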

-ss