Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

From: Sergey Senozhatsky
Date: Thu Feb 27 2020 - 22:13:13 EST


Cc-ing Petr, Steven, John

https://lore.kernel.org/lkml/e9358218-98c9-2866-8f40-5955d093dc1b@xxxxxxxxxxxxxxxxxxxxxx

On (20/02/27 14:08), Lech Perczak wrote:
> W dniu 27.02.2020 o 13:39, Lech Perczak pisze:
> > W dniu 27.02.2020 o 13:36, Greg Kroah-Hartman pisze:
> >> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> >>> Hello,
> >>>
> >>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
> >>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
> >>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> >> That is really really odd.
> > Very odd it is indeed.
> >>> My test scenario for bisecting was:
> >>> 1. run 'dmesg --follow' as root
> >>> 2. run 'echo t > /proc/sysrq-trigger'
> >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> >>>
> >>> I ran my tests on Debian 10.3 with configuration based directly on one from 4.19.0-8-amd64 (4.19.98-1) in Qemu.
> >>> I could reproduce the same issue on several boards with x86 and ARMv7 CPUs alike, with 100% reproducibility.
> >>>
> >>> I haven't yet digged into why exactly this commit breaks notifications for readers of /proc/kmsg and /dev/kmsg, but as reverting it fixed the issue, I'm pretty sure this is the one. It is possible that the same happened in 5.4 line, bu I hadn't had a chance to test this as well yet.
> >> I can revert this, but it feels like there is something else going wrong
> >> here. Can you try the 5.4 tree to see if that too has your same
> >> problem?
> > Yes, I'll check it in a short while.
> I checked v5.4.22 just now and didn't observe the issue. Maybe this commit wasn't destined for 4.19.y due to intricacies of locking inside printk() :-)
>
> From my side, there is no need to rush with the revert, as I can do this locally and drop the revert with next rebase-to-upstream, which we do very often.
> OTOH, the issue is likely to affect a lot of users, especially ones using distros tracking this branch like Debian 10 mentioned earlier,
> once they pick it up the change, as the kernel log content recorded by syslog will be affected, and 'dmesg --follow' behaviour will be quite surprising.

This is very-very odd... Hmm.
Just out of curiosity, what happens if you comment out that
printk() entirely?

printk_deferred() should not affect the PRINTK_PENDING_WAKEUP path.

Either we never queue wakeup irq_work(), e.g. because waitqueue_active()
never lets us to do so or because `(curr_log_seq != log_next_seq)' is
always zero:

void wake_up_klogd(void)
{
preempt_disable();
if (waitqueue_active(&log_wait)) {
this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
}
preempt_enable();
}

int vprintk_emit()
{
...
pending_output = (curr_log_seq != log_next_seq);
...
if (pending_output)
wake_up_klogd()
}

Or we do wakeup, but then either `syslog_seq != log_next_seq' or
`user->seq != log_next_seq' fail.


Lech, any chance you can trace what's happening in the system?

-ss