Re: [PATCH -printk] printk, tracing: fix console tracepoint

From: Paul E. McKenney
Date: Mon Jul 11 2022 - 22:57:43 EST


On Mon, Jul 11, 2022 at 08:53:19PM -0400, Steven Rostedt wrote:
> On Mon, 11 Jul 2022 17:21:28 -0700
> "Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote:
>
> > On x86, both srcu_read_lock() and srcu_read_unlock() should be OK from
> > NMI context, give or take their use of lockdep. Which is why we have
> > srcu_read_lock_notrace() and srcu_read_unlock_notrace(), which do not
> > use lockdep. Which __DO_TRACE() does in fact invoke. Ah, but you have
> > this: "WARN_ON_ONCE(rcuidle && in_nmi())".
> >
> > Because all the world is not an x86.
>
> But since NMIs are architecture specific, we could change that to:
>
> WARN_ON_ONCE(!srcu_nmi_safe && rcuidle && in_nmi());
>
> and add a srcu_nmi_safe constant or macro that is 1 on architectures that
> srcu is safe in NMI and 0 otherwise.
>
> Or do we care if a tracepoint happens in those architectures where it is
> not safe. We could then just do:
>
> if (!srcu_nmi_safe && rcuidle && in_nmi())
> return;
>
> and just skip tracepoints that are marked rcu_idle and happen within NMI.

More generally, this is this_cpu_nmi_safe rather than just SRCU. Or could
be, depending on what the architecture guys would like to guarantee.
SRCU is just passing through the this_cpu*() NMI-safety property.

And in addition to in_nmi(), there is the HAVE_NMI Kconfig option:

$ git grep -w HAVE_NMI
arch/Kconfig:config HAVE_NMI
arch/Kconfig: depends on HAVE_NMI
arch/arm/Kconfig: select HAVE_NMI
arch/arm64/Kconfig: select HAVE_NMI
arch/loongarch/Kconfig: select HAVE_NMI
arch/mips/Kconfig: select HAVE_NMI
arch/powerpc/Kconfig: select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S)
arch/s390/Kconfig: select HAVE_NMI
arch/sh/Kconfig: select HAVE_NMI
arch/sparc/Kconfig: select HAVE_NMI
arch/x86/Kconfig: select HAVE_NMI

So if I understand correctly, arm, loongarch, mips, powerpc, sh, and
sparc are the ones that have NMIs but don't have NMI-safe this_cpu*()
operations. These are the ones that would need !srcu_nmi_safe.

Or, longer term, I could make HAVE_NMI && !srcu_safe_nmi cause SRCU to use
explicit atomics for srcu_read_lock_trace() and srcu_read_unlock_trace().
I am assuming that any NMI handler executing srcu_read_lock_trace()
also executes the matching srcu_read_unlock_trace(). (Silly me, I know!)
This assumption means that srcu_read_lock() and srcu_read_unlock() can
continue with their non-atomic this_cpu_inc() ways.

But a quick fix that stopped the bleeding and allowed printk() to
progress would be useful in the short term, regardless of whether or
not in the longer term it makes sense to make srcu_read_lock_trace()
and srcu_read_unlock_trace() NMI-safe.

Thoughts?

Thanx, Paul