Re: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information

From: Paul E. McKenney
Date: Sat Nov 05 2022 - 16:32:58 EST


On Sat, Nov 05, 2022 at 03:03:14PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/11/5 9:58, Elliott, Robert (Servers) wrote:

[ . . . ]

> >> +int rcu_cpu_stall_cputime __read_mostly = IS_ENABLED(CONFIG_RCU_CPU_STALL_CPUTIME);
> >
> > As a config option and module parameter, adding some more
> > instrumentation overhead might be worthwhile for other
> > likely causes of rcu stalls.
> >
> > For example, if enabled, have these functions (if available
> > on the architecture) maintain a per-CPU running count of
> > their invocations, which also cause the CPU to be unavailable
> > for rcu:
> > - kernel_fpu_begin() calls - FPU/SIMD context preservation,
> >   which also calls preempt_disable()
> > - preempt_disable() calls - scheduler context switches disabled
> > - local_irq_save() calls - interrupts disabled
> > - cond_resched() calls - lack of these is a problem
> >
> > For kernel_fpu_begin and preempt_disable, knowing if it is
> > currently blocked for those reasons is probably the most
> > helpful.
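
The per-CPU running counts suggested above could be sketched roughly as
follows. This is a userspace illustration only, not code from the patch
series: a plain array stands in for real per-CPU data, and every name in
it (SKETCH_NR_CPUS, enum stall_event, sketch_count_event(),
sketch_snapshot(), sketch_delta()) is hypothetical. Taking a snapshot and
reporting the difference is likewise an assumption about how such counts
might be consumed in a stall report.

```c
/* Userspace sketch only: a plain array stands in for kernel per-CPU data. */
#include <assert.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count for this illustration */

/* Hypothetical event classes, one per function in the list above. */
enum stall_event {
	EV_KERNEL_FPU_BEGIN,
	EV_PREEMPT_DISABLE,
	EV_LOCAL_IRQ_SAVE,
	EV_COND_RESCHED,
	EV_MAX,
};

struct stall_counts {
	unsigned long count[EV_MAX];
};

/* Running counts, and the copy saved at the last snapshot. */
static struct stall_counts sketch_counts[SKETCH_NR_CPUS];
static struct stall_counts sketch_snap[SKETCH_NR_CPUS];

/* Would be called from the instrumented function running on @cpu. */
static void sketch_count_event(unsigned int cpu, enum stall_event ev)
{
	assert(cpu < SKETCH_NR_CPUS && ev < EV_MAX);
	sketch_counts[cpu].count[ev]++;
}

/* Save the current counts for @cpu, e.g. when a grace period starts. */
static void sketch_snapshot(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	sketch_snap[cpu] = sketch_counts[cpu];
}

/* Number of @ev events seen on @cpu since the last snapshot. */
static unsigned long sketch_delta(unsigned int cpu, enum stall_event ev)
{
	assert(cpu < SKETCH_NR_CPUS && ev < EV_MAX);
	return sketch_counts[cpu].count[ev] - sketch_snap[cpu].count[ev];
}
```

With counts like these, a stall report could show, for example, that a
CPU made no cond_resched() calls since the snapshot while its
preempt_disable() count kept rising. A real kernel version would
presumably use per-CPU variables rather than an indexed array, and would
still need separate state to tell whether the CPU is currently inside
such a section.
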
>
> These instructions are already in Documentation/RCU/stallwarn.rst

Excellent point -- this document also needs to be updated with this
new information. I have pulled in your four patches as noted in my
previous email. They are on the -rcu tree's "dev" branch.

Could you please send a patch containing an initial update to
stallwarn.rst? The main thing I need is your perspective on how each
field is used.

Thanx, Paul