Re: [PATCH] rcu: Add cpu-exp indicator to expedited RCU CPU stall warnings

From: Paul E. McKenney
Date: Wed May 18 2022 - 14:14:30 EST


On Wed, May 18, 2022 at 07:43:10PM +0800, Zqiang wrote:
> This commit adds a "D" indicator to expedited RCU CPU stall warnings.
> When an expedited grace period begins while a CPU has had interrupts
> disabled for too long, the IPI handler (rcu_exp_handler()) cannot
> respond in time, and this debugging indicator is shown.
>
> runqemu kvm slirp nographic qemuparams="-m 4096 -smp 4" bootparams=
> "isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3 rcutree.dump_tree=1
> rcutorture.stall_cpu_holdoff=30 rcutorture.stall_cpu=40
> rcutorture.stall_cpu_irqsoff=1 rcutorture.stall_cpu_block=0
> rcutorture.stall_no_softlockup=1" -d
>
> rcu_torture_stall start on CPU 1.
> ............
> rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks:
> { 1-...D } 26467 jiffies s: 13317 root: 0x1/.
> rcu: blocking rcu_node structures (internal RCU debug): l=1:0-1:0x2/.
> Task dump for CPU 1:
> task:rcu_torture_sta state:R running task stack: 0 pid: 76
> ppid: 2 flags:0x00004008
>
> Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>

Nice!!! I have queued this for v5.20 and for further testing and
review, thank you!

As usual, I could not resist the temptation to wordsmith the commit log,
so could you please check it in case I messed something up?

Thanx, Paul

------------------------------------------------------------------------

commit 178b9d47f3049e8122738c3166ee4975b75cba55
Author: Zqiang <qiang1.zhang@xxxxxxxxx>
Date: Wed May 18 19:43:10 2022 +0800

rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings

If a CPU has interrupts disabled continuously starting before the
beginning of a given expedited RCU grace period, that CPU will not
execute that grace period's IPI handler. This will in turn mean
that the ->cpu_no_qs.b.exp field in that CPU's rcu_data structure
will continue to contain the boolean value false.

Knowing whether or not a CPU has had interrupts disabled can be helpful
when debugging an expedited RCU CPU stall warning, so this commit
adds a "D" indicator to expedited RCU CPU stall warnings that signifies
that the corresponding CPU has had interrupts disabled throughout.

This capability was tested as follows:

runqemu kvm slirp nographic qemuparams="-m 4096 -smp 4" bootparams=
"isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3 rcutree.dump_tree=1
rcutorture.stall_cpu_holdoff=30 rcutorture.stall_cpu=40
rcutorture.stall_cpu_irqsoff=1 rcutorture.stall_cpu_block=0
rcutorture.stall_no_softlockup=1" -d

The rcu_torture_stall() function ran on CPU 1, which displays the "D"
as expected given the rcutorture.stall_cpu_irqsoff=1 module parameter:

............
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks:
{ 1-...D } 26467 jiffies s: 13317 root: 0x1/.
rcu: blocking rcu_node structures (internal RCU debug): l=1:0-1:0x2/.
Task dump for CPU 1:
task:rcu_torture_sta state:R running task stack: 0 pid: 76 ppid: 2 flags:0x00004008

Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 4c7037b507032..f092c7f18a5f3 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -637,10 +637,11 @@ static void synchronize_rcu_expedited_wait(void)
 					continue;
 				ndetected++;
 				rdp = per_cpu_ptr(&rcu_data, cpu);
-				pr_cont(" %d-%c%c%c", cpu,
+				pr_cont(" %d-%c%c%c%c", cpu,
 					"O."[!!cpu_online(cpu)],
 					"o."[!!(rdp->grpmask & rnp->expmaskinit)],
-					"N."[!!(rdp->grpmask & rnp->expmaskinitnext)]);
+					"N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
+					"D."[!!(rdp->cpu_no_qs.b.exp)]);
 			}
 		}
 		pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",