Re: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information

From: Leizhen (ThunderTown)
Date: Mon Nov 07 2022 - 22:07:11 EST




On 2022/11/8 5:57, Elliott, Robert (Servers) wrote:
> I created a 22 second stall, which triggered two self-detected stall
> messages. The second one covers 18 seconds (and reports 17444 ms
> of system cputime), but still reports the half_timeout of 2.5 s on

That is because I did not update rsrp->gp_seq in print_cpu_stat_info().

Please add rsrp->gp_seq-- at the bottom of print_cpu_stat_info()
and try again.

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 2e560a70d88fd87..6f6c95d599e6436 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -455,6 +455,7 @@ static void print_cpu_stat_info(int cpu)
 		div_u64(cpustat[CPUTIME_SOFTIRQ] - rsrp->cputime_softirq, NSEC_PER_MSEC),
 		div_u64(cpustat[CPUTIME_SYSTEM] - rsrp->cputime_system, NSEC_PER_MSEC),
 		jiffies64_to_msecs(half_timeout));
+	rsrp->gp_seq--;
 }

 /*
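
For illustration only, here is a minimal user-space sketch of why the
decrement should help. It assumes that the snapshot is retaken only when
rsrp->gp_seq differs from the current grace-period number; that is a
reading of the patch, not something verified here. struct snap_record,
maybe_snapshot() and report() are hypothetical stand-ins, not the
kernel's code, and cputime is simplified to milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU snapshot record (not the kernel struct). */
struct snap_record {
	uint64_t gp_seq;         /* grace period the snapshot belongs to */
	uint64_t cputime_system; /* system cputime at snapshot time, in ms */
};

/*
 * Assumed behavior: take a snapshot only if none is valid for the
 * current grace period. Returns true if a new snapshot was taken.
 */
static bool maybe_snapshot(struct snap_record *rsrp, uint64_t cur_gp_seq,
			   uint64_t now_system_ms)
{
	if (rsrp->gp_seq == cur_gp_seq)
		return false;	/* still valid: keep the old baseline */
	rsrp->gp_seq = cur_gp_seq;
	rsrp->cputime_system = now_system_ms;
	return true;
}

/*
 * Return the system cputime accumulated since the snapshot. With
 * 'invalidate' set, mimic the proposed rsrp->gp_seq-- so that the next
 * maybe_snapshot() call in the same grace period retakes the baseline.
 */
static uint64_t report(struct snap_record *rsrp, uint64_t now_system_ms,
		       bool invalidate)
{
	uint64_t delta = now_system_ms - rsrp->cputime_system;

	if (invalidate)
		rsrp->gp_seq--;
	return delta;
}
```

Without the decrement, a second report in the same grace period is
measured against the first baseline, so the cputime keeps growing while
the printed window stays at half_timeout. With the decrement, the
baseline is retaken before the second report and the two numbers refer
to the same window again.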

> the right. The duration since the snapshot was taken would be
> more meaningful.
>
> [ 3428.422726] tcrypt: rcu testing - preempt_disable for rude 22 s
> [ 3433.419012] rcu: INFO: rcu_preempt self-detected stall on CPU
> [ 3433.425145] rcu: 52-....: (4993 ticks this GP) idle=7704/1/0x4000000000000000 softirq=8448/8448 fqs=1247
> [ 3433.435073] rcu: hardirqs softirqs csw system cond_resched
> [ 3433.443096] rcu: number: 0 5 0 0
> [ 3433.450930] rcu: cputime: 8 0 2489 ==> 2500 (ms)
> [ 3433.460151] rcu: current: in_kernel_fpu_begin=0 this_cpu_preemptible=0
> [ 3433.467006] (t=5044 jiffies g=127261 q=179 ncpus=56)
> [ 3433.472285] CPU: 52 PID: 44429 Comm: modprobe Not tainted 6.0.0+ #11
> [ 3433.478879] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/08/2022
> [ 3433.487664] RIP: 0010:rude_sleep_cycles+0x13/0x27 [tcrypt]
> ...
> [ 3433.717818] </TASK>
> [ 3448.719827] rcu: INFO: rcu_preempt self-detected stall on CPU
> [ 3448.725816] rcu: 52-....: (19994 ticks this GP) idle=7704/1/0x4000000000000000 softirq=8448/8448 fqs=5002
> [ 3448.735736] rcu: hardirqs softirqs csw system cond_resched
> [ 3448.743735] rcu: number: 0 38 0 0
> [ 3448.751560] rcu: cputime: 354 0 17444 ==> 2500 (ms)
> [ 3448.760780] rcu: current: in_kernel_fpu_begin=0 this_cpu_preemptible=0
> [ 3448.767643] (t=20348 jiffies g=127261 q=1019 ncpus=56)
> [ 3448.773106] CPU: 52 PID: 44429 Comm: modprobe Not tainted 6.0.0+ #11
> [ 3448.779704] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/08/2022
> [ 3448.788488] RIP: 0010:rude_sleep_cycles+0x13/0x27 [tcrypt]
> ...
>
>
>

--
Regards,
Zhen Lei