Re: [PATCH] rcu: allow multiple stalls before panic

From: Paul E. McKenney
Date: Fri Sep 04 2020 - 14:05:31 EST


On Sun, Aug 30, 2020 at 11:41:17PM -0700, chao wrote:
> Some stalls are transient and system can fully recover.
> Allow users to configure the number of stalls experienced
> to trigger kernel Panic.
>
> Signed-off-by: chao <chao@xxxxxxxx>

Hearing no objections, I have queued this with wordsmithing as shown
below. Please let me know if I messed something up.

One question, though. It looks like setting this to (say) 5 would panic
after the fifth RCU CPU stall warning message, regardless whether all
five were reporting the same RCU CPU stall event or whether they instead
were five widely separated transient RCU CPU stall events, where the
system fully recovered from each event. Is this the intent?

Thanx, Paul

------------------------------------------------------------------------

commit e710c928fb52d8e56bc6173515805301da6aa22b
Author: chao <chao@xxxxxxxx>
Date: Sun Aug 30 23:41:17 2020 -0700

rcu: Panic after fixed number of stalls

Some stalls are transient, so that system fully recovers. This commit
therefore allows users to configure the number of stalls that must happen
in order to trigger kernel panic.

Signed-off-by: chao <chao@xxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 500def6..fc2dd3f 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -536,6 +536,7 @@ extern int panic_on_warn;
extern unsigned long panic_on_taint;
extern bool panic_on_taint_nousertaint;
extern int sysctl_panic_on_rcu_stall;
+extern int sysctl_max_rcu_stall_to_panic;
extern int sysctl_panic_on_stackoverflow;

extern bool crash_kexec_post_notifiers;
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 0fde39b..228c55f 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -13,6 +13,7 @@

/* panic() on RCU Stall sysctl. */
int sysctl_panic_on_rcu_stall __read_mostly;
+int sysctl_max_rcu_stall_to_panic __read_mostly;

#ifdef CONFIG_PROVE_RCU
#define RCU_STALL_DELAY_DELTA (5 * HZ)
@@ -106,6 +107,11 @@ early_initcall(check_cpu_stall_init);
/* If so specified via sysctl, panic, yielding cleaner stall-warning output. */
static void panic_on_rcu_stall(void)
{
+ static int cpu_stall;
+
+ if (++cpu_stall < sysctl_max_rcu_stall_to_panic)
+ return;
+
if (sysctl_panic_on_rcu_stall)
panic("RCU Stall\n");
}
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 287862f..1bca490 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2651,6 +2651,17 @@ static struct ctl_table kern_table[] = {
.extra2 = SYSCTL_ONE,
},
#endif
+#if defined(CONFIG_TREE_RCU)
+ {
+ .procname = "max_rcu_stall_to_panic",
+ .data = &sysctl_max_rcu_stall_to_panic,
+ .maxlen = sizeof(sysctl_max_rcu_stall_to_panic),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ONE,
+ .extra2 = SYSCTL_INT_MAX,
+ },
+#endif
#ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
{
.procname = "stack_erasing",