[PATCH v2] tracing/osnoise: Force quiescent states while tracing

From: Nicolas Saenz Julienne
Date: Mon Mar 07 2022 - 13:08:01 EST


At the moment running osnoise on a nohz_full CPU or uncontested FIFO
priority and a PREEMPT_RCU kernel might have the side effect of
extending grace periods too much. This will entice RCU to force a
context switch on the wayward CPU to end the grace period, all while
introducing unwarranted noise into the tracer. This behaviour is
unavoidable as overly extending grace periods might exhaust the system's
memory.

This same exact problem is what extended quiescent states (EQS) were
created for, conversely, rcu_momentary_dyntick_idle() emulates them by
performing a zero duration EQS. So let's make use of it.

In the common case rcu_momentary_dyntick_idle() is fairly inexpensive:
atomically incrementing a local per-CPU counter and doing a store. So it
shouldn't affect osnoise's measurements (which has a 1us granularity),
so we'll call it unanimously.

The uncommon case involve calling rcu_momentary_dyntick_idle() after
having the osnoise process:

- Receive an expedited quiescent state IPI with preemption disabled or
during an RCU critical section. (activates rdp->cpu_no_qs.b.exp
code-path).

- Being preempted within in an RCU critical section and having the
subsequent outermost rcu_read_unlock() called with interrupts
disabled. (t->rcu_read_unlock_special.b.blocked code-path).

Neither of those are possible at the moment, and are unlikely to be in
the future given the osnoise's loop design. On top of this, the noise
generated by the situations described above is unavoidable, and if not
exposed by rcu_momentary_dyntick_idle() will be eventually seen in
subsequent rcu_read_unlock() calls or schedule operations.

Fixes: bce29ac9ce0b ("trace: Add osnoise tracer")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx>
Acked-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
---

Changes since v1:
- Use local_irq_{enable,disable}()
- Update commit message and comments to better explain RCU's behaviour
- Get rid of nohz_full and tick checks
- Comment on rcu_momentary_dyntick_idle()'s eventual execution cost

kernel/trace/trace_osnoise.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index a96d777350fa..ae5e314d7083 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -1388,6 +1388,26 @@ static int run_osnoise(void)
osnoise_stop_tracing();
}

+ /*
+ * In some cases, notably when running on a nohz_full CPU with
+ * a stopped tick PREEMPT_RCU has no way to account for QSs.
+ * This will eventually cause unwarranted noise as PREEMPT_RCU
+ * will force preemption as the means of ending the current
+ * grace period. We avoid this problem by calling
+ * rcu_momentary_dyntick_idle(), which performs a zero duration
+ * EQS allowing PREEMPT_RCU to end the current grace period.
+ * This call shouldn't be wrapped inside an RCU critical
+ * section.
+ *
+ * Note that in non PREEMPT_RCU kernels QSs are handled through
+ * cond_resched()
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RCU)) {
+ local_irq_disable();
+ rcu_momentary_dyntick_idle();
+ local_irq_enable();
+ }
+
/*
* For the non-preemptive kernel config: let threads runs, if
* they so wish.
--
2.35.1