Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

From: Nicolas Saenz Julienne
Date: Wed Mar 02 2022 - 05:46:19 EST


On Tue, 2022-03-01 at 09:56 -0800, Paul E. McKenney wrote:
> On Tue, Mar 01, 2022 at 11:00:08AM +0100, Nicolas Saenz Julienne wrote:
> > On Mon, 2022-02-28 at 14:11 -0800, Paul E. McKenney wrote:
> > > On Mon, Feb 28, 2022 at 03:14:23PM +0100, Nicolas Saenz Julienne wrote:
> > > > At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> > > > kernel might have the side effect of extending grace periods too much.
> > > > This will eventually entice RCU to schedule a task on the isolated CPU
> > > > to end the overly extended grace period, adding unwarranted noise to the
> > > > CPU being traced in the process.
>
> Ah, I misread the above paragraph. Apologies!
>
> Nevertheless, could you please add something explicit to the effect that
> RCU is completing grace periods as required?

Yes, of course.

[...]
> > > o   At about 30 milliseconds into the grace period, RCU forces an
> > >     explicit context switch on the wayward CPU. This should get
> > >     the CPU's attention even in CONFIG_PREEMPT=y kernels.
> > >
> > > So what is happening for you instead?
> >
> > Well, that's exactly what I'm seeing, but it doesn't play well with osnoise.
>
> Whew!!! ;-)
>
> > Here's a simplified view of what the tracer does:
> >
> >   time1 = get_time();
> >   while(1) {
> >           time2 = get_time();
> >           if (time2 - time1 > threshold)
> >                   trace_noise();
> >           cond_resched();
> >           time1 = time2;
> >   }
> >
> > This is pinned to a specific CPU, and in the most extreme cases is expected to
> > take 100% of CPU time. Eventually, some SMI, NMI/interrupt, or process
> > execution will trigger the threshold, and osnoise will provide some nice traces
> > explaining what happened.
> >
> > RCU forcing a context switch on the wayward CPU is introducing unwarranted
> > noise as it's triggered by the fact we're measuring and wouldn't happen
> > otherwise.
> >
> > If this were user-space, we'd be in an EQS, which would make this problem go
> > away. An option would be mimicking this behaviour (assuming irq entry/exit code
> > did the right thing):
> >
> >   rcu_eqs_enter();                <--
> >   time1 = get_time();
> >   while(1) {
> >           time2 = get_time();
> >           if (time2 - time1 > threshold)
> >                   trace_noise();
> >           rcu_eqs_exit();         <--
> >           cond_resched();
> >           rcu_eqs_enter();        <--
> >           time1 = time2;
> >   }
> >
> > But given the tight loop, this isn't much different from what I'm proposing
> > at the moment, is it? rcu_momentary_dyntick_idle() just emulates a really
> > fast EQS entry/exit.
>
> And that is in fact exactly what rcu_momentary_dyntick_idle() was
> intended for:
>
> Acked-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

Thanks!
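
For anyone who wants to poke at the sampling logic quoted above without a
kernel tree, here is a rough user-space sketch of the same loop. It is only an
illustration, not the osnoise code: sample_noise(), now_ns() and scripted_ns()
are made-up names, sched_yield() stands in for cond_resched(), the loop is
bounded instead of while(1), and a counter replaces trace_noise(). It does not
model RCU or quiescent states at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Clock callback type, so the loop can run on a real or a scripted clock. */
typedef uint64_t (*clock_fn)(void);

/* Real clock: monotonic time in nanoseconds. */
uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Scripted clock (hypothetical, for illustration): one large jump at 500. */
static const uint64_t script[] = { 0, 10, 20, 500, 510 };
static size_t script_pos;

uint64_t scripted_ns(void)
{
	const size_t last = sizeof(script) / sizeof(script[0]) - 1;
	uint64_t v = script[script_pos];

	/* Advance until the last entry, then keep returning it. */
	if (script_pos < last)
		script_pos++;
	return v;
}

/*
 * Same shape as the quoted loop: compare consecutive timestamps and count
 * every gap larger than threshold_ns as one noise event.
 */
unsigned long sample_noise(clock_fn get_time, unsigned long iterations,
			   uint64_t threshold_ns)
{
	unsigned long events = 0;
	uint64_t time1, time2;

	assert(get_time != NULL);

	time1 = get_time();
	for (unsigned long i = 0; i < iterations; i++) {
		time2 = get_time();
		if (time2 - time1 > threshold_ns)
			events++;	/* stands in for trace_noise() */
		sched_yield();		/* stands in for cond_resched() */
		time1 = time2;
	}
	return events;
}
```

On a fresh run, sample_noise(scripted_ns, 4, 50) reports one event (the jump
from 20 to 500). Passing now_ns instead, with a threshold of a few
microseconds, gives a crude picture of how often the calling thread gets
interrupted or scheduled out.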

--
Nicolás Sáenz