Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

From: Paul E. McKenney
Date: Tue Mar 01 2022 - 14:46:11 EST


On Tue, Mar 01, 2022 at 08:29:23PM +0100, Daniel Bristot de Oliveira wrote:
> On 3/1/22 19:58, Paul E. McKenney wrote:
> > On Tue, Mar 01, 2022 at 07:44:38PM +0100, Daniel Bristot de Oliveira wrote:
> >> On 3/1/22 19:05, Paul E. McKenney wrote:
> >>>> I see, as long as it costs < 1 us, I am ok. If it gets > 1us in a reasonably
> >>>> fast machine, we start see HW noise where it does not exist, and that would
> >>>> reduce the resolution of osnoise. AFAICS, it is not causing that problem, but we
> >>>> need to make it as lightweight as possible.
> >>> In the common case, it is atomically incrementing a local per-CPU counter
> >>> and doing a store. This should be quite cheap.
> >>>
> >>> The uncommon case is when the osnoise process was preempted or otherwise
> >>> interfered with during a recent RCU read-side critical section and
> >>> preemption was disabled around that critical section's outermost
> >>> rcu_read_unlock(). This can be quite expensive. But I would expect
> >>> you to just not do this. ;-)
> >>
> >> Getting the expensive call after a preemption is not a problem, it is a side
> >> effect of the most costly preemption.
> >>
> >> It this case, we should "ping rcu" before reading the time to account the
> >> overhead for the previous preemption which caused it.
> >>
> >> like (using the current code as example):
> >>
> >> ------------------------- %< -------------------------------
> >> static u64
> >> set_int_safe_time(struct osnoise_variables *osn_var, u64 *time)
> >> {
> >> u64 int_counter;
> >>
> >> do {
> >> int_counter = local_read(&osn_var->int_counter);
> >>
> >> ------------> HERE <-------------------------------------
> >>
> >> /* synchronize with interrupts */
> >> barrier();
> >>
> >> *time = time_get();
> >>
> >> /* synchronize with interrupts */
> >> barrier();
> >> } while (int_counter != local_read(&osn_var->int_counter));
> >>
> >> return int_counter;
> >> }
> >> ------------------------- >% -------------------------------
> >>
> >> In this way anything that happens before this *time is accounted before it is
> >> get. If anything happens while this loop is running, it will run again, so it is
> >> safe to point to the previous case.
> >>
> >> We would have to make a copy of this function, and only use the copy for the
> >> run_osnoise() case. A good name would be something in the lines of
> >> set_int_safe_time_rcu().
> >>
> >> (Unless the expensive is < than 1us.)
> >
> > The outermost rcu_read_unlock() could involve a call into the scheduler
> > to do an RCU priority deboost, which I would imagine could be a bit
> > expensive. But I would expect you to configure the test in such a way
> > that there was no need for RCU priority boosting. For example, by making
> > sure that the osnoise process's RCU readers were never preempted.
>
> So, the noise will not be seeing in the call that Nicolas is doing. but in the
> rcu_read_unlock() inside osnoise processes?
>
> If that is the case, then the "noise" would already be accounted to the
> previously preempted thread... and we should be fine then.

It could be either at the rcu_read_unlock() itself, or, if preemption
was disabled across that rcu_read_unlock(), at a subsequent point where
preemption is enabled. Which might amount to the same thing given that
there won't be any preemption until preemption is enabled?

> > Just out of curiosity, why is this running in the kernel rather than in
> > userspace? To focus only on kernel-centric noise sources? Or are there
> > people implementing real-time applications within the kernel?
>
> It is in kernel because it allows me to sync the workload and the trace, getting
> more (and more precise) information.
>
> For example, I can read the "noise in time" and how many interrupts happened in
> between two reads of the time, so I can look back in the trace to figure out
> which sources of noise were the cause of the noise I am seeing - without false
> positives. If no "interference" happened, I can safely say that it was a
> hardware noise (this saves us time in the debug, no need to run hwlat - I run
> two tools in one).
>
> This all with a more cheap access to the data. I also use such information to
> parse trace in kernel in a cheaper way, printing less info to the trace buffer.

Fair enough!

> But the idea is to see the noise for an user-space application as much as
> possible (and no, I am not doing application in kernel... but I know people
> doing it using a unikernel, but that is another story... a longer one... :-)).

There have been people writing their applications in Linux kernel modules,
or at least attempting to do so! ;-)

Thanx, Paul