Re: [PATCH v2 0/3] newidle_balance() PREEMPT_RT latency mitigations

From: Mike Galbraith
Date: Tue May 04 2021 - 00:08:53 EST


On Mon, 2021-05-03 at 16:57 -0500, Scott Wood wrote:
> On Mon, 2021-05-03 at 20:52 +0200, Mike Galbraith wrote:
> > On Mon, 2021-05-03 at 11:33 -0500, Scott Wood wrote:
> > > On Sun, 2021-05-02 at 05:25 +0200, Mike Galbraith wrote:
> > > > If NEWIDLE balancing migrates one task, how does that manage to
> > > > consume
> > > > a full *millisecond*, and why would that only be a problem for RT?
> > > >
> > > > -Mike
> > > >
> > > > (rt tasks don't play !rt balancer here, if CPU goes idle, tough titty)
> > >
> > > Determining which task to pull is apparently taking that long (again,
> > > this is on a 128-cpu system). RT is singled out because that is the
> > > config that makes significant tradeoffs to keep latencies down (I
> > > expect this would be far from the only possible 1ms+ latency on a
> > > non-RT kernel), and there was concern about the overhead of a double
> > > context switch when pulling a task to a newidle cpu.
> >
> > What I think has to be going on is that you're running a synchronized RT
> > load, many CPUs go idle as a thundering herd, and meet at focal point
> > busiest. What I was alluding to was that preventing pile-ups of that
> > scale would be way better than poking holes in it for RT to try to
> > sneak through. If pile-up it is, while not particularly likely, the
> > same should happen with normal tasks, wasting cycles generating heat.
> >
> > The main issue I see with these patches is that the resulting number is
> > still so gawd awful as to mean "nope, not rt ready", making the whole
> > exercise look a bit like a noop.
>
> It doesn't look like rteval asks cyclictest to synchronize, but
> regardless, how is this "poking holes"?

Pulling a single task is taking _a full millisecond_, which I see as a
mountain of cycles, directly through which you open a path for wakeups.
That "poking holes" isn't meant to be some kind of crude derogatory
remark; it's just the way I see what was done. The mountain still
stands, you didn't remove it.

> Making sure interrupts are
> enabled during potentially long-running activities is pretty fundamental
> to PREEMPT_RT. What specifically is your suggestion?

Try to include fair class in any LB improvement if at all possible,
because that's where most of the real world benefit is to be had.

> And yes, 317 us is still not a very good number for PREEMPT_RT, but
> progress is progress. It's hard to address the moderate latency spikes
> if they're obscured by large latency spikes. One also needs to have
> realistic expectations when it comes to RT on large systems, particularly
> when not isolating the latency-sensitive CPUs.

Agreed. But. Specifically because the result remains intolerable to
anything remotely sensitive, users running such on their big boxen are
not going to be doing mixed load; that flat does not work, which is why
I said the patch set looks a bit like a noop: it excludes the audience
that stands to gain.. nearly anything. Big box HPC (acronym includes
RT) gains absolutely nothing, as does big box general case with its not
particularly prevalent, but definitely existent RT tasks. Big box
users who are NOT running anything they care deeply about do receive
some love.. but they don't care deeply, and certainly don't care any
more deeply than general case users do about these collision induced
latency spikes.

-Mike