Re: [PATCH] sched: idle: Avoid retaining the tick when it has been stopped

From: Rafael J. Wysocki
Date: Sun Aug 19 2018 - 03:57:37 EST


On Sun, Aug 19, 2018 at 2:36 AM <leo.yan@xxxxxxxxxx> wrote:
>
> On Sat, Aug 18, 2018 at 11:57:00PM +0200, Rafael J. Wysocki wrote:
>
> [...]
>
> > > > > Otherwise we can have something like this:
> > > > >
> > > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > > > index da9455a..408c985 100644
> > > > > --- a/kernel/time/tick-sched.c
> > > > > +++ b/kernel/time/tick-sched.c
> > > > > @@ -806,6 +806,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
> > > > > >  static void tick_nohz_retain_tick(struct tick_sched *ts)
> > > > > >  {
> > > > > >  	ts->timer_expires_base = 0;
> > > > > > +
> > > > > > +	if (ts->tick_stopped)
> > > > > > +		tick_nohz_restart(ts, ktime_get());
> > > > > >  }
> > > > > >
> > > > > >  #ifdef CONFIG_NO_HZ_FULL
> > > > >
> > > >
> > > > We could do that, but my concern with that approach is that we may end up
> > > > stopping and starting the tick back and forth without exiting the loop
> > > > in do_idle() just because somebody uses a periodic timer behind our
> > > > back and the governor gets confused.
> > > >
> > > > Besides, that would be a change in behavior, while the $subject patch
> > > > simply fixes a mistake in the original design.
> > >
> > > Ok, let's take the safe approach for now as this is a fix and it should even be
> > > routed to stable.
> >
> > Right. I'll queue up this patch, then.
> >
> > > But then in the longer term, perhaps cpuidle_select() should think that
> > > through.
> >
> > So I have given more consideration to this and my conclusion is that
> > restarting the tick between cpuidle_select() and call_cpuidle() is a
> > bad idea.
> >
> > First off, if need_resched() is "false", the primary reason for
> > running the tick on the given CPU is not there, so it only might be
> > useful as a "backup" timer to wake up the CPU from an inadequate idle
> > state.
> >
> > Now, in general, there are two reasons for the idle governor (whatever
> > it is) to select an idle state with a target residency below the tick
> > period length. The first reason is when the governor knows that the
> > closest timer event is going to occur in this time frame, but in that
> > case (as stated above), it is not necessary to worry about the tick,
> > because the other timer will trigger soon enough anyway. The second
> > reason is when the governor predicts a wakeup which is not by a timer
> > in this time frame and it is quite arguable what the governor should
> > do then. IMO it at least is not unreasonable to throw the prediction
> > away and still go for the closest timer event in that case (which is
> > the current approach).
> >
> > There's more, though. Restarting the tick between cpuidle_select()
> > and call_cpuidle() might introduce quite a bit of latency at that
> > point and that would mess up the idle state selection (e.g.
> > selecting a very shallow idle state might not make a lot of sense if
> > that latency was high enough, because the expected wakeup might very
> > well take place while the tick was being restarted), so it should
> > rather be avoided IMO.
>
> I expect the idle governor won't introduce many tick restarts. If
> there is a close timer event, the idle governor can trust it to wake
> up the CPU, so in this case it will not restart the tick. If the
> timer event is far away and the shallow state selection is caused by
> other factors (e.g. a typical pattern), then we need to restart the
> tick to avoid powernightmares. For this case we can restart the tick
> only once, at the beginning of the typical pattern of interrupt
> events; after the typical pattern stops, we can rely on the tick to
> rescue the CPU from the shallow idle state into a deep one.

No, we don't need to restart the tick at all. We just need to require
the governor to disregard "typical patterns" (which are not
timer-induced, mind you) when it knows that the tick has been stopped
already.

Unfortunately, the menu governor cannot distinguish a timer-induced
"typical" pattern from one related to device interrupts, but I don't
really see a reason to worry about the latter when the CPU is idle
with the tick stopped (which means that the workload can tolerate the
extra latency from deep idle states anyway).
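
To illustrate the rule described above (disregard pattern-based
predictions once the tick has been stopped and go for the closest
timer event instead), here is a minimal standalone C sketch. The
structure, its fields and the function name are hypothetical and are
not taken from the menu governor or any other kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs for one idle state selection. */
struct idle_inputs {
	bool tick_stopped;	/* the tick has already been stopped on this CPU */
	uint64_t next_timer_us;	/* time until the closest timer event */
	uint64_t pattern_us;	/* interval predicted from recent wakeup patterns */
};

/* Return the idle duration to base the idle state selection on. */
static uint64_t predicted_idle_us(const struct idle_inputs *in)
{
	/*
	 * With the tick stopped there is no "backup" timer to wake the CPU
	 * from an inadequate shallow state, so drop the pattern-based
	 * prediction and use the closest timer event.
	 */
	if (in->tick_stopped)
		return in->next_timer_us;

	/* Otherwise take the shorter of the two estimates. */
	return in->pattern_us < in->next_timer_us ?
		in->pattern_us : in->next_timer_us;
}
```

With the tick stopped, a short pattern-based prediction no longer
pulls the selection towards a shallow state; with the tick running,
the shorter estimate still wins and the tick remains available as the
backup wakeup.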