Re: [RFC PATCH v2 0/8] Introduct cpu idle prediction functionality

From: Rafael J. Wysocki
Date: Mon Oct 16 2017 - 20:17:00 EST


On Monday, October 16, 2017 9:44:41 AM CEST Li, Aubrey wrote:
> On 2017/10/14 9:14, Rafael J. Wysocki wrote:
> > On Saturday, September 30, 2017 9:20:26 AM CEST Aubrey Li wrote:
> >> We found that under some latency-intensive workloads, short idle periods
> >> occur very frequently, and the idle entry and exit paths then start to
> >> dominate, so it's important to optimize them. To detect the short idle
> >> pattern, we need to figure out how long the coming idle period will be
> >> and what the threshold for a short idle interval is.
> >>
> >> A cpu idle prediction functionality is introduced in this proposal to catch
> >> the short idle pattern.
> >>
> >> Firstly, we check the IRQ timings subsystem to see whether an event is
> >> coming soon.
> >> -- https://lwn.net/Articles/691297/
> >>
> >> Secondly, we check the scheduler's idle statistics to see whether we are
> >> likely to go into a short idle.
> >> -- https://patchwork.kernel.org/patch/2839221/
> >>
> >> Thirdly, we predict the next idle interval by using the prediction
> >> functionality in the idle governor, if it has any.
> >>
> >> For the threshold of the short idle interval, we record the timestamps of
> >> the idle entry and multiply by a tunable parameter, exposed here:
> >> -- /proc/sys/kernel/fast_idle_ratio
> >>
> >> In this proposal, we use the output of the idle prediction to skip turning
> >> the tick off if a short idle is predicted. Reprogramming the hardware timer
> >> twice (off and on) is expensive for a very short idle. There are other
> >> potential optimizations that could be driven by the same indicator.
> >>
> >> I observed that when the system is idle, the idle predictor reports 20/s
> >> long idle and ZERO fast idle on one CPU. When the workload is running, the
> >> idle predictor reports 72899/s fast idle and ZERO long idle on the same CPU.
> >>
> >> Aubrey Li (8):
> >>   cpuidle: menu: extract prediction functionality
> >>   cpuidle: record the overhead of idle entry
> >>   cpuidle: add a new predict interface
> >>   tick/nohz: keep tick on for a fast idle
> >>   timers: keep sleep length updated as needed
> >>   cpuidle: make fast idle threshold tunable
> >>   cpuidle: introduce irq timing to make idle prediction
> >>   cpuidle: introduce run queue average idle to make idle prediction
> >>
> >>  drivers/cpuidle/Kconfig          |   1 +
> >>  drivers/cpuidle/cpuidle.c        | 109 +++++++++++++++++++++++++++++++++++++++
> >>  drivers/cpuidle/governors/menu.c |  69 ++++++++++++++++---------
> >>  include/linux/cpuidle.h          |  21 ++++++++
> >>  kernel/sched/idle.c              |  14 ++++-
> >>  kernel/sysctl.c                  |  12 +++++
> >>  kernel/time/tick-sched.c         |   7 +++
> >>  7 files changed, 209 insertions(+), 24 deletions(-)
> >>
> >
> > Overall, it looks like you could avoid stopping the tick every time the
> > predicted idle duration is not longer than the tick interval in the first
> > place.
> >
> > Why don't you do that?
>
> I didn't catch this.
>
> Are you suggesting this?
>
> 	if (!cpu_stat.fast_idle)
> 		tick_nohz_idle_enter();
>
> Or are you asking why the threshold can't simply be the tick interval?

That I guess.

> For the first, can_stop_idle_tick() is a better place to skip tick-off IMHO.
> For the latter, if the threshold is close or equal to the tick, it's quite
> possible that the next event is the tick itself and no other event.

Well, I don't quite get that.

What's the reasoning here?

Thanks,
Rafael
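
For readers following the thread, here is a rough userspace sketch of the two
policies being compared. It is not code from the patch set: every name in it
(struct idle_inputs, predict_fast_idle(), keep_tick_simple(), the field names)
is hypothetical, and it assumes the fast idle threshold is the measured idle
entry overhead multiplied by fast_idle_ratio, which is only one reading of the
cover letter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; none of these names come from the patch set. */
struct idle_inputs {
	uint64_t next_irq_us;         /* time to next event per IRQ timings */
	uint64_t rq_avg_idle_us;      /* scheduler's run queue average idle */
	bool has_governor_prediction; /* does the governor offer a prediction? */
	uint64_t governor_pred_us;    /* governor's predicted idle interval */
	uint64_t entry_overhead_us;   /* measured idle entry overhead */
	unsigned int fast_idle_ratio; /* stands in for the sysctl tunable */
};

/* Assumed threshold: measured overhead scaled by the tunable ratio. */
static uint64_t fast_idle_threshold_us(const struct idle_inputs *in)
{
	return in->entry_overhead_us * in->fast_idle_ratio;
}

/*
 * Policy described in the cover letter: consult the three sources in
 * order and report a fast (short) idle as soon as any of them predicts
 * an interval below the threshold.
 */
static bool predict_fast_idle(const struct idle_inputs *in)
{
	uint64_t threshold = fast_idle_threshold_us(in);

	if (in->next_irq_us < threshold)
		return true;
	if (in->rq_avg_idle_us < threshold)
		return true;
	if (in->has_governor_prediction && in->governor_pred_us < threshold)
		return true;
	return false;
}

/*
 * Simpler policy suggested in the reply: keep the tick whenever the
 * predicted idle duration is not longer than the tick interval.
 */
static bool keep_tick_simple(uint64_t predicted_idle_us, uint64_t tick_us)
{
	return predicted_idle_us <= tick_us;
}
```

With an overhead of 5 us and a ratio of 10, the first policy treats anything
predicted below 50 us as a fast idle, while the second keeps the tick for any
prediction up to one tick period regardless of the measured overhead.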