Re: [RFC/RFT][PATCH v8] cpuidle: New timer events oriented governor for tickless systems

From: Rafael J. Wysocki
Date: Thu Oct 10 2019 - 04:43:16 EST


On Thu, Oct 10, 2019 at 9:05 AM Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>
> On 2019.10.09 06:37 Rafael J. Wysocki wrote:
> > On Wednesday, October 9, 2019 1:19:51 AM CEST Rafael J. Wysocki wrote:
> >> On Tuesday, October 8, 2019 12:49:01 PM CEST Rafael J. Wysocki wrote:
> >>> On Tue, Oct 8, 2019 at 11:51 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> >>>> On Tue, Oct 8, 2019 at 8:20 AM Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> >>>>> O.K. Thanks for your quick reply, and insight.
> >>>>>
> >>>>> I think long durations always need to be counted, but currently if
> >>>>> the deepest idle state is disabled, they are not.
> ...
> >>>> AFAICS, adding early_hits to count is not a mistake if there are still
> >>>> enabled states deeper than the current one.
> >>>
> >>> And the mistake appears to be that the "hits" and "misses" metrics
> >>> aren't handled in analogy with the "early_hits" one when the current
> >>> state is disabled.
>
> I only know how to exploit and test the "hits" and "misses" path
> that should use the deepest available idle state upon transition
> to an idle system. Even so, the test has a low probability of
> failing, and so needs to be run many times.
>
> I do not know how to demonstrate and/or test any "early_hits" path
> to confirm that an issue exists or that it is fixed.
>
> >>>
> >>> Let me try to cut a patch to address that.
> >>
> >> Appended below, not tested.
>
> Reference as: rjw1
>
> >>
> >> It is meant to address two problems, one of which is that the "hits" and
> >> "misses" metrics of disabled states need to be taken into account too in
> >> some cases, and the other is an issue with the handling of "early hits"
> >> which may lead to suboptimal state selection if some states are disabled.
> >
> > Well, it still misses a couple of points.
> >
> > First, disabled states that are too deep should not be taken into consideration
> > at all.
> >
> > Second, the "hits" and "misses" metrics of disabled states need to be used for
> > idle duration ranges corresponding to them regardless of whether or not the
> > "hits" value is greater than the "misses" one.
> >
> > Updated patch is below (still not tested), but it tries to do too much in one
> > go, so I need to split it into a series of smaller changes.
>
> Thanks for your continued look at this.
>
> Reference as: rjw2
>
> Test 1, hack job statistical test (old tests re-stated):
>
> Kernel          tests   fail rate
> 5.4-rc1          6616     13.45%
> 5.3              2376      4.50%
> 5.3-teov7       12136      0.00%  <<< teo.c reverted and teov7 put in its place.
> 5.4-rc1-ds      11168      0.00%  <<< [old] ds proposed patch (> 7 hours test time)
> 5.4-rc1-ds12     4224      0.00%  <<< [old] new ds proposed patch
> 5.4-rc2-rjw1    11280      0.00%
> 5.4-rc2-rjw2      640      0.00%  <<< Will be run again, for longer.
>
> Test 2: I also looked at every possible enable/disable idle combination,
> and they all seemed O.K.
>
> No other tests have been run yet.
>
> System:
> Processor: i7-2600K
> Deepest idle state: 4 (C6)

Thanks a lot for sharing the results!

Cheers,
Rafael