Re: [RFC PATCH v3 2/2] cpuidle: teo: Introduce util-awareness

From: Doug Smythies
Date: Tue Nov 01 2022 - 02:24:22 EST


Hi Kajetan,

On Mon, Oct 31, 2022 at 5:14 AM Kajetan Puchalski
<kajetan.puchalski@xxxxxxx> wrote:

... [delete some]...

>  /**
>   * teo_update - Update CPU metrics after wakeup.
>   * @drv: cpuidle driver containing state data.
> @@ -303,7 +359,9 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  	int i;
>
>  	if (dev->last_state_idx >= 0) {
> -		teo_update(drv, dev);
> +		/* don't update metrics if the cpu was utilized during the last sleep */
> +		if (!cpu_data->utilized)
> +			teo_update(drv, dev);
>  		dev->last_state_idx = -1;
>  	}

Ignoring the metrics is not the correct thing to do.
Depending on the workflow, it can severely bias idle state selection
toward deeper states than appropriate, because most of the information
needed to select the appropriate shallow state is tossed out.

Example 1:
2 pairs of ping pongs = 4 threads
Parameters chosen such that idle state 2 would be one of the most used states.
CPU frequency governor: Schedutil.
CPU frequency scaling driver: intel_cpufreq.
HWP: Disabled
Processor: i5-10600K (6 cores 12 cpus).
Kernel: 6.1-rc3
Run length: 1e8 cycles
Idle governor:
teo:     11.73 uSecs/loop ; idle state 1 ~3.5e6 exits/sec
menu:    12.1  uSecs/loop ; idle state 1 ~3.3e6 exits/sec
util-v3: 15.2  uSecs/loop ; idle state 1 ~200 exits/sec
util-v4: 11.63 uSecs/loop ; idle state 1 ~3.5e6 exits/sec

Here, util-v4 is this patch (util-v3) with the above change reverted.

Note: less time per loop is better.

Example 2: Same, but with parameters selected such that idle state 0
would be one of the most used idle states.
Run Length: 4e8 cycles
Idle governor:
teo:     3.1 uSecs/loop ; idle state 0 ~1.2e6 exits/sec
menu:    3.1 uSecs/loop ; idle state 0 ~1.3e6 exits/sec
util-v3: 5.1 uSecs/loop ; idle state 0 ~4 exits/sec
util-v4: ?   uSecs/loop ; idle state 0 ~1.2e6 exits/sec (partial result)

Note: the util-v4 test is still in progress, and it is late in my time
zone. However, I can tell from the idle state usage, which I can observe
once per minute, that the issue is, at least mostly, fixed.

... Doug