Re: cpufreq: intel_pstate: map utilization into the pstate range

From: Julia Lawall
Date: Thu Dec 30 2021 - 12:54:05 EST


> > The effect is the same. But that approach is indeed simpler than patching
> > the kernel.
>
> It is also applicable when intel_pstate runs in the active mode.
>
> As for the results that you have reported, it looks like the package
> power on these systems is dominated by package voltage and going from
> P-state 20 to P-state 21 causes that voltage to increase significantly
> (the observed RAM energy usage pattern is consistent with that). This
> means that running at P-states above 20 is only really justified if
> there is a strict performance requirement that can't be met otherwise.
>
> Can you please check what value is there in the base_frequency sysfs
> attribute under cpuX/cpufreq/?

2100000, which should be pstate 21
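
For reference, a minimal sketch of that mapping. It assumes the sysfs
value is in kHz and that one P-state step is 100 MHz on this machine; the
helper names and the 100 MHz constant are illustrative, not taken from the
driver.

```python
from pathlib import Path

# Assumption: each P-state step corresponds to 100 MHz (100000 kHz).
BUS_CLOCK_KHZ = 100_000


def khz_to_pstate(freq_khz: int, bus_clock_khz: int = BUS_CLOCK_KHZ) -> int:
    """Convert a cpufreq frequency in kHz to a P-state number."""
    return freq_khz // bus_clock_khz


def read_base_pstate(cpu: int = 0,
                     sysfs_root: str = "/sys/devices/system/cpu") -> int:
    """Read cpuX/cpufreq/base_frequency and convert it to a P-state."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq" / "base_frequency"
    return khz_to_pstate(int(path.read_text().strip()))


# The value reported above, without touching sysfs:
print(khz_to_pstate(2_100_000))  # 21
```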

>
> I'm guessing that the package voltage level for P-states 10 and 20 is
> the same, so the power difference between them is not significant
> relative to the difference between P-state 20 and 21 and if increasing
> the P-state causes some extra idle time to appear in the workload
> (even though there is not enough of it to prevent the overall
> utilization from increasing), then the overall power draw when running
> at P-state 10 may be greater than for P-state 20.

My impression is that the package voltage level for P-states 10 to 20 is
high enough that increasing the frequency has little impact on power. But
the code runs twice as fast, which reduces the execution time a lot, saving
energy.

My first experiment had only one running thread. I also tried running 32
spinning threads for 10 seconds, i.e., using up one package and leaving the
other idle. In this case, instead of staying around 600J for pstates
10-20, the energy consumption rises from 743J to 946J. But there is still
a gap between 20 and 21, with 21 being 1392J.
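
To put the gap in numbers, here is a rough sketch using only the 32-thread
figures above. It assumes that 743J is the value at pstate 10 and 946J the
value at pstate 20, and that the energy was measured over the 10 second
run; the function names are illustrative.

```python
# Package energy (J) per P-state for the 32-thread run, as reported above.
# Assumption: 743 J is the P-state 10 value and 946 J the P-state 20 value.
ENERGY_J = {10: 743.0, 20: 946.0, 21: 1392.0}

# Assumption: the energy was measured over the 10 s of spinning.
DURATION_S = 10.0


def avg_power_w(energy_j: float, duration_s: float = DURATION_S) -> float:
    """Average power in watts over the run."""
    return energy_j / duration_s


def joules_per_step(lo: int, hi: int, energy=ENERGY_J) -> float:
    """Average energy increase per P-state step between lo and hi."""
    return (energy[hi] - energy[lo]) / (hi - lo)


for pstate in sorted(ENERGY_J):
    print(f"pstate {pstate}: {avg_power_w(ENERGY_J[pstate]):.1f} W average")

print(f"10 -> 20: {joules_per_step(10, 20):.1f} J per step")
print(f"20 -> 21: {joules_per_step(20, 21):.1f} J per step")
```

Under those assumptions, the single step from 20 to 21 costs 446J, while
the ten steps from 10 to 20 cost about 20J each.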

> You can check if there is any C-state residency difference between
> these two cases by running the workload under turbostat in each of
> them.

The C1 and C6 residencies (CPU%c1 and CPU%c6) are about the same between
20 and 21, whether with 1 thread or with 32 threads.

> Anyway, this is a configuration in which the HWP scaling algorithm
> used when intel_pstate runs in the active mode is likely to work
> better, because it should take the processor design into account.
> That's why it is the default configuration of intel_pstate on systems
> with HWP. There are cases in which schedutil helps, but that's mostly
> when HWP without it tends to run the workload too fast, because it
> lacks the utilization history provided by PELT.

OK, I'll look into that case a bit more.

thanks,
julia