Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime

From: Dietmar Eggemann
Date: Wed Feb 22 2023 - 15:14:07 EST


On 20/02/2023 14:54, Vincent Guittot wrote:
> On Fri, 17 Feb 2023 at 14:54, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>
>> On 09/02/2023 17:16, Vincent Guittot wrote:
>>> On Tue, 7 Feb 2023 at 11:29, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>>>
>>>> On 09/11/2022 16:49, Peter Zijlstra wrote:
>>>>> On Tue, Nov 08, 2022 at 07:48:43PM +0000, Qais Yousef wrote:
>>>>>> On 11/07/22 14:41, Peter Zijlstra wrote:
>>>>>>> On Thu, Sep 29, 2022 at 03:41:47PM +0100, Kajetan Puchalski wrote:

[...]

>>> Graphics Pipeline short task, hasn't uclamp_min been designed for and
>>> a better solution ?
>>
>> Yes, it has. I'm not sure how feasible this is to do for all tasks
>> involved. I'm thinking about the Binder threads here for instance.
>
> Yes, that can probably not help for all threads but some system
> threads like surfaceflinger and graphic composer should probably
> benefit from min uclamp

Yes, and it looks like that the Android version I'm using
SQ1D.220205.004 (Feb '22) (automatic system updates turned off) is
already using uclamp_min != 0 for tasks like UI thread. It's not one
particular value but different values from [0 .. 512] over the runtime
of a Jankbench iteration. I have to have a closer look.

[...]

>> Max_frame_duration:
>> +------------------------------------------+------------+
>> | kernel | value |
>> +------------------------------------------+------------+
>> | base-a30b17f016b0 | 147.571352 |
>> | pelt-hl-m2 | 119.416351 |
>> | pelt-hl-m4 | 96.473412 |
>> | scaled_util_est_faster_freq | 126.646506 |
>> | max_util_scaled_util_est_faster_rbl_freq | 157.974501 | <-- !!!
>> +------------------------------------------+------------+
>>
>> Mean_frame_duration:
>> +------------------------------------------+-------+-----------+
>> | kernel | value | perc_diff |
>> +------------------------------------------+-------+-----------+
>> | base-a30b17f016b0 | 14.7 | 0.0% |
>> | pelt-hl-m2 | 13.6 | -7.5% |
>> | pelt-hl-m4 | 13.0 | -11.68% |
>> | scaled_util_est_faster_freq | 13.7 | -6.81% |
>> | max_util_scaled_util_est_faster_rbl_freq | 12.1 | -17.85% |
>> +------------------------------------------+-------+-----------+
>>
>> Jank percentage (Jank deadline 16ms):
>> +------------------------------------------+-------+-----------+
>> | kernel | value | perc_diff |
>> +------------------------------------------+-------+-----------+
>> | base-a30b17f016b0 | 1.8 | 0.0% |
>> | pelt-hl-m2 | 1.8 | -4.91% |
>> | pelt-hl-m4 | 1.2 | -36.61% |
>> | scaled_util_est_faster_freq | 1.3 | -27.63% |
>> | max_util_scaled_util_est_faster_rbl_freq | 0.8 | -54.86% |
>> +------------------------------------------+-------+-----------+
>>
>> Power usage [mW] (total - all CPUs):
>> +------------------------------------------+-------+-----------+
>> | kernel | value | perc_diff |
>> +------------------------------------------+-------+-----------+
>> | base-a30b17f016b0 | 144.4 | 0.0% |
>> | pelt-hl-m2 | 141.6 | -1.97% |
>> | pelt-hl-m4 | 163.2 | 12.99% |
>> | scaled_util_est_faster_freq | 132.3 | -8.41% |
>> | max_util_scaled_util_est_faster_rbl_freq | 133.4 | -7.67% |
>> +------------------------------------------+-------+-----------+
>>
>> There is a regression in `Max_frame_duration` but `Mean_frame_duration`,
>> `Jank percentage` and `Power usage` are better.
>
> The max frame duration is interesting. Could it be the very 1st frame
> of the test ?
> It's interesting that it's even worse than baseline whereas it should
> take the max of baseline and runnable_avg

Since you asked in the following email: I just used the boosting for CPU
frequency selection (from sugov_get_util()). I added the the `_freq`
suffix in the kernel name to indicate this.

I don't have any helpful `ftrace` or `perfetto` data for these test runs
though.

That's why I ran another iteration with perfetto on
`max_util_scaled_util_est_faster_rbl_freq`.

`Max frame duration` = 121ms (< 158ms but this was over 10 iterations)
happened at the beginning of the 3/8 `List View Fling` episode.

The UI thread (com.android.benchmark) runs on CPU1. Just before the
start of this episode the CPU freq is 0.3Ghz. It takes 43ms for the CPU
freq to go up to 1.1Ghz.

oriole:/sys # cat devices/system/cpu/cpu1/cpu_capacity

124

oriole:/sys # cat devices/system/cpu/cpu1/cpufreq
/scaling_available_frequencies

300000 574000 738000 930000 1098000 1197000 1328000 1401000 1598000
1704000 1803000

So the combination of little CPU and low CPU frequency is the reason
why. But I can't see how using `max(max(util_avg, util_est.enq),
rbl_avg) can make `max frame duration` worse?
Don't understand how asking for higher CPU frequencies in contention
favors the UI thread being scheduled on little CPUs at the beginning of
an episode?

Also the particular uclamp_min settings of the runnable tasks at this
moment can have an influence on this `max frame duration` value.

[...]