Re: [PATCH v4] sched/fair: unlink misfit task from cpu overutilized

From: Kajetan Puchalski
Date: Tue Jan 24 2023 - 20:46:24 EST


Hi,

> By taking into account uclamp_min, the 1:1 relation between task misfit
> and cpu overutilized is no more true as a task with a small util_avg may
> not fit a high capacity cpu because of uclamp_min constraint.
>
> Add a new state in util_fits_cpu() to reflect the case that task would fit
> a CPU except for the uclamp_min hint which is a performance requirement.
>
> Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we
> can use this new value to take additional action to select the best CPU
> that doesn't match uclamp_min hint.
>

Just wanted to include some more test data here to flag potential issues
with how all these changes use thermal pressure in the scheduler.

For the tables below, 'baseline' is pre the already merged "uclamp fits
capacity" patchset.
'baseline_ufc' is the current mainline with said patchset applied.
The 'no_thermal' variants are variants with thermal handling taken out
of util_fits_cpu akin to the v1 variant of the patchset.
The 'patched' variants are the above but with the v4 of the 'unlink misfit
task' patch applied as well.

Geekbench 5:

+-----------------+-------------------------+--------+-----------+
| metric | kernel | value | perc_diff |
+-----------------+-------------------------+--------+-----------+
| multicore_score | baseline | 2765.4 | 0.0% |
| multicore_score | baseline_ufc | 2704.3 | -2.21% | <-- mainline score regression
| multicore_score | baseline_ufc_no_thermal | 2870.8 | 3.81% | <-- ~170 pts better without thermal
| multicore_score | ufc_patched_v4 | 2839.8 | 2.69% | <-- no more regression but worse than above
| multicore_score | ufc_patched_no_thermal | 2891.0 | 4.54% | <-- best score
+-----------------+-------------------------+--------+-----------+

+--------------+--------+-------------------------+--------+-----------+
| chan_name | metric | kernel | value | perc_diff |
+--------------+--------+-------------------------+--------+-----------+
| total_power | gmean | baseline | 2664.0 | 0.0% |
| total_power | gmean | baseline_ufc | 2621.5 | -1.6% |
| total_power | gmean | baseline_ufc_no_thermal | 2710.0 | 1.73% |
| total_power | gmean | ufc_patched_v4 | 2729.0 | 2.44% |
| total_power | gmean | ufc_patched_no_thermal | 2778.1 | 4.28% |
+--------------+--------+-------------------------+--------+-----------+

(Jankbench scores show more or less no change, Jankbench power below)

+--------------+--------+------------------------------+-------+-----------+
| chan_name | metric | kernel | value | perc_diff |
+--------------+--------+------------------------------+-------+-----------+
| total_power | gmean | baseline_60hz | 135.9 | 0.0% |
| total_power | gmean | baseline_ufc_60hz | 155.7 | 14.61% | <-- mainline power usage regression
| total_power | gmean | baseline_ufc_no_thermal_60hz | 134.5 | -1.01% | <-- no more regression without the thermal bit
| total_power | gmean | ufc_patched_v4_60hz | 131.4 | -3.26% | <-- no more regression with the patch either
| total_power | gmean | ufc_patched_no_thermal_60hz | 140.4 | 3.37% | <-- both combined increase power usage
+--------------+--------+------------------------------+-------+-----------+

Specifically the swing of +15% power to -1% power by taking out thermal
handling from util_fits_cpu on the original patchset is noteworthy. It
shows that there's some subtle thermal-related interaction there taking
place that can have adverse effects on power usage.

Even more interestingly, the 'unlink misfit task' patch appears to be
preventing said interaction from happening down the line as it has a
similar effect to that of just taking out the thermal bits.

My only concern here is that without taking a closer look at the thermal
parts this power issue shown above could easily accidentally be
reintroduced at some point in the future.

> --
> 2.34.1
>