Re: [PATCH v4] sched/fair: unlink misfit task from cpu overutilized

From: Vincent Guittot
Date: Wed Jan 25 2023 - 03:37:18 EST


On Wed, 25 Jan 2023 at 02:46, Kajetan Puchalski
<kajetan.puchalski@xxxxxxx> wrote:
>
> Hi,
>
> > By taking into account uclamp_min, the 1:1 relation between task misfit
> > and cpu overutilized is no more true as a task with a small util_avg may
> > not fit a high capacity cpu because of uclamp_min constraint.
> >
> > Add a new state in util_fits_cpu() to reflect the case that task would fit
> > a CPU except for the uclamp_min hint which is a performance requirement.
> >
> > Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we
> > can use this new value to take additional action to select the best CPU
> > that doesn't match uclamp_min hint.
> >
>
> Just wanted to include some more test data here to flag potential issues
> with how all these changes use thermal pressure in the scheduler.
>
> For the tables below, 'baseline' is pre the already merged "uclamp fits
> capacity" patchset.
> 'baseline_ufc' is the current mainline with said patchset applied.
> The 'no_thermal' variants are variants with thermal handling taken out
> of util_fits_cpu akin to the v1 variant of the patchset.
> The 'patched' variants are the above but with the v4 of the 'unlink misfit
> task' patch applied as well.
>
> Geekbench 5:
>
> +-----------------+-------------------------+--------+-----------+
> | metric | kernel | value | perc_diff |
> +-----------------+-------------------------+--------+-----------+
> | multicore_score | baseline | 2765.4 | 0.0% |
> | multicore_score | baseline_ufc | 2704.3 | -2.21% | <-- mainline score regression
> | multicore_score | baseline_ufc_no_thermal | 2870.8 | 3.81% | <-- ~170 pts better without thermal
> | multicore_score | ufc_patched_v4 | 2839.8 | 2.69% | <-- no more regression but worse than above
> | multicore_score | ufc_patched_no_thermal | 2891.0 | 4.54% | <-- best score
> +-----------------+-------------------------+--------+-----------+
>
> +--------------+--------+-------------------------+--------+-----------+
> | chan_name | metric | kernel | value | perc_diff |
> +--------------+--------+-------------------------+--------+-----------+
> | total_power | gmean | baseline | 2664.0 | 0.0% |
> | total_power | gmean | baseline_ufc | 2621.5 | -1.6% |
> | total_power | gmean | baseline_ufc_no_thermal | 2710.0 | 1.73% |
> | total_power | gmean | ufc_patched_v4 | 2729.0 | 2.44% |
> | total_power | gmean | ufc_patched_no_thermal | 2778.1 | 4.28% |
> +--------------+--------+-------------------------+--------+-----------+
>
> (Jankbench scores show more or less no change, Jankbench power below)
>
> +--------------+--------+------------------------------+-------+-----------+
> | chan_name | metric | kernel | value | perc_diff |
> +--------------+--------+------------------------------+-------+-----------+
> | total_power | gmean | baseline_60hz | 135.9 | 0.0% |
> | total_power | gmean | baseline_ufc_60hz | 155.7 | 14.61% | <-- mainline power usage regression
> | total_power | gmean | baseline_ufc_no_thermal_60hz | 134.5 | -1.01% | <-- no more regression without the thermal bit
> | total_power | gmean | ufc_patched_v4_60hz | 131.4 | -3.26% | <-- no more regression with the patch either
> | total_power | gmean | ufc_patched_no_thermal_60hz | 140.4 | 3.37% | <-- both combined increase power usage
> +--------------+--------+------------------------------+-------+-----------+
>
> Specifically the swing of +15% power to -1% power by taking out thermal
> handling from util_fits_cpu on the original patchset is noteworthy. It
> shows that there's some subtle thermal-related interaction there taking
> place that can have adverse effects on power usage.
>
> Even more interestingly, the 'unlink misfit task' patch appears to be
> preventing said interaction from happening down the line as it has a
> similar effect to that of just taking out the thermal bits.
>
> My only concern here is that without taking a closer look at the thermal
> parts this power issue shown above could easily accidentally be
> reintroduced at some point in the future.

Yes, the handling of thermal pressure needs some closer look. It has
been agreed to keep the current behavior for now and have a closer
look on thermal pressure as a next step. And your results for
ufc_patched_v4_* provide some reasonably good perf and power figures.

Thanks

>
> > --
> > 2.34.1
> >