Re: [Patch v4 6/6] sched: thermal: Enable tuning of decay period

From: Thara Gopinath
Date: Tue Nov 05 2019 - 15:26:11 EST


On 11/04/2019 11:12 AM, Ionela Voinescu wrote:
> Hi Thara,
>
> On Tuesday 22 Oct 2019 at 16:34:25 (-0400), Thara Gopinath wrote:
>> Thermal pressure follows pelt signals which means the
>> decay period for thermal pressure is the default pelt
>> decay period. Depending on SoC characteristics and thermal
>> activity, it might be beneficial to decay thermal pressure
>> slower, but still in tune with the pelt signals.
>
> I wonder if it can be beneficial to decay thermal pressure faster as
> well.
>
> This implementation makes 32 (LOAD_AVG_PERIOD) the minimum half-life
> of the thermal pressure samples. This results in more than 100ms for a
> sample to decay significantly and therefore let's say it can take more
> than 100ms for capacity to return to (close to) max when the CPU is no
> longer capped. This value seems high to me considering that a minimum
> value should result in close to 'instantaneous' behaviour, when there
> are thermal capping mechanisms that can react in ~20ms (hikey960 has a
> polling delay of 25ms, if I'm remembering correctly).
>
> I agree 32ms seems like a good default but given that you've made this
> configurable so as to give users options, I'm wondering if it would be
> better to cover a wider range.
>
>> One way to achieve this is to provide a command line parameter
>> to set the decay coefficient to an integer between 0 and 10.
>>
>> Signed-off-by: Thara Gopinath <thara.gopinath@xxxxxxxxxx>
>> ---
>> v3->v4:
>> - Removed the sysctl setting to tune decay period and instead
>> introduced a command line parameter to control it. The rationale
>> here being changing decay period of a PELT signal at runtime can
>> result in a skewed average value for at least some cycles.
>>
>> Documentation/admin-guide/kernel-parameters.txt | 5 +++++
>> kernel/sched/thermal.c | 25 ++++++++++++++++++++++++-
>> 2 files changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index a84a83f..61d7baa 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -4273,6 +4273,11 @@
>> incurs a small amount of overhead in the scheduler
>> but is useful for debugging and performance tuning.
>>
>> + sched_thermal_decay_coeff=
>> + [KNL, SMP] Set decay coefficient for thermal pressure signal.
>> + Format: integer between 0 and 10
>> + Default is 0.
>> +
>> skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate
>> xtime_lock contention on larger systems, and/or RCU lock
>> contention on all systems with CONFIG_MAXSMP set.
>> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
>> index 0c84960..0da31e1 100644
>> --- a/kernel/sched/thermal.c
>> +++ b/kernel/sched/thermal.c
>> @@ -10,6 +10,28 @@
>> #include "pelt.h"
>> #include "thermal.h"
>>
>> +/**
>> + * By default the decay is the default pelt decay period.
>> + * The decay coefficient can change the decay period in
>> + * multiples of 32.
>
> This description has to be corrected as well, as per Peter's comment.
>
> Also, it might be good not to use the value 32 directly but to mention
> that the decay period is a shift of LOAD_AVG_PERIOD. If that changes,
> the translation from decay shift to decay period below will change as
> well.

Hi Ionela,

I sent out the v5 without fixing this. Even if there are no other
comments on v5 I will send out a v6 fixing this.

Regarding a faster decay, we need a strong case for it.
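
For reference, a rough sketch of the numbers being discussed. It is not
the patch code: the helper names are hypothetical, it assumes the decay
coefficient is applied as a left shift of LOAD_AVG_PERIOD, it treats one
PELT period as roughly 1ms, and it only counts whole half-lives instead
of modelling the real geometric series.

```c
/* PELT half-life, in periods of roughly 1 ms each. */
#define LOAD_AVG_PERIOD 32

/*
 * Hypothetical helper (not from the patch): half-life of the thermal
 * pressure signal, assuming the coefficient is used as a shift.
 */
unsigned int thermal_half_life_ms(unsigned int shift)
{
	return LOAD_AVG_PERIOD << shift;
}

/*
 * Hypothetical helper: time in ms until a sample that starts at
 * 'start' has been halved down to 'target' or less. Only whole
 * half-lives are counted, so this is a coarse estimate.
 */
unsigned int ms_until_at_most(unsigned int start, unsigned int target,
			      unsigned int shift)
{
	unsigned int ms = 0;

	while (start > target) {
		start >>= 1;	/* one half-life elapsed */
		ms += thermal_half_life_ms(shift);
	}
	return ms;
}
```

With shift 0 and an illustrative sample of 1024, ms_until_at_most(1024,
128, 0) is 96 and ms_until_at_most(1024, 64, 0) is 128, so a sample needs
three to four half-lives to drop to an eighth or a sixteenth. That is in
line with the "more than 100ms" estimate above. Each increment of the
shift doubles these times.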



--
Warm Regards
Thara