Re: [PATCH V3 3/3] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping

From: Thara Gopinath
Date: Fri Apr 26 2019 - 06:24:24 EST


On 04/24/2019 11:56 AM, Ionela Voinescu wrote:
> Hi guys,
>
> On 23/04/2019 23:38, Thara Gopinath wrote:
>> On 04/18/2019 05:48 AM, Quentin Perret wrote:
>>> On Tuesday 16 Apr 2019 at 15:38:41 (-0400), Thara Gopinath wrote:
>>>> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
>>>> @@ -177,6 +178,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>>>>
>>>> if (policy->max > clipped_freq)
>>>> cpufreq_verify_within_limits(policy, 0, clipped_freq);
>>>> +
>>>> + sched_update_thermal_pressure(policy->cpus,
>>>> + policy->max, policy->cpuinfo.max_freq);
>>>
>>> Is this something we could do this CPUFreq ? Directly in
>>> cpufreq_verify_within_limits() perhaps ?
>>>
>>> That would re-define the 'thermal pressure' framework in a more abstract
>>> way and make the scheduler look at 'frequency capping' events,
>>> regardless of the reason for capping.
>>>
>>> That would reflect user-defined frequency constraint into cpu_capacity,
>>> in addition to the thermal stuff. I'm not sure if there is another use
>>> case for frequency capping ?
>> Hi Quentin,
>> Thanks for the review. Sorry for the delay in response as I was on
>> vacation for the past few days.
>> I think there is one major difference between user-defined frequency
>> constraints and frequency constraints due to thermal events in terms of
>> the time period the system spends in the the constraint state.
>> Typically, a user constraint lasts for seconds if not minutes and I
>> think in this case cpu_capacity_orig should reflect this constraint and
>> not cpu_capacity like this patch set. Also, in case of the user
>> constraint, there is possibly no need to accumulate and average the
>> capacity constraints and instantaneous values can be directly applied to
>> cpu_capacity_orig. On the other hand thermal pressure is more spiky and
>> sometimes in the order of ms and us requiring the accumulating and
>> averaging.
>
> I think we can't make any assumptions in regards to the intentions of
> the user when restricting the OPP range though the cpufreq interface,
> but it would still be nice to do something and reflecting it as thermal
> pressure would be a good start. It might not be due to thermal, but it
> is a capacity restriction that would have the same result. Also, if the
> user has the ability to tune the decay period he has the control over
> the behavior of the signal. Given that currently there isn't a smarter
> mechanism (modifying capacity orig, re-normalising the capacity range)
> for long-term capping, even treating it as short-term capping is a good
> start. But this is a bigger exercise and it needs thorough
> consideration, so it could be skipped, in my opinion, for now..
>
> Also, if we want to stick with the "definition", userspace would still
> be able to reflect thermal pressure though the thermal limits interface
> by setting the cooling device state, which will be reflected in this
> update as well. So userspace would have a mechanism to reflect thermal
> pressure.

Yes, target_state under cooling devices can be set and this will reflect
as thermal pressure.

>
> One addition.. I like that the thermal pressure framework is not tied to
> cpufreq. There are firmware solutions that do not bother informing
> cpufreq of limits being changed, and therefore all of this could be
> skipped. But any firmware driver could call sched_update_thermal_pressure
> on notifications for limits changing from firmware, which is an
> important feature.

For me, I am open to discussion on the best place to call
sched_update_thermal_pressure from. Seeing the discussion and different
opinions, I am wondering should there be a SoC or platform specific hook
provided for better abstraction.

Regards
Thara

>
>>>
>>> Perhaps the Intel boost stuff could be factored in there ? That is,
>>> at times when the boost freq is not reachable capacity_of() would appear
>>> smaller ... Unless this wants to be reflected instantaneously ?
>> Again, do you think intel boost is more applicable to be reflected in
>> cpu_capacity_orig and not cpu_capacity?
>>>
>>> Thoughts ?
>>> Quentin
>>>
>>
>
> The changes here would happen even faster than thermal capping, same as
> other restrictions imposed by firmware, so it would not seem right to me
> to reflect it in capacity_orig. Reflecting it as thermal pressure is
> another matter, which I'd say it should be up to the client. The big
> disadvantage I'd see for this is coping with decisions made while being
> capped, when you're not capped any longer, and the other way around. I
> believe these changes would happen too often and they will not happen in
> a ramp-up/ramp-down behavior that we expect from thermal mitigation.
> That's why I believe averaging/regulation of the signal works well in
> this case, and it might not for power related fast restrictions.
>
> But given these three cases above, it might be that the ideal solution
> is for this framework to be made more generic and for each client to be
> able to obtain and configure a pressure signal to be reflected
> separately in the capacity of each CPU.
>
> My two pennies' worth,
> Ionela.
>
>
>


--
Regards
Thara