Re: [RFC] ARM: dts: omap36xx: Enable thermal throttling

From: Adam Ford
Date: Fri Sep 13 2019 - 10:06:03 EST


On Fri, Sep 13, 2019 at 8:32 AM H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
>
> Hi Adam,
>
> > Am 13.09.2019 um 13:07 schrieb Adam Ford <aford173@xxxxxxxxx>:
>
> >>> + cpu_cooling_maps: cooling-maps {
> >>> + map0 {
> >>> + trip = <&cpu_alert0>;
> >>> + /* Only allow OPP50 and OPP100 */
> >>> + cooling-device = <&cpu 0 1>;
> >>
> >> omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
> >> understand their meaning (and how it relates to the opp list).
> >
> > I read through the documentation, but it wasn't completely clear to
> > me. AFAICT, the numbers after &cpu represent the min and max index in
> > the OPP table when the condition is hit.
>
> Ok. It seems to use "cooling state" for those and the first is minimum
> and the last is maximum. Using THERMAL_NO_LIMIT (-1UL) means to have
> no limits.
>
> Since here we use the &cpu node it is likely that the "cooling state"
> is the same as the OPP index currently in use.
>
> I have looked through the .dts which use cpu_crit and the picture is
> not unique...
>
> omap4 seems to only define it
> am57xx has two different grade dtsi files
> dra7 overwrites critical temperature value
> am57xx-beagle defines a gpio to control a fan

Checkout rk3288-veyron-mickey.dts

They have almost_warm, warm, almost_hot, hot, hotter, very_hot, and
critical for trips, and they have as many corresponding cooling maps
which appear to limit the CPU speeds, but their index references are
still confusing to me.

For that device,
Warm and no limit first, then 4: coolling-device = <&cpu0 THERMAL_NO_LIMIT 4>
...
very_hot uses a number then no limit: cooling-device = <&cpu0 8
THERMAL_NO_LIMIT>

This makes me wonder if the min and max are switched or the index
values go backwards.

>
> The fan is only a map0 for the board_alert0 starting to do something at 40ÂC
> and above but there is nothing for the "critical" point.
> But throttling is working on omap5...
>
> Could there be some automatic handling by the govenors for the "critical"
> trip points? So that we do not need to define any cooling-maps for "critical"?
>
> Even for the exynos54xx where the crical trip point is only defined.
>



> >>
> >>> + };
> >>> + };
> >>
> >> Seems to make sense when comparing to omap4-cpu-thermal.dtsi...
> >>
> >> Maybe we can add the trip points to omap3-cpu-thermal.dtsi
> >> because they seem to be the same for all omap3 variants and
> >> just have a SoC variant specific cooling map for omap36xx.dtsi.
> >
> > The OPP's for OMAP3530 show that OPP5 and OPP6 are capable of
> > operating at 105C. AM3517 is a little different also, so I didn't
> > want to make a generic omap3 throttling table. Since my goal was to
> > try to remove the need for the turbo option from the newly supported
> > 1GHz omap3630/3730, I was hoping to get this pushed first. From
> > there, we can tweak the 34xx.dtsi and 3517.dtsi for their respective
> > thermal information.
>
> My observation is that all omap3 have
> commercial range 0ÂC .. 90ÂC
> extended -40ÂC .. 105ÂC
>
> So there is no difference in defining these as trip points and
> my proposal would be to have these in omap3.dtsi.

I can do that. I'll copy-paste the omap4 stuff using 90 and 105 as
limits without any mapping. I'll then keep the mapping in omap3630

>
> Only disabling OPPs differs. Which would make only the mapping
> go to omap36xx.
>
> >
> >>
> >>> +};
> >>> +
> >>> /* OMAP3630 needs dss_96m_fck for VENC */
> >>> &venc {
> >>> clocks = <&dss_tv_fck>, <&dss_96m_fck>;
> >>> --
> >>> 2.17.1
> >>>
> >>
> >> The question is how we can test that. Heating up the omap36xx to 90ÂC
> >> or even 105ÂC isn't that easy like with omap5...
> >>
> >> Maybe we can modify the millicelsius values for testing purposes to
> >> something in reachable range, e.g. 60ÂC and 70ÂC and watch what happens?
> >
> > I have access to a thermal chamber at work, but the guy who knows how
> > to use it is out for the rest of the week. My plan was do as you
> > suggested and change the milicelsius values, but I wanted to get some
> > buy-in from TI people and/or Tony. This also means enabling the omap3
> > thermal stuff which clearly throws a message that it's inaccurate.
>
> Basically we need two different types of test:
> 1. is the DTS setup correct so that the bandgap sensor is read and
> triggers correct actions
> 2. is the bandgap sensor correct enough to that the data sheet limits
> are really met
>
> > I don't know how much it's inaccurate, so we may have to make the 90C
> > value lower to compensate.
>
> Hm. If the bandgap sensor is inaccurate, we should make sure it reports the
> maximum value, just to be on the safe side. So that the real temperature
> is guaranteed to be lower than what is reported.
>
> Then we can use the data sheet limits of 90ÂC and 105ÂC in the trip point
> table (which should not be tweaked for sensor inaccuracy).

I can see not compensating if it reads high, but if the temp reads
low, shouldn't compensate so we don't over temp the processor?

adam
>
> BR,
> Nikolaus
>