Re: [RFC] ARM: dts: omap36xx: Enable thermal throttling

From: Adam Ford
Date: Fri Sep 13 2019 - 09:29:13 EST


On Fri, Sep 13, 2019 at 6:07 AM Adam Ford <aford173@xxxxxxxxx> wrote:
>
> On Fri, Sep 13, 2019 at 1:56 AM H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
> >
> > Hi Adam,
> >
> > > Am 12.09.2019 um 20:30 schrieb Adam Ford <aford173@xxxxxxxxx>:
> > >
> > > The thermal sensor in the omap3 family isn't accurate, but it's
> > > better than nothing. The various OPPs enabled for the omap3630
> > > support up to OPP1G, however the datasheet for the DM3730 states
> > > that OPP130 and OPP1G are not available above TJ of 90C.
> >
> > We may have to add similar things for omap34xx as well. See
> > data sheet 3.3 Recommended Operating Conditions
> >
> > But when reading them they do not limit temperature but
> > number of operation hours of each OPP depending on temperature...
> > That is clearly beyond what a kernel can do (we would have to
> > have access to some NVRAM in the kernel counting hours).
> >
> > >
> > > This patch configures the thermal throttling to limit the
> > > operating points of the omap3630 to Only OPP50 and OPP100 if
> >
> > s/Only/only/
>
> I will fix when I do V2
> >
> > > the thermal sensor reads a value above 90C.
> > >
> > > Signed-off-by: Adam Ford <aford173@xxxxxxxxx>
> > >
> > > diff --git a/arch/arm/boot/dts/omap36xx.dtsi b/arch/arm/boot/dts/omap36xx.dtsi
> > > index 4bb4f534afe2..58b9d347019f 100644
> > > --- a/arch/arm/boot/dts/omap36xx.dtsi
> > > +++ b/arch/arm/boot/dts/omap36xx.dtsi
> > > @@ -25,6 +25,7 @@
> > >
> > >  			vbb-supply = <&abb_mpu_iva>;
> > >  			clock-latency = <300000>; /* From omap-cpufreq driver */
> > > +			#cooling-cells = <2>;
> > >  		};
> > >  	};
> > >
> > > @@ -195,6 +196,31 @@
> > > };
> > > };
> > >
> > > +&cpu_thermal {
> > > +	cpu_trips: trips {
> >
> > Yes, that is comparable to what I have seen in omap5 DT where I know
> > that thermal throttling works.
> >
> > > +		/* OPP130 and OPP1G are not available above TJ of 90C. */
> > > +		cpu_alert0: cpu_alert {
> > > +			temperature = <90000>; /* millicelsius */
> > > +			hysteresis = <2000>; /* millicelsius */
> > > +			type = "passive";
> > > +		};
> > > +
> > > +		cpu_crit: cpu_crit {
> > > +			temperature = <125000>; /* millicelsius */
> >
> > Shouldn't this be 105°C for all omap3 chips (industrial temperature range)?
>
> You are correct. I forgot to change this when I did my copy-paste.
> >
> > > +			hysteresis = <2000>; /* millicelsius */
> > > +			type = "critical";
> > > +		};
> > > +	};
> > > +
> > > +	cpu_cooling_maps: cooling-maps {
> > > +		map0 {
> > > +			trip = <&cpu_alert0>;
> > > +			/* Only allow OPP50 and OPP100 */
> > > +			cooling-device = <&cpu 0 1>;
> >
> > omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
> > understand their meaning (and how it relates to the opp list).
>
> I read through the documentation, but it wasn't completely clear to
> me. AFAICT, the numbers after &cpu represent the min and max index in
> the OPP table when the condition is hit.
> >
> > > +		};
> > > +	};
> >
> > Seems to make sense when comparing to omap4-cpu-thermal.dtsi...
> >
> > Maybe we can add the trip points to omap3-cpu-thermal.dtsi
> > because they seem to be the same for all omap3 variants and
> > just have a SoC variant specific cooling map for omap36xx.dtsi.
>
> The OPPs for OMAP3530 show that OPP5 and OPP6 are capable of
> operating at 105C. AM3517 is a little different also, so I didn't
> want to make a generic omap3 throttling table. Since my goal was to
> try to remove the need for the turbo option from the newly supported
> 1GHz omap3630/3730, I was hoping to get this pushed first. From
> there, we can tweak the 34xx.dtsi and 3517.dtsi for their respective
> thermal information.
>
> >
> > > +};
> > > +
> > > /* OMAP3630 needs dss_96m_fck for VENC */
> > > &venc {
> > >  	clocks = <&dss_tv_fck>, <&dss_96m_fck>;
> > > --
> > > 2.17.1
> > >
> >
> > The question is how we can test that. Heating up the omap36xx to 90°C
> > or even 105°C isn't as easy as with omap5...
> >
> > Maybe we can modify the millicelsius values for testing purposes to
> > something in reachable range, e.g. 60°C and 70°C and watch what happens?
>
> I have access to a thermal chamber at work, but the guy who knows how
> to use it is out for the rest of the week. My plan was to do as you
> suggested and change the millicelsius values, but I wanted to get some
> buy-in from TI people and/or Tony. This also means enabling the omap3
> thermal stuff which clearly throws a message that it's inaccurate. I
> don't know how much it's inaccurate, so we may have to make the 90C
> value lower to compensate.

I set the alert to 60C then booted the system. It initially read
58.6C, then I ran a benchmark to push the processor over 60C.
Unfortunately, it didn't appear to throttle like I expected. I was
expecting it to only make 300 and 600 MHz available.

cat /sys/devices/virtual/thermal/thermal_zone0/temp
58500

whetstone 200000
Loops: 200000, Iterations: 1, Duration: 31 sec.
C Converted Double Precision Whetstones: 645.2 MIPS

cat /sys/devices/virtual/thermal/thermal_zone0/temp
62000

cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
300000 600000 800000

I am going to investigate how other processors do this. I may have
the cpu reference wrong.
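
One caveat on the check above: as far as I understand it, cpufreq-based
cooling clamps the policy's maximum frequency rather than editing the
frequency table, so scaling_available_frequencies is not expected to
change when throttling kicks in. scaling_max_freq and the cooling
device's cur_state are the more telling places to look. A minimal
sketch of that check, assuming the usual sysfs thermal and cpufreq
layout; the function name and its two optional path arguments are made
up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print each cooling device's state and the current
# cpufreq ceiling. Both directories are parameters so the function can be
# pointed at a copy of sysfs; the defaults assume the usual layout.
show_throttle_state() {
    thermal_root="${1:-/sys/class/thermal}"
    policy_dir="${2:-/sys/devices/system/cpu/cpufreq/policy0}"

    for dev in "$thermal_root"/cooling_device*; do
        [ -d "$dev" ] || continue   # glob did not match: no cooling devices
        printf '%s: type=%s cur_state=%s max_state=%s\n' \
            "${dev##*/}" \
            "$(cat "$dev/type")" \
            "$(cat "$dev/cur_state")" \
            "$(cat "$dev/max_state")"
    done

    # A throttled policy shows scaling_max_freq below cpuinfo_max_freq.
    printf 'scaling_max_freq=%s cpuinfo_max_freq=%s\n' \
        "$(cat "$policy_dir/scaling_max_freq")" \
        "$(cat "$policy_dir/cpuinfo_max_freq")"
}
```

Running show_throttle_state with no arguments before and after the
benchmark should show whether cur_state ever leaves 0 (0 meaning no
throttling for a cpufreq cooling device). If no cooling_device entry
shows up at all, the cooling map may never have bound to the cpu node,
which would fit the suspicion about the cpu reference.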

adam
>
> adam
> >
> > BR,
> > Nikolaus
> >
> >
> >