Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

From: Rajendra Nayak
Date: Wed Sep 02 2020 - 01:41:39 EST



I'm not massively familiar with this area of the code, but I guess I
shouldn't let that stop me from having an opinion! :-P

* I would agree that it seems highly unlikely that someone would put
one of these chips in a device that could only dissipate the heat from
the lowest OPP, so having some higher estimate definitely makes sense.

* In terms of the numbers here, I believe that you're claiming that we
can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.

No, I'm claiming it's 768 mW + 1202 mW = ~2 W.

SC7180 has a 6 thermal zones for the 6 little cores and 4 zones for the
2 big cores. Each of these thermal zones uses either all little or all big
cores as cooling devices, hence the power sustainable power of the
individual zones doesn't add up. 768 mW corresponds to 6x 128 mW (aka all
little cores at 1.8 GHz), and 1202 mW to 2x 601 mW (both big cores at 1.9 GHz).

My memory
of how much power we could dissipate in previous laptops I worked on
is a little fuzzy, but that doesn't seem insane for a passively-cooled
laptop. However, I think someone could conceivably put this chip in a
smaller form factor. In such a case, it seems like we'd want these
things to sum up to ~2000 (if it would ever make sense for someone to
put this chip in a phone) or ~4000 (if it would ever make sense for
someone to put this chip in a small tablet).

See above, the sustainable power with this patch only adds up to ~2000.
It is possible though that it would be lower in a smaller form factor
device.

I'd be ok with posting something lower for SC7180 (it would be a guess
though) and use the specific numbers in the device specific DT.

It seems possible that,
to achieve this, we might have to tweak the
"dynamic-power-coefficient". I don't know how much thought was put
into those numbers, but the fact that the little cores have a super
round 100 for their dynamic-power-coefficient makes me feel like they
might have been more schwags than anything. Rajendra maybe knows?

Yeah, it's possible that that was just an approximation

No, these are based on actual power measurements.


* I'm curious about the fact that there are two numbers here: one for
littles and one for bigs. If I had to guess I'd say that since all
the cores are in one package so the contributions kinda need to be
thought of together, right? If we're sitting there thermally
throttled then we'd want to pick the best perf-per-watt for the
overall package. This is why your patch says we can sustain the
little cores at max and the big cores get whatever is left over,
right?

It's derived from how Qualcomm specified the thermal zones and cooling
devices. Any ("cpu") zone is either cooled by (all) big cores or by (all)
little cores, but not a mix of them. In my tests I also saw that the big
cores seemed to have little impact on the little ones. The little cores
are at max because even running at max frequency the temperature in the
'little zones' wouldn't come close to the trip point.

* Should we be leaving some room in here for the GPU? ...or I guess
once we list it as a cooling device we'll have to decrease the amount
the CPUs can use?

I don't know for sure, but judging from the CPU zones I wouldn't be
surprised if the GPU was managed exclusively in the dedicated GPU
thermal zones (I guess that's what 'gpuss0-thermal' and 'gpuss1-thermal'
are). If that's not the case the values in the CPU zones can be
adjusted when specific data is available.

So I guess the tl; dr is:

a) We should check "dynamic-power-coefficient" and possibly adjust.

ok, lets see if Rajendra can check if there is room for tweaking.

I suggest we don't :)

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation