Re: [PATCH v4 1/6] arm64: dts: rockchip: add thermal zones information on RK3588

From: Dragan Simic
Date: Mon May 06 2024 - 08:04:39 EST


On 2024-05-06 12:29, Alexey Charkov wrote:
On Mon, May 6, 2024 at 1:52 PM Dragan Simic <dsimic@xxxxxxxxxxx> wrote:
Thanks for submitting the v4 of this series! Please, see a couple
of my comments below.

On 2024-05-06 11:36, Alexey Charkov wrote:
> This includes the necessary device tree data to allow thermal
> monitoring on RK3588(s) using the on-chip TSADC device, along with
> trip points for automatic thermal management.
>
> Each of the CPU clusters (one for the little cores and two for
> the big cores) get a passive cooling trip point at 85C, which
> will trigger DVFS throttling of the respective cluster upon
> reaching a high temperature condition.
>
> All zones also have a critical trip point at 115C, which will
> trigger a reset.
>
> Signed-off-by: Alexey Charkov <alchark@xxxxxxxxx>
> ---
> arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 147
> ++++++++++++++++++++++++++++++
> 1 file changed, 147 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> index 6ac5ac8b48ab..ef06c1f742e8 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> @@ -10,6 +10,7 @@
> #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> #include <dt-bindings/phy/phy.h>
> #include <dt-bindings/ata/ahci.h>
> +#include <dt-bindings/thermal/thermal.h>
>
> / {
> compatible = "rockchip,rk3588";
> @@ -2368,6 +2369,152 @@ pwm15: pwm@febf0030 {
> status = "disabled";
> };
>
> + thermal_zones: thermal-zones {
> + /* sensor near the center of the SoC */
> + package_thermal: package-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 0>;
> +
> + trips {
> + package_crit: package-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + /* sensor between A76 cores 0 and 1 */
> + bigcore0_thermal: bigcore0-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 1>;
> +
> + trips {
> + bigcore0_alert: bigcore0-alert {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };

Doesn't removing the second passive trip, which was present in the v3,
result in confusing the IPA governor?

Not really - it will just treat the missing trip as 0C for its initial
PID calculations [1], and will continually run the governor as opposed
to putting it to rest when the temperature is below the "switch on"
value [2].

Getting the power allocation governor to work optimally (i.e. to
provide tangible benefits over, say, stepwise) is much more involved
than defining an arbitrary switch-on trip point, as it requires an
accurate estimate of sustainable power per thermal zone (which we
don't have for RK3588 in general, and furthermore it must depend a lot
on a particular cooling setup), and ideally some userspace
power/thermal model capable of tuning the PID coefficients and
updating them via sysfs based on how a particular system accumulates
and dissipates heat under different load.

So after thinking over it for a while I decided that those extra
passive trips were rather self-deceiving, as they are only useful in
the context of a power allocation governor but we do not have any of
the other pieces in place for the power allocation governor to work.
Better not to clutter the device tree IMO.

I see, thanks for the clarification. Please, give me some time
to thoroughly test your patches, which I'll hopefully be able to
do in the next few days.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/thermal/gov_power_allocator.c#n156
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/thermal/gov_power_allocator.c#n487