Re: [Patch v9 0/8] Introduce Thermal Pressure

From: Dietmar Eggemann
Date: Mon Feb 10 2020 - 07:09:09 EST


On 28/01/2020 23:35, Thara Gopinath wrote:
> Thermal governors can respond to an overheat event of a cpu by
> capping the cpu's maximum possible frequency. This in turn
> means that the maximum available compute capacity of the
> cpu is restricted. But today in the kernel, the task scheduler is
> not notified when the maximum frequency of a cpu is capped.
> In other words, the scheduler is unaware of maximum capacity
> restrictions placed on a cpu due to thermal activity.
> This patch series attempts to address this issue.
> The benefit identified is better task placement among the available
> cpus in the event of overheating, which in turn leads to better
> performance numbers.
>
> The reduction in the maximum possible capacity of a cpu due to a
> thermal event can be considered as thermal pressure. Instantaneous
> thermal pressure is hard to record and can sometimes be erroneous,
> as there can be a mismatch between the actual capping of capacity
> and the scheduler recording it. Thus the solution is to have a weighted
> average per-cpu value for thermal pressure over time.
> The weight reflects the amount of time the cpu has spent at a
> capped maximum frequency. Since thermal pressure is recorded as
> an average, it must be decayed periodically. The existing algorithm
> in the kernel scheduler's PELT framework is re-used to calculate
> the weighted average. This patch series also defines a sysctl
> interface to allow for a configurable decay period.
>
> Regarding testing, basic build, boot and sanity testing have been
> performed on the db845c platform with a Debian file system.
> Further, dhrystone and hackbench tests have been
> run with the thermal pressure algorithm. During testing, due to
> constraints of the step-wise governor in dealing with big.LITTLE systems,
> the trip point 0 temperature was made asymmetric between the cpus in the
> little cluster and the big cluster. The idea is that the
> big cores will heat up and the cpu cooling device will throttle the
> frequency of the big cores faster, thereby limiting the maximum available
> capacity, so that the scheduler will spread tasks out to the little cores
> as well.
>
> Test Results
>
> Hackbench: 1 group, 30000 loops, 10 runs
>                                               Result    SD
>                                               (Secs)    (% of mean)
> No Thermal Pressure                           14.03     2.69%
> Thermal Pressure PELT Algo. Decay : 32 ms     13.29     0.56%
> Thermal Pressure PELT Algo. Decay : 64 ms     12.57     1.56%
> Thermal Pressure PELT Algo. Decay : 128 ms    12.71     1.04%
> Thermal Pressure PELT Algo. Decay : 256 ms    12.29     1.42%
> Thermal Pressure PELT Algo. Decay : 512 ms    12.42     1.15%
>
> Dhrystone Run Time: 20 threads, 3000 MLOOPS
>                                               Result    SD
>                                               (Secs)    (% of mean)
> No Thermal Pressure                           9.452     4.49%
> Thermal Pressure PELT Algo. Decay : 32 ms     8.793     5.30%
> Thermal Pressure PELT Algo. Decay : 64 ms     8.981     5.29%
> Thermal Pressure PELT Algo. Decay : 128 ms    8.647     6.62%
> Thermal Pressure PELT Algo. Decay : 256 ms    8.774     6.45%
> Thermal Pressure PELT Algo. Decay : 512 ms    8.603     5.41%

What do we do on systems on which one frequency domain spans all the
CPUs (e.g. Hikey620)?

perf stat --null --repeat 10 -- perf bench sched messaging -g 10 -l 1000

# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 4.697 [sec]
# Running 'sched/messaging' benchmark:
[ 8082.882751] hisi_thermal f7030700.tsensor: sensor <2> THERMAL ALARM: 66385 > 65000
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 4.910 [sec]
# Running 'sched/messaging' benchmark:
[ 8091.070386] CPU3 cpus=0-7 th_pressure=205
[ 8091.178390] CPU3 cpus=0-7 th_pressure=0
[ 8091.286389] CPU3 cpus=0-7 th_pressure=205
[ 8091.398397] CPU3 cpus=0-7 th_pressure=0
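
To illustrate how a decayed average would smooth a th_pressure signal
that flips between 205 and 0 like the one in the log above, here is a
minimal sketch. It is not the kernel's PELT code: it assumes 1 ms sample
steps, a plain geometric average whose weight halves every `half_life`
samples, and a toggle interval of 108 ms read off the timestamps above.
The function names are hypothetical and exist only for this example.

```python
def decayed_average(samples, half_life):
    """Return the running geometric average of `samples`.

    The contribution of an old sample halves every `half_life` steps.
    This is a simplified stand-in for a PELT-style decay, not the
    kernel implementation.
    """
    y = 0.5 ** (1.0 / half_life)  # per-step decay factor
    avg = 0.0
    trace = []
    for s in samples:
        avg = avg * y + s * (1.0 - y)
        trace.append(avg)
    return trace


def square_wave(high, period_ms, cycles):
    """Signal that stays at `high` for period_ms, then at 0 for period_ms."""
    out = []
    for _ in range(cycles):
        out.extend([high] * period_ms)
        out.extend([0] * period_ms)
    return out


# th_pressure toggling between 205 and 0 roughly every 108 ms (assumed
# from the log timestamps), sampled once per millisecond.
signal = square_wave(205, 108, 20)

for half_life in (32, 512):
    trace = decayed_average(signal, half_life)
    tail = trace[-216:]  # last full on/off cycle
    print(f"half-life {half_life:3d} ms: "
          f"min {min(tail):6.1f}  max {max(tail):6.1f}")
```

With a short half-life the average follows the capping on and off almost
completely, while a long half-life keeps it near the middle of the range.
Either way every CPU in the single frequency domain sees the same value,
which is the point of the question above.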