Re: [PATCH v2] drm/msm: Check for powered down HW in the devfreq callbacks

From: Eric Anholt
Date: Fri May 01 2020 - 15:26:56 EST


On Fri, May 1, 2020 at 12:03 PM Jordan Crouse <jcrouse@xxxxxxxxxxxxxx> wrote:
>
> Writing to the devfreq sysfs nodes while the GPU is powered down can
> result in a system crash (on a5xx) or a nasty GMU error (on a6xx):
>
> $ /sys/class/devfreq/5000000.gpu# echo 500000000 > min_freq
> [ 104.841625] platform 506a000.gmu: [drm:a6xx_gmu_set_oob]
> *ERROR* Timeout waiting for GMU OOB set GPU_DCVS: 0x0
>
> Despite the fact that we carefully try to suspend the devfreq device when
> the hardware is powered down there are lots of holes in the governors that
> don't check for the suspend state and blindly call into the devfreq
> callbacks that end up triggering hardware reads in the GPU driver.
>
> Call pm_runtime_get_if_in_use() in the gpu_busy() and gpu_set_freq()
> callbacks to skip the hardware access if it isn't active.
>
> v2: Use pm_runtime_get_if_in_use() per Eric Anholt
>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Jordan Crouse <jcrouse@xxxxxxxxxxxxxx>
> ---
>
> drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 6 ++++++
> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 8 ++++++++
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 7 +++++++
> 3 files changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index 724024a2243a..4d7f269edfcc 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1404,6 +1404,10 @@ static unsigned long a5xx_gpu_busy(struct msm_gpu *gpu)
>  {
>  	u64 busy_cycles, busy_time;
>
> +	/* Only read the gpu busy if the hardware is already active */
> +	if (pm_runtime_get_if_in_use(&gpu->pdev->dev) <= 0)
> +		return 0;
> +

The runtime PM (RPM) APIs are a bit of a trap: the get functions return
a negative errno if runtime PM is disabled in Kconfig, even though that
usually means the power domain is never powered down by RPM, so the
hardware is still safe to touch. I think in these checks you want
"if (pm_runtime_get_if_in_use() == 0)", which seems to be a common
pattern in other drivers. With that,

Reviewed-by: Eric Anholt <eric@xxxxxxxxxx>

(and tested, too)
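
For illustration, here is a minimal userspace sketch of why the two
comparisons differ. It is not driver code: mock_get_if_in_use() and the
mock_rpm_state enum are hypothetical stand-ins for the three outcomes of
pm_runtime_get_if_in_use(), assuming -EINVAL is the errno returned when
runtime PM is unavailable.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the three outcomes of pm_runtime_get_if_in_use(). */
enum mock_rpm_state {
	MOCK_RPM_DISABLED,	/* runtime PM unavailable: negative errno */
	MOCK_RPM_SUSPENDED,	/* device not active or not in use: 0 */
	MOCK_RPM_ACTIVE,	/* device active and in use: 1 */
};

/* Stand-in for the real call; it only mimics the return value. */
static int mock_get_if_in_use(enum mock_rpm_state state)
{
	switch (state) {
	case MOCK_RPM_DISABLED:
		return -EINVAL;
	case MOCK_RPM_SUSPENDED:
		return 0;
	default:
		return 1;
	}
}

/* The v2 check: an error return is treated like "powered down". */
static bool skips_hw_access_le_zero(enum mock_rpm_state state)
{
	return mock_get_if_in_use(state) <= 0;
}

/* The suggested check: only a definite "not in use" skips the access. */
static bool skips_hw_access_eq_zero(enum mock_rpm_state state)
{
	return mock_get_if_in_use(state) == 0;
}
```

With "<= 0" the callback bails out in the MOCK_RPM_DISABLED case, so
gpu_busy() would always report 0 on a kernel built without runtime PM.
With "== 0" it only bails out when the device is known to be suspended.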