[PATCH v2 0/7] Improve GPU Recovery

From: Akhil P Oommen
Date: Sat Jul 09 2022 - 02:00:47 EST



Recently, I debugged a few device crashes which occured during recovery
after a hangcheck timeout. It looks like there are a few things we can
do to improve our chance at a successful gpu recovery.

First one is to ensure that CX GDSC collapses which clears the internal
states in gpu's CX domain. First 5 patches tries to handle this.

Rest of the patches are to ensure that few internal blocks like CP, GMU
and GBIF are halted properly before proceeding for a snapshot followed by
recovery. Also, handle 'prepare slumber' hfi failure correctly. These
are A6x specific improvements.

Changes in v2:
- Rebased on msm-next tip

Akhil P Oommen (7):
drm/msm: Remove unnecessary pm_runtime_get/put
drm/msm: Correct pm_runtime votes in recover worker
drm/msm: Fix cx collapse issue during recovery
drm/msm: Ensure cx gdsc collapse during recovery
arm64: dts: qcom: sc7280: Update gpu register list
drm/msm/a6xx: Improve gpu recovery sequence
drm/msm/a6xx: Handle GMU prepare-slumber hfi failure

arch/arm64/boot/dts/qcom/sc7280.dtsi | 6 ++-
drivers/gpu/drm/msm/adreno/a6xx.xml.h | 4 ++
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 83 ++++++++++++++++++++++-------------
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 36 +++++++++++++--
drivers/gpu/drm/msm/msm_gpu.c | 9 ++--
drivers/gpu/drm/msm/msm_gpu.h | 1 +
drivers/gpu/drm/msm/msm_ringbuffer.c | 4 --
7 files changed, 100 insertions(+), 43 deletions(-)

--
2.7.4