Re: [PATCH v5 1/2] drm/panthor: Reset queue slots if termination fails

From: Boris Brezillon
Date: Tue Jun 17 2025 - 10:53:24 EST


On Tue, 3 Jun 2025 10:49:31 +0100
Ashley Smith <ashley.smith@xxxxxxxxxxxxx> wrote:

> This fixes a bug where if we timeout after a suspend and the termination
> fails, due to waiting on a fence that will never be signalled for
> example, we do not resume the group correctly. The fix forces a reset
> for groups that are not terminated correctly.
>
> Signed-off-by: Ashley Smith <ashley.smith@xxxxxxxxxxxxx>
> Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
> ---
> drivers/gpu/drm/panthor/panthor_sched.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index 43ee57728de5..65d8ae3dcac1 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -2727,8 +2727,17 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
> * automatically terminate all active groups, so let's
> * force the state to halted here.
> */
> - if (csg_slot->group->state != PANTHOR_CS_GROUP_TERMINATED)
> + if (csg_slot->group->state != PANTHOR_CS_GROUP_TERMINATED) {
> csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;
> +
> + /* Reset the queue slots manually if the termination
> + * request failed.
> + */
> + for (i = 0; i < group->queue_count; i++) {

group is used uninitialized here, which leads to a random (most likely
NULL) pointer dereference. Either we go:

for (i = 0; i < csg_slot->group->queue_count; i++) {

and we move the group variable to the last for loop, so we're not
tempted to use it again in places where it's not initialized, or
we use the group variable consistently across this function by having

group = csg_slot->group;

assignments where csg_slot->group is currently used.
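
To illustrate the second option, here's a rough standalone mock, not the
actual panthor code: the structs, MAX_CSGS/MAX_QUEUES, force_terminate()
and the cs_slot_reset() stub are all simplified stand-ins I made up for
the example. The only point is that group gets assigned from the slot
before anything dereferences it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CSGS   4	/* hypothetical limits, for the mock only */
#define MAX_QUEUES 4

enum group_state { GROUP_ACTIVE, GROUP_TERMINATED };

struct queue { int id; };

struct group {
	enum group_state state;
	unsigned int queue_count;
	struct queue *queues[MAX_QUEUES];
};

struct csg_slot { struct group *group; };

/* Counts resets so the behaviour of the sketch can be checked. */
static unsigned int reset_count;

/* Stand-in for cs_slot_reset_locked(); it only records the call. */
static void cs_slot_reset(unsigned int csg_id, unsigned int cs_id)
{
	(void)csg_id;
	(void)cs_id;
	reset_count++;
}

/* Force a group to the terminated state and reset its queue slots. */
static void force_terminate(struct csg_slot *slots, unsigned int csg_id)
{
	struct group *group;
	unsigned int i;

	assert(csg_id < MAX_CSGS);

	/* Assign group before any use, instead of leaving it dangling. */
	group = slots[csg_id].group;
	if (!group || group->state == GROUP_TERMINATED)
		return;

	group->state = GROUP_TERMINATED;

	/* Reset only the queue slots that are actually populated. */
	for (i = 0; i < group->queue_count; i++) {
		if (group->queues[i])
			cs_slot_reset(csg_id, i);
	}
}
```

With the first option, the loop would read csg_slot->group->queue_count
directly and the local variable would only exist in the last loop.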

We might also want to consider splitting this huge function into
sub-functions, but probably not in a patch that's flagged for
backporting.


> + if (group->queues[i])
> + cs_slot_reset_locked(ptdev, csg_id, i);
> + }
> + }
> slot_mask &= ~BIT(csg_id);
> }
> }