Re: drm_sched with panfrost crash on T820

From: Steven Price
Date: Fri Sep 27 2019 - 06:48:34 EST


On 27/09/2019 10:55, Steven Price wrote:
[...]
> One obvious issue with the DRM scheduler is that there is a call to
> cancel_delayed_work() in drm_sched_stop() which to me looks like it
> should be cancel_delayed_work_sync() to ensure that the timeout handling
> has completed.
>
> However in the above scenario a _sync() variety would then cause a
> deadlock (one thread has pfdev->reset_lock and is waiting for the other
> thread which is trying to take the lock).
>
> So we need to update Panfrost so that it can coordinate the reset
> between schedulers. Can you try something like the following (untested):

And actually testing it I of course discover it doesn't quite work. We
do need the cancel_delayed_work_sync() in the DRM scheduler (when
stopping a different scheduler) and we need to avoid holding the
reset_lock during the drm_sched_stop() call to prevent deadlocking with
another thread handling a timeout.

Can you give the following patch a spin? I don't have a great
reproduction case, so it would be good to get some confidence it fixes
the problem.

----8<----