Re: [PATCH v1] drm/scheduler: Don't kill jobs in interrupt context

From: Andrey Grodzovsky
Date: Tue Apr 12 2022 - 15:43:06 EST



On 2022-04-12 14:20, Dmitry Osipenko wrote:
On 4/12/22 19:51, Andrey Grodzovsky wrote:
On 2022-04-11 18:15, Dmitry Osipenko wrote:
Interrupt context can't sleep. Drivers like Panfrost and MSM are taking
mutex when job is released, and thus, that code can sleep. This results
into "BUG: scheduling while atomic" if locks are contented while job is
freed. There is no good reason for releasing scheduler's jobs in IRQ
context, hence use normal context to fix the trouble.

I am not sure this is the beast Idea to leave job's sw fence signalling
to be
executed in system_wq context which is prone to delays of executing
various work items from around the system. Seems better to me to leave the
fence signaling within the IRQ context and offload only the job freeing or,
maybe handle rescheduling to thread context within drivers implemention
of .free_job cb. Not really sure which is the better.
We're talking here about killing jobs when driver destroys context,
which doesn't feel like it needs to be a fast path. I could move the
signalling into drm_sched_entity_kill_jobs_cb() and use unbound wq, but
do we really need this for a slow path?


You can't move the signaling back to drm_sched_entity_kill_jobs_cb
since this will bring back the lockdep splat that 'drm/sched: Avoid lockdep spalt on killing a processes'
was fixing.

I see your point and i guess we can go this way too. Another way would be to add to
panfrost and msm job a  work_item and reschedule to thread context from within their
.free_job callbacks but that probably to cumbersome to be justified here.

Andrey


Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>