Re: [PATCH v2] drm/sched: Clarify scenarios for separate workqueues

From: Danilo Krummrich
Date: Wed Jun 18 2025 - 10:44:04 EST


On Wed, Jun 18, 2025 at 04:06:41PM +0200, Simona Vetter wrote:
> On Tue, Jun 17, 2025 at 05:08:57PM +0200, Danilo Krummrich wrote:
> > On Tue, Jun 17, 2025 at 04:25:09PM +0200, Simona Vetter wrote:
> > > On Tue, Jun 17, 2025 at 04:10:40PM +0200, Danilo Krummrich wrote:
> > > > On Tue, Jun 17, 2025 at 03:51:33PM +0200, Simona Vetter wrote:
> > > > > On Thu, Jun 12, 2025 at 04:49:54PM +0200, Philipp Stanner wrote:
> > > > > > + * NOTE that sharing &struct drm_sched_init_args.submit_wq with the driver
> > > > > > + * theoretically can deadlock. It must be guaranteed that submit_wq never has
> > > > > > + * more than max_active - 1 active tasks, or if max_active tasks are reached at
> > > > > > + * least one of them does not execute operations that may block on dma_fences
> > > > > > + * that potentially make progress through this scheduler instance. Otherwise,
> > > > > > + * it is possible that all max_active tasks end up waiting on a dma_fence (that
> > > > > > + * can only make progress through this scheduler instance), while the
> > > > > > + * scheduler's queued work waits for at least one of the max_active tasks to
> > > > > > + * finish. Thus, this can result in a deadlock.
> > > > >
> > > > > Uh if you have an ordered wq you deadlock with just one misuse. I'd just
> > > > > explain that the wq must provide sufficient forward-progress guarantees
> > > > > for the scheduler, specifically that it's on the dma_fence signalling
> > > > > critical path and leave the concrete examples for people to figure out
> > > > > when they design a specific locking scheme.
> > > >
> > > > This isn't a concrete example, is it? It's exactly what you say in slightly
> > > > different words, with the addition of highlighting the impact of the workqueue's
> > > > max_active configuration.
> > > >
> > > > I think that's relevant, because N - 1 active tasks can be on the dma_fence
> > > > signalling critical path without issues.
> > > >
> > > > We could change
> > > >
> > > > "if max_active tasks are reached at least one of them must not execute
> > > > operations that may block on dma_fences that potentially make progress
> > > > through this scheduler instance"
> > > >
> > > > to
> > > >
> > > > "if max_active tasks are reached at least one of them must not be on the
> > > > dma_fence signalling critical path"
> > > >
> > > > which is a bit more to the point I think.
> > >
> > > My point was more to state that the wq must be suitable for the scheduler
> > > jobs as the general issue, and then specifically also to highlight the
> > > dma_fence concurrency issue.
> >
> > Sure, there are more guarantees the driver has to uphold, but this is one of
> > them, so I think we should explain it.
> >
> > > But it's not the only one, you can have driver locks and other fun involved
> > > here too.
> >
> > Yeah, but it boils down to the same issue, e.g. if a driver takes a lock in
> > active work, and this lock is taken elsewhere for activities that violate the
> > dma_fence signalling critical path.
> >
> > All the cases I have in mind boil down to that we potentially, either directly
> > or indirectly (through some synchronization primitive), wait for things we are
> > not allowed to wait for in the dma_fence signalling critical path.
> >
> > Or do you mean something different?
>
> You could also grab a mutex in those driver work items that is held while
> waiting for a dma_fence somewhere. The dma_fence annotations should catch
> that, but at least in my reading of the text here it's not covered.

That's exactly one example of what I mean above with: "All the cases I have in
mind boil down to that we potentially, either directly or indirectly (through
some synchronization primitive), wait for things we are not allowed to wait for
in the dma_fence signalling critical path."
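
To illustrate that indirect variant, here is a rough sketch; everything
prefixed my_ is made up, only the reference to drm_sched_init_args.submit_wq
is real:

#include <linux/container_of.h>
#include <linux/dma-fence.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>

/* Illustration only -- all my_* names are made up. */
struct my_drv {
        /* Also passed as drm_sched_init_args.submit_wq. */
        struct workqueue_struct *wq;
        struct work_struct submit_work;
        struct mutex lock;
};

static void my_drv_submit_work(struct work_struct *work)
{
        struct my_drv *drv = container_of(work, struct my_drv, submit_work);

        /*
         * Runs on drv->wq and hence competes for the same max_active
         * slots as the scheduler's own work items.
         */
        mutex_lock(&drv->lock);
        /* write the ring, ring the doorbell, ... */
        mutex_unlock(&drv->lock);
}

static void my_drv_evict(struct my_drv *drv, struct dma_fence *fence)
{
        mutex_lock(&drv->lock);
        /*
         * fence can only signal once the scheduler's work gets to run
         * on drv->wq. If driver work items blocked on drv->lock, like
         * my_drv_submit_work() above, occupy all max_active slots (a
         * single one is enough on an ordered wq), the scheduler's work
         * never runs, the fence never signals, and we deadlock here.
         */
        dma_fence_wait(fence, false);
        mutex_unlock(&drv->lock);
}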

> But my main point is what I explain below, the text fails to clearly
> address the issues that all current drivers (maybe all reasonable drivers,
> but sometimes I lack imagination) can encounter, so to me it's too generic
> and not that directly applicable in practice.

I mean, the ordered workqueue isn't more limited than any other workqueue, it's
just that the likelihood of hitting problems increases the lower max_active is,
down to max_active == 1 for the ordered case.

But the error conditions are exactly the same, aren't they? We always have to
ensure that at least one slot of the workqueue is not running a task that does
things that are not allowed in the dma_fence signalling critical path.
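
Spelled out as a sketch, with made-up names and an arbitrary max_active of
four:

#include <linux/workqueue.h>

/* Illustration only -- names and sizes are made up. */
static struct workqueue_struct *my_drv_alloc_submit_wq(bool ordered)
{
        if (ordered)
                /*
                 * max_active == 1: no driver work item queued on this wq
                 * may block on a fence that makes progress through the
                 * scheduler instance sharing it; a single one already
                 * starves the scheduler's work.
                 */
                return alloc_ordered_workqueue("my-drv-submit", 0);

        /*
         * max_active == 4: at most three of the four slots may be
         * occupied by such blocking work items, so the scheduler's own
         * work always remains runnable.
         */
        return alloc_workqueue("my-drv-submit", 0, 4);
}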

So, this part is technically covered. Do I understand you correctly that you
think it's covered too generically?

Do you propose to list the things we're not allowed to do explicitly?

> And then on the other hand
> dma_fence is definitely the big thing, but fundamentally you tie anything
> you schedule on those wqs to the drm/scheduler design in its entirety. So
> for the general rule, it's not generic enough for my taste.

Sorry, I can't follow that. Can you please expand a bit on what you think is not
generic enough?

> > > Also since all the paragraphs above talk about ordered wq as the example
> > > where specifying your own wq makes sense, it's a bit confusing to now
> > > suddenly only talk about the concurrent wq case without again mentioning
> > > that the ordered wq case is really limited.
> >
> > I mean, it talks about both cases in a generic way, i.e. if you set
> > max_active == 1 in the text it covers the ordered case.
> >
> > Or do you mean to say that we should *only* allow ordered workqueues to be
> > shared with the driver?
>
> Both examples talk about ordered wq, they don't make any sense with
> max_active > 1, and I can't come up with any example that would. So yeah
> I'm leaning in that direction, at least in the docs. Only discussing
> max_active > 1 for this specific issue is imo very confusing and not
> helping much. I guess we could even WARN_ON if a provided wq is not
> ordered, because that does smell funky for sure.

I think there is nothing wrong with sharing a workqueue with WQ_MAX_ACTIVE
between scheduler instances. In the firmware scheduler case we may have quite a
lot of them. So why create a new workqueue for each of them if we can't take
advantage of the synchronization trick with ordered workqueues anyway?
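
Concretely, something like this is what I have in mind (the
drm_sched_init_args fields match what's upstream; everything prefixed my_ is
made up and the numbers are placeholders):

#include <drm/gpu_scheduler.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

struct my_dev {
        struct device *dev;
        /* One wide wq, shared by all scheduler instances of the device. */
        struct workqueue_struct *sched_wq;
};

struct my_fw_ring {
        struct drm_gpu_scheduler sched;
};

static const struct drm_sched_backend_ops my_fw_sched_ops = {
        /* .run_job, .free_job, .timedout_job elided for brevity */
};

/* At probe time, e.g.:
 *
 *      mdev->sched_wq = alloc_workqueue("my-fw-sched", WQ_UNBOUND,
 *                                       WQ_MAX_ACTIVE);
 */

static int my_fw_ring_init(struct my_dev *mdev, struct my_fw_ring *ring)
{
        const struct drm_sched_init_args args = {
                .ops = &my_fw_sched_ops,
                /* Shared by all scheduler instances of this device. */
                .submit_wq = mdev->sched_wq,
                .num_rqs = 1,
                .credit_limit = 32,
                .timeout = MAX_SCHEDULE_TIMEOUT,
                .name = "my-fw-ring",
                .dev = mdev->dev,
        };

        return drm_sched_init(&ring->sched, &args);
}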