Re: [RFC PATCH 0/6] drm/sched: Avoid memory leaks by canceling job-by-job

From: Tvrtko Ursulin
Date: Tue Jun 03 2025 - 08:28:15 EST



On 03/06/2025 10:31, Philipp Stanner wrote:
An alternative version to [1], based on Tvrtko's suggestion from [2].

I tested this for Nouveau. Works.

I'm having, however, bigger problems properly porting the unit tests and
have seen various explosions. In the process I noticed that some things
in the unit tests aren't right and a bit of a larger rework will be
necessary (for example, the timedout job callback must signal the
timedout fence, remove it from the list and so on).

General approach I follow when implementing any mock component is to implement only as much is needed for a test to pass. Only add more and rework when a test/functionality is added which requires it.

Specifically for timedout callback signaling I see that I had exactly that added in the patch you linked as [2].
> Anyways. Please comment on the general idea.

I am obviously okay with it. :) Especially now that you verified it works well for nouveau.

What I am not that ecstatic about is only getting the Suggested-by credit in 1/6. Given it is basically my patch with some cosmetic changes like the kernel doc and the cancel loop extracted to a helper.

@Tvrtko: As briefly brainstormed about on IRC, if you'd be willing to
take care of the unit tests patch, I could remove that one (and,
maaaaybe, the warning print patch) from the series and we could merge
this RFC's successor version %N once it's ready. What do you think?

Okay in principle but the first thing I would suggest you could try is to take my unit tests adaptations from [2] verbatim. Benefit of keeping everything in one series is more confidence we are merging a solid thing. But I can take it on myself as a follow up too if you want.

Regards,

Tvrtko


P.

[1] https://lore.kernel.org/dri-devel/20250522082742.148191-2-phasta@xxxxxxxxxx/
[2] https://lore.kernel.org/dri-devel/20250418113211.69956-1-tvrtko.ursulin@xxxxxxxxxx/

Philipp Stanner (6):
drm/sched: Avoid memory leaks with cancel_job() callback
drm/sched/tests: Implement cancel_job()
drm/sched: Warn if pending list is not empty
drm/nouveau: Make fence container helper usable driver-wide
drm/nouveau: Add new callback for scheduler teardown
drm/nouveau: Remove waitque for sched teardown

drivers/gpu/drm/nouveau/nouveau_fence.c | 35 +++++----
drivers/gpu/drm/nouveau/nouveau_fence.h | 7 ++
drivers/gpu/drm/nouveau/nouveau_sched.c | 35 +++++----
drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +--
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8 +--
drivers/gpu/drm/scheduler/sched_main.c | 37 ++++++----
.../gpu/drm/scheduler/tests/mock_scheduler.c | 71 +++++++------------
drivers/gpu/drm/scheduler/tests/sched_tests.h | 4 +-
include/drm/gpu_scheduler.h | 9 +++
9 files changed, 115 insertions(+), 100 deletions(-)