Re: [PATCH] drm/sched/tests: Make timedout_job callback a better role model

From: Tvrtko Ursulin
Date: Mon Jun 16 2025 - 06:58:16 EST



On 05/06/2025 14:41, Philipp Stanner wrote:
Since the drm_mock_scheduler does not have real users in userspace, nor
does it have real hardware or firmware rings, it's not necessary to
signal timedout fences nor free jobs - from a functional standpoint.

The unit tests, however, serve as a reference implementation and a first
example for new scheduler users. Therefore, they should approximate the
canonical usage as much as possible.

Make sure timed out hardware fences get signaled with the appropriate
error code.

Code looks fine, but currently nothing is broken and I disagree with the goal that the _mock_^1 components should be role models. The idea is to implement as little in the mock components as it is required to exercise the tested functionality.

As the motivation for this patch, ie. things which start depending on adding this functionality, only appear in one upcoming series, and as mentioned in https://lore.kernel.org/dri-devel/18cd6b1f-8872-4a16-9ceb-50fd1ecfea39@xxxxxxxxxx/, I think it makes most sense to stick with that philosophy and fold this patch into the respective series.

Also, there are various ways drivers use the scheduler API. Trying to make the mock scheduler a reference driver implementation would only be able to make it a reference for one possible use.

Regards,

Tvrtko

1)
mock
adjective
not authentic or real, but without the intention to deceive.
"a mock-Georgian red brick house"


Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx>
---
.../gpu/drm/scheduler/tests/mock_scheduler.c | 26 ++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
index 7f947ab9d322..49d067fecd67 100644
--- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
+++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
@@ -200,12 +200,36 @@ static struct dma_fence *mock_sched_run_job(struct drm_sched_job *sched_job)
return &job->hw_fence;
}
+/*
+ * Normally, drivers would take appropriate measures in this callback, such as
+ * killing the entity the faulty job is associated with, resetting the hardware
+ * and / or resubmitting non-faulty jobs.
+ *
+ * For the mock scheduler, there are no hardware rings to be resetted nor jobs
+ * to be resubmitted. Thus, this function merely ensures that
+ * a) timedout fences get signaled properly and removed from the pending list
+ * b) the mock scheduler framework gets informed about the timeout via a flag
+ * c) The drm_sched_job, not longer needed, gets freed
+ */
static enum drm_gpu_sched_stat
mock_sched_timedout_job(struct drm_sched_job *sched_job)
{
+ struct drm_mock_scheduler *sched = drm_sched_to_mock_sched(sched_job->sched);
struct drm_mock_sched_job *job = drm_sched_job_to_mock_job(sched_job);
+ unsigned long flags;
- job->flags |= DRM_MOCK_SCHED_JOB_TIMEDOUT;
+ spin_lock_irqsave(&sched->lock, flags);
+ if (!dma_fence_is_signaled_locked(&job->hw_fence)) {
+ list_del(&job->link);
+ job->flags |= DRM_MOCK_SCHED_JOB_TIMEDOUT;
+ dma_fence_set_error(&job->hw_fence, -ETIMEDOUT);
+ dma_fence_signal_locked(&job->hw_fence);
+ }
+ spin_unlock_irqrestore(&sched->lock, flags);
+
+ dma_fence_put(&job->hw_fence);
+ drm_sched_job_cleanup(sched_job);
+ /* Mock job itself is freed by the kunit framework. */
return DRM_GPU_SCHED_STAT_NOMINAL;
}