Re: [PATCH v2] Documentation/gpu: VM_BIND locking document

From: Thomas Hellström
Date: Wed Sep 06 2023 - 07:57:13 EST

Next message: Lad, Prabhakar: "Re: [PATCH] pinctrl: renesas: pinctrl-rzg2l: Add validation of GPIO pin in rzg2l_gpio_request()"
Previous message: Damien Le Moal: "Re: [PATCH v3] ahci: libahci: clear pending interrupt status"
In reply to: Boris Brezillon: "Re: [PATCH v2] Documentation/gpu: VM_BIND locking document"
Next in thread: Boris Brezillon: "Re: [PATCH v2] Documentation/gpu: VM_BIND locking document"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi, Boris

On 9/6/23 13:09, Boris Brezillon wrote:

On Wed, 6 Sep 2023 10:32:24 +0200
Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> wrote:

+Introducing external (or shared) buffer objects
+===============================================
+
+Since shared buffer objects may be shared by multiple gpu_vm's they
+can't share their reservation object with a single gpu_vm, but
will rather
+have a reservation object of their own. The shared objects bound to a
+gpu_vm using one or many
+gpu_vmas are therefore typically put on a per-gpu_vm list which is
+protected by the gpu_vm lock. One could in theory protect it also
with
+the ``gpu_vm->resv``, but since the list of dma_resvs to take is
typically
+built before the ``gpu_vm->resv`` is locked due to a limitation in
+the current locking helpers, that is typically not done. Also see
+below for userptr gpu_vmas.
+
+At eviction time we now need to invalidate *all* gpu_vmas of a shared
+object, but we can no longer be certain that we hold the gpu_vm's
+dma_resv of all the object's gpu_vmas. We can only be certain that we

I need to think a bit more about locking of extobj and evicted
object tracking
in the case of processing 'drm_gpuva_ops' directly through callbacks
within the
fence signalling critical path as mentioend in [1].

In order to support that, we'd need to protect extobjs with a
separate lock,
and while iterating extobjs to acquire the dma-resv lock drop the
lock within
the loop before we actually acquire the dma-resv lock. Maple tree
supports that
already and this can be fully done within the GPUVA manager; no need
for the
driver to care about that.

So do I understand correctly that this because you want to update the
gpuvm state while operations are progressing asynchronously?

If so, I wonder whether that could really be done? For example to
allocate enough memory for page-tables etc, you need to know the
details of the operations at IOCTL execution time, and to know the
details you need to know the state from the previous operation?

Right, sync and async bind can't run fully concurrently, but you could
"inject" a
sync one between two async ones such that the sync ones executed from
the IOCTL
directly while async execution is stalled meanwhile. This would be
possible because
the actual drm_gpuva_ops would be calculated within the async
execution path rather
than in the IOCTL. But yes, page-table management must be desinged to
support that.

FWIW, the panthor driver is designed this way (note that I'm not
supporting GEM eviction yet, so there might be subtleties I missed).

The problem is that once you've published your VM_BIND out-fence, any code path required to signal that fence may notallocate memory nor or grab any locks that allows allocating memory while held including dma_resv locks, and that means all required page-table memory needs to be allocated synchronously in the IOCTL, and all evicted bos need to be made resident in the IOCTL, and at least in the xe driver the amount of memory we need to allocate depends on the vm state, so we can't really update the vm state asynchronously either.

But as long as any async binding work required for signalling the VM_BIND out-fence is properly annotated with dma_fence_begin_signalling() and dma_fence_end_signalling() and there aren't any lockdep splats, things should be good. It would trigger on both memory allocation and attempts to grab a dma_resv lock.

OK, well one of the main motivations for Xe is to be able to pipeline
interleaving binds and execs if needed, like so:

- Bind vmas for scene 1.
- Submit scene 1.
- Unbind vmas for scene 1.
- Bind vmas for scene 2.
- Submit scene 2.
- Unbind vmas for scene 2.

And being able to *submit* all of the above while the async binding of
vmas for scene (step 1) has not yet completed.
I can't really see how this could be done, while obeying dma-fence
rules, unless state is updated synchronously while submitting?

The idea in this case is to detect when a GPU job dependency is a
VM_BIND out-fence, turn drm_sched_fence->parent into an
xxx_vm_bind_job_fence object that's holding the GEM that's about to be
mapped (AFAICT, we don't need to do anything for unmap operations), and
then add our GPU job fence to this BO. This should not only guarantee
that the GEMs we depend on are mapped before the GPU job is executed
(the fence wait does that), but also that such yet-to-be-mapped GEMs
won't be evicted just after they've been mapped and before the GPU had
a chance to execute (unless I'm missing something, adding our GPU job
fence to the BO being targeted by a pending VM_BIND(async,map) operation
solves this problem).

Yes, we're essentially doing the same. The issue here is that when we, for example *submit* Bind vmas for scene 2,
we need to know how much page-table memory to allocate, and what BOs to make resident to be able to publish the out-fence. That means we need to know what the VM state would look like at the end of "Unbind vmas for scene 1". If the VM state is updated at submission time, that's all ok but if it's updated at execution time, we'd have to guess what resources to pre-allocate.

/Thomas

Next message: Lad, Prabhakar: "Re: [PATCH] pinctrl: renesas: pinctrl-rzg2l: Add validation of GPIO pin in rzg2l_gpio_request()"
Previous message: Damien Le Moal: "Re: [PATCH v3] ahci: libahci: clear pending interrupt status"
In reply to: Boris Brezillon: "Re: [PATCH v2] Documentation/gpu: VM_BIND locking document"
Next in thread: Boris Brezillon: "Re: [PATCH v2] Documentation/gpu: VM_BIND locking document"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]