Re: [Intel-gfx] [PATCH 3/3] misc/habalabs: don't set default fence_ops->wait

From: Oded Gabbay
Date: Wed May 20 2020 - 14:10:16 EST


On Wed, May 20, 2020 at 9:05 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> On Thu, May 14, 2020 at 02:38:38PM +0300, Oded Gabbay wrote:
> > On Tue, May 12, 2020 at 9:12 AM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> > >
> > > On Tue, May 12, 2020 at 4:14 AM Dave Airlie <airlied@xxxxxxxxx> wrote:
> > > >
> > > > On Mon, 11 May 2020 at 19:37, Oded Gabbay <oded.gabbay@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, May 11, 2020 at 12:11 PM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> > > > > >
> > > > > > It's the default.
> > > > > Thanks for catching that.
> > > > >
> > > > > >
> > > > > > Also so much for "we're not going to tell the graphics people how to
> > > > > > review their code", dma_fence is a pretty core piece of gpu driver
> > > > > > infrastructure. And it's very much uapi relevant, including piles of
> > > > > > corresponding userspace protocols and libraries for how to pass these
> > > > > > around.
> > > > > >
> > > > > > Would be great if habanalabs would not use this (from a quick look
> > > > > > it's not needed at all), since open source the userspace and playing
> > > > > > by the usual rules isn't on the table. If that's not possible (because
> > > > > > it's actually using the uapi part of dma_fence to interact with gpu
> > > > > > drivers) then we have exactly what everyone promised we'd want to
> > > > > > avoid.
> > > > >
> > > > > We don't use the uapi parts, we currently only using the fencing and
> > > > > signaling ability of this module inside our kernel code. But maybe I
> > > > > didn't understand what you request. You want us *not* to use this
> > > > > well-written piece of kernel code because it is only used by graphics
> > > > > drivers ?
> > > > > I'm sorry but I don't get this argument, if this is indeed what you meant.
> > > >
> > > > We would rather drivers using a feature that has requirements on
> > > > correct userspace implementations of the feature have a userspace that
> > > > is open source and auditable.
> > > >
> > > > Fencing is tricky, cross-device fencing is really tricky, and having
> > > > the ability for a closed userspace component to mess up other people's
> > > > drivers, think i915 shared with closed habana userspace and shared
> > > > fences, decreases ability to debug things.
> > > >
> > > > Ideally we wouldn't offer users known untested/broken scenarios, so
> > > > yes we'd prefer that drivers that intend to expose a userspace fencing
> > > > api around dma-fence would adhere to the rules of the gpu drivers.
> > > >
> > > > I'm not say you have to drop using dma-fence, but if you move towards
> > > > cross-device stuff I believe other drivers would be correct in
> > > > refusing to interact with fences from here.
> > >
> > > The flip side is if you only used dma-fence.c "because it's there",
> > > and not because it comes with an uapi attached and a cross-driver
> > > kernel internal contract for how to interact with gpu drivers, then
> > > there's really not much point in using it. It's a custom-rolled
> > > wait_queue/event thing, that's all. Without the gpu uapi and gpu
> > > cross-driver contract it would be much cleaner to just use wait_queue
> > > directly, and that's a construct all kernel developers understand, not
> > > just gpu folks. From a quick look at least habanalabs doesn't use any
> > > of these uapi/cross-driver/gpu bits.
> > > -Daniel
> >
> > Hi Daniel,
> > I want to say explicitly that we don't use the dma-buf uapi parts, nor
> > we intend to use them to communicate with any GPU device. We only use
> > it as simple completion mechanism as it was convenient to use.
> > I do understand I can exchange that mechanism with a simpler one, and
> > I will add an internal task to do it (albeit not in a very high
> > priority) and upstream it, its just that it is part of our data path
> > so we need to thoroughly validate it first.
>
> Sounds good.
>
> Wrt merging this patch here, can you include that in one of your next
> pulls? Or should I toss it entirely, waiting for you to remove dma_fence
> outright?

I'll include it in the next pull.
Thanks,
Oded
>
> Thanks, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch