Re: New subsystem for acceleration devices

From: Oded Gabbay
Date: Thu Aug 04 2022 - 03:44:17 EST


On Thu, Aug 4, 2022 at 2:54 AM Dave Airlie <airlied@xxxxxxxxx> wrote:
>
> On Thu, 4 Aug 2022 at 06:21, Oded Gabbay <oded.gabbay@xxxxxxxxx> wrote:
> >
> > On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@xxxxxxxxx> wrote:
> > >
> > > On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@xxxxxxxxx> wrote:
> > > >
> > > > Hi,
> > > > Greg and I talked a couple of months ago about preparing a new accel
> > > > subsystem for compute/acceleration devices that are not GPUs and I
> > > > think your drivers that you are now trying to upstream fit it as well.
> > >
> > > We've had some submissions for not-GPUs to the drm subsystem recently.
> > >
> > > Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit.
> > >
> > > Why is creating a new subsystem at this time necessary?
> > >
> > > Are we just creating a subsystem to avoid the open source userspace
> > > consumer rules? Or do we have some concrete reasoning behind it?
> > >
> > > Dave.
> >
> > Hi Dave.
> > The reason it happened now is because I saw two drivers, which are
> > doing h/w acceleration for AI, trying to be accepted to the misc
> > subsystem.
> > Add to that the fact I talked with Greg a couple of months ago about
> > doing a subsystem for any compute accelerators, which he was positive
> > about, I thought it is a good opportunity to finally do it.
> >
> > I also honestly think that I can contribute much to these drivers from
> > my experience with the habana driver (which is now deployed en masse at
> > AWS) and contribute code from the habana driver to a common framework
> > for AI drivers.
>
> Why not port the habana driver to drm now instead? I don't get why it
> wouldn't make sense?

IMHO, no, I don't see the upside. This is not a trivial change, and it
will require a large effort. What would it give me that I need and
don't have now?

>
> Stepping up to create a new subsystem is great, but we need rules
> around what belongs where, we can't just spawn new subsystems when we
> have no clear guidelines on where drivers should land.
>
> What are the rules for a new accel subsystem? Do we have to now
> retarget the 3 drivers that are queued up to use drm for accelerators,
> because 2 drivers don't?
>
> There's a lot more to figure out here than merge some structures and
> it will be fine.
I totally agree. We need to set some rules and make sure everyone in
the kernel community is familiar with them, because now you get
different answers based on who you consult with.

The rule of thumb I had in mind is that if you don't have any
display (you don't need to support X/wayland) and you don't need to
support opengl/vulkan/opencl/directx or any other gpu-related software
stack, then you don't have to go through drm.
In other words, if you don't have gpu-specific h/w and/or you don't
need gpu uAPI, you don't belong in drm.

After all, I can get memory management services or common char device
handling from other subsystems (e.g. rdma) as well. I'm sure I could
model my uAPI to be rdma uAPI compliant (I can define proprietary uAPI
there as well), but this doesn't mean I belong there, right?

>
> I think the one area where I can see a divide, and where a new subsystem
> makes sense, is accelerators that are single-user, one-shot type things
> like media drivers (though maybe they could be just media drivers).
>
> I think anything that does command offloading to firmware or queues
> belongs in drm, because that is pretty much what the framework does. I
I think this is a very broad statement which doesn't reflect reality
in the kernel.

> think it might make sense to enhance some parts of drm to fit things
> in better, but that shouldn't block things getting started.
>
> I'm considering whether we should add an accelerator staging area to drm
> and land the 2-3 submissions we have and try and steer things towards
> commonality that way instead of holding them out of tree.
Sounds like a good idea regardless of this discussion.

>
> I'd like to offload things from Greg by just not having people submit
> misc drivers at all for things that should go elsewhere.
Great, at least we can agree on that.

Thanks,
Oded

>
> Dave.
>
>
> >
> > Regarding the open source userspace rules in drm - yes, I think your
> > rules are too limiting for the relatively young AI scene, and I saw at
> > the 2021 kernel summit that other people from the kernel community
> > think that as well.
> > But that's not the main reason, or even a reason at all for doing
> > this. After all, at least for habana, we open-sourced our compiler and
> > a runtime library. And Greg also asked those two drivers if they have
> > matching open-sourced user-space code.
> >
> > And a final reason is that I thought this can also help in somewhat
> > reducing the workload on Greg. I saw in the last kernel summit there
> > was a concern about bringing more people to be kernel maintainers so I
> > thought this is a step in the right direction.
> >
> > Oded