Re: [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver

From: Jeffrey Hugo
Date: Tue May 19 2020 - 10:57:53 EST


On 5/18/2020 11:08 PM, Dave Airlie wrote:
On Fri, 15 May 2020 at 00:12, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:

Introduction:
Qualcomm Cloud AI 100 is a PCIe adapter card which contains a dedicated
SoC ASIC for the purpose of efficiently running Deep Learning inference
workloads in a data center environment.

The official press release can be found at -
https://www.qualcomm.com/news/releases/2019/04/09/qualcomm-brings-power-efficient-artificial-intelligence-inference

The official product website is -
https://www.qualcomm.com/products/datacenter-artificial-intelligence

At the time of the official press release, numerous technology news sites
also covered the product. A search of your favorite site is likely to
turn up its coverage.

It is our goal to have the kernel driver for the product fully upstream.
The purpose of this RFC is to start that process. We are still doing
development (see below), and thus are not looking to gain acceptance quite
yet, but now that we have a working driver we believe we are at the stage
where meaningful conversation with the community can occur.


Hi Jeffrey,

Just wondering what the userspace/testing plans are for this driver.

This introduces a new user facing API for a device without pointers to
users or tests for that API.

We have daily internal testing, although I don't expect you to take my word for that.

I would like to get one of these devices into the hands of Linaro, so that it can be put into KernelCI, similar to other Qualcomm products. I'm trying to convince the powers that be to make this happen.

Regarding what the community could do on its own, everything but the Linux driver is considered proprietary - that includes the on device firmware and the entire userspace stack. This is a decision above my pay grade.

I've asked for authorization to develop and publish a simple userspace application that might enable the community to do such testing, but obtaining that authorization has been slow.

Although this isn't a graphics driver, and Greg will likely merge
anything to the kernel you throw at him, I do wonder how to validate
the uapi from a security perspective. It's always interesting when
someone wraps a DMA engine with user ioctls, and without enough
information to decide if the DMA engine is secure against userspace
misprogramming it.

I'm curious, what information might you be looking for? Are you concerned about the device attacking the host, or the host attacking the device?

Also if we don't understand the programming API on board the device,
we can't tell if the "cores" on the device are able to reprogram the
device engines either.

So, you are looking for details about the messaging protocol which are considered opaque to the kernel driver? Or something else?

Figuring this out is difficult at the best of times; it helps if there
is access to the complete device documentation or user space side
drivers in order to facilitate this.

Regarding access to documentation, sadly that isn't going to happen now, or in the near future. Again, above my pay grade. The only public "documentation" is what you can see from my emails.

I understand your position, and if I can "bound" the information you are looking for, I can see what I can do about getting you what you want. No promises, but I will try.

The other area I mentioned is testing the uAPI: how do you envisage
regression testing and long term sustainability of the uAPI?

Can you clarify what you mean by "uAPI"? Are you referring to the interface between the device and the kernel driver?

--
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.