Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

From: Kenneth Lee
Date: Fri Nov 23 2018 - 23:12:04 EST


On Fri, Nov 23, 2018 at 11:05:04AM -0700, Jason Gunthorpe wrote:
>
> On Fri, Nov 23, 2018 at 04:02:42PM +0800, Kenneth Lee wrote:
>
> > It is already part of Jean's patchset. And that's why I built my solution on
> > VFIO in the first place. But I think the concept of SVA and PASID is not
> > compatible with the original VFIO concept space. You would not share your
> > whole address space with a device in a virtual machine manager,
> > would you?
>
> Why not? That seems to fit VFIO's space just fine to me.. You might
> need a new upcall to create a full MM registration, but that doesn't
> seem unsuited.

Because the VM manager (such as qemu) does not want to share its whole address
space with the device. That would be a security problem.

>
> Part of the point here is you should try to make sensible revisions to
> existing subsystems before just inventing a new thing...
>
> VFIO is deeply connected to the IOMMU, so enabling more general IOMMU
> based approaches seems perfectly fine to me..
>
> > > Once the VFIO driver knows about this as a generic capability then the
> > > device it exposes to userspace would use CPU addresses instead of DMA
> > > addresses.
> > >
> > > The question is if your driver needs much more than the device
> > > agnostic generic services VFIO provides.
> > >
> > > I'm not sure what you have in mind with resource management.. It is
> > > hard to revoke resources from userspace, unless you are doing
> > > kernel syscalls, but then why do all this?
> >
> > Say I have 1024 queues in my accelerator. I can get one by opening the device
> > and attaching it to the fd. If the process exits by any means, the queue is
> > returned when the fd is released. But if it is an mdev, it will still be there
> > and someone has to tell the allocator it is available again. This is not easy
> > to design in user space.
>
> ?? why wouldn't the mdev track the queues assigned using the existing
> open/close/ioctl callbacks?
>
> That is the basic flow I would expect:
>
> open(/dev/vfio)
> ioctl(unity map entire process MM to mdev with IOMMU)
>
> // Create a HW queue and link the PASID in the HW to this HW queue
> struct hw queue[..];
> ioctl(create HW queue)
>
> // Get BAR doorbell memory for the queue
> bar = mmap()
>
> // Submit work to the queue using CPU addresses
> queue[0] = ...
> writel(bar [..], &queue);
>
> // Queue, SVA, etc. are cleaned up when the VFIO fd is closed
> close()

This is not the way you can use mdev. To use mdev, you have to:

1. unbind the kernel driver from the device, and rebind the device to the vfio
   driver
2. for 0 to 1024: uuid > /sys/.../the_dev/mdev/create to create all the mdevs
3. a virtual iommu_group will be created in /dev/vfio/* for every mdev

Now you can do this in your application (even without considering the PASID):

container = open(/dev/vfio);
ioctl(container, setting);
group = open(/dev/vfio/my_group_for_particular_mdev);
ioctl(container, attach_group, group);
device = ioctl(group, get_device);
mmap(device);
ioctl(container, set_dma_operation);
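
As a rough illustration, the pseudocode above maps onto the existing VFIO user
API approximately as follows. This is only a sketch: the function name
vfio_open_device() and its parameters are made up for this example, the group
path and device name must come from the caller (for an mdev the device name is
typically its UUID), and the Type1 IOMMU backend is assumed. Note that the IOMMU
backend can only be selected after a group has been attached to the container.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: open a VFIO container, attach one group to it,
 * select the Type1 IOMMU backend and fetch a device fd from the group.
 * Returns the device fd, or -1 on any failure (all fds are closed then).
 */
int vfio_open_device(const char *group_path, const char *device_name,
                     int *container_out, int *group_out)
{
    struct vfio_group_status status = { .argsz = sizeof(status) };
    int container, group, device;

    assert(group_path && device_name && container_out && group_out);

    /* container = open(/dev/vfio) */
    container = open("/dev/vfio/vfio", O_RDWR);
    if (container < 0)
        return -1;

    /* Check the API version and that the Type1 backend is available */
    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
        ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) <= 0)
        goto err_container;

    /* group = open(/dev/vfio/my_group_for_particular_mdev) */
    group = open(group_path, O_RDWR);
    if (group < 0)
        goto err_container;

    /* The group is only usable if all its devices are bound to vfio */
    if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
        !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        goto err_group;

    /* Attach the group, then choose the IOMMU model for the container */
    if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0 ||
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
        goto err_group;

    /* device = ioctl(group, get_device) */
    device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, device_name);
    if (device < 0)
        goto err_group;

    *container_out = container;
    *group_out = group;
    return device;

err_group:
    close(group);
err_container:
    close(container);
    return -1;
}
```

The remaining steps (mmap of a device region after VFIO_DEVICE_GET_REGION_INFO,
and DMA setup with VFIO_IOMMU_MAP_DMA) are left out to keep the sketch short.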

Then you have to decide how to find an available mdev to use and how to
return it.

We have considered creating only one mdev and allocating a queue when the device
is opened. But the VFIO maintainer, Alex, did not agree and said it broke the
original idea of VFIO.
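
To make the allocate-on-open idea concrete, here is a small user-space model of
it in plain C. It is not kernel code and not taken from the patchset; the names
queue_pool, file_ctx, model_open(), model_release() and queues_in_use() are all
invented for this sketch. The point it shows is that when a queue is owned by
an open file, releasing the file (including on process exit) hands the queue
back without any separate allocator in user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_QUEUES 1024          /* queue count used in the discussion */

/* Model of the accelerator: one in-use flag per hardware queue */
struct queue_pool {
    bool in_use[NR_QUEUES];
};

/* Model of the per-open private data that would hang off the fd */
struct file_ctx {
    struct queue_pool *pool;
    int queue_id;
};

/* Models open(): take the first free queue, or fail if none is left */
int model_open(struct queue_pool *pool, struct file_ctx *ctx)
{
    assert(pool && ctx);
    for (int i = 0; i < NR_QUEUES; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = true;
            ctx->pool = pool;
            ctx->queue_id = i;
            return 0;
        }
    }
    return -1;
}

/* Models release(): runs on close() or process exit, returns the queue */
void model_release(struct file_ctx *ctx)
{
    assert(ctx);
    if (ctx->pool && ctx->queue_id >= 0) {
        ctx->pool->in_use[ctx->queue_id] = false;
        ctx->queue_id = -1;
    }
}

/* Count the queues currently handed out */
int queues_in_use(const struct queue_pool *pool)
{
    int n = 0;

    assert(pool);
    for (int i = 0; i < NR_QUEUES; i++)
        n += pool->in_use[i] ? 1 : 0;
    return n;
}
```

With one mdev per queue there is no such release hook tied to the process, so
something else has to track which mdevs are free.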

-Kenneth
>
> Presumably the kernel has to handle the PASID and related for security
> reasons, so they shouldn't go to userspace?
>
> If there is something missing in vfio to do this, it looks pretty
> small to me..
>
> Jason

--
-Kenneth(Hisilicon)
