Re: [RFC v16 1/9] iommu: Introduce attach/detach_pasid_table API

From: Eric Auger
Date: Thu Dec 09 2021 - 04:44:13 EST


Hi Kevin,

On 12/9/21 4:21 AM, Tian, Kevin wrote:
>> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
>> Sent: Wednesday, December 8, 2021 8:56 PM
>>
>> On Wed, Dec 08, 2021 at 08:33:33AM +0100, Eric Auger wrote:
>>> Hi Baolu,
>>>
>>> On 12/8/21 3:44 AM, Lu Baolu wrote:
>>>> Hi Eric,
>>>>
>>>> On 12/7/21 6:22 PM, Eric Auger wrote:
>>>>> On 12/6/21 11:48 AM, Joerg Roedel wrote:
>>>>>> On Wed, Oct 27, 2021 at 12:44:20PM +0200, Eric Auger wrote:
>>>>>>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx>
>>>>>>> Signed-off-by: Liu, Yi L <yi.l.liu@xxxxxxxxxxxxxxx>
>>>>>>> Signed-off-by: Ashok Raj <ashok.raj@xxxxxxxxx>
>>>>>>> Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
>>>>>>> Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx>
>>>>>> This Signed-off-by chain looks dubious: you are the author but the
>>>>>> last one in the chain?
>>>>> The 1st RFC in Aug 2018
>>>>> (https://lists.cs.columbia.edu/pipermail/kvmarm/2018-August/032478.html)
>>>>> said this was a generalization of Jacob's patch
>>>>>
>>>>>    [PATCH v5 01/23] iommu: introduce bind_pasid_table API function
>>>>>
>>>>> https://lists.linuxfoundation.org/pipermail/iommu/2018-May/027647.html
>>>>>
>>>>> So indeed Jacob should be the author. I guess the authorship got
>>>>> replaced at some point over the multiple rebases, which is not an
>>>>> excuse. Please forgive me for that.
>>>>> Now the original patch already carried this list of Signed-off-bys,
>>>>> so I don't know whether I should simplify it.
>>>> As we have decided to move the nested mode (dual-stage)
>>>> implementation onto the developing iommufd framework, what's the
>>>> value of adding this into the iommu core?
>>> The iommu_uapi_attach_pasid_table uapi should indeed disappear, as it
>>> is bound to be replaced by its /dev/iommu counterpart.
>>> However, until I can rebase on the /dev/iommu code, I am obliged to
>>> keep it to maintain this integration, hence the RFC.
>> Indeed, we are getting pretty close to having the base iommufd that we
>> can start adding stuff like this into, maybe in January. You can look
>> at some parts of what is evolving here:
>>
>> https://github.com/jgunthorpe/linux/commits/iommufd
>> https://github.com/LuBaolu/intel-iommu/commits/iommu-dma-ownership-v2
>> https://github.com/luxis1999/iommufd/commits/iommufd-v5.16-rc2
>>
>> From a progress perspective I would like to start with simple 'page
>> tables in userspace', ie no PASID in this step.
>>
>> 'page tables in userspace' means an iommufd ioctl to create an
>> iommu_domain where the IOMMU HW is directly traversing a
>> device-specific page table structure in user space memory. All the HW
>> today implements this by using another iommu_domain to allow the IOMMU
>> HW DMA access to user memory - ie nesting or multi-stage or whatever.
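To make sure I picture the uAPI correctly, here is a rough sketch of
what the creation ioctl payload could look like; all names and fields
below are invented for the discussion, not taken from the branches
above:

struct iommu_create_user_domain {
        __u32 size;           /* sizeof(this struct), for extensibility */
        __u32 dev_id;         /* iommufd handle of the device */
        __u32 pgtable_format; /* vendor format: SMMUv3 CD table, VT-d, ... */
        __u32 flags;          /* none defined in this sketch */
        __u64 pgtable_base;   /* untranslated address of the page table
                               * root, walked by the HW through the other
                               * iommu_domain you mention */
};

i.e. the kernel never walks the user page table itself; it only programs
the HW with the root pointer.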
> One clarification here in case people get confused by the current
> iommu_domain definition. Jason brainstormed with us on how to
> represent a 'user page table' in the IOMMU sub-system. One option is
> to extend iommu_domain as a general representation for any page table
> instance. The other is to create new representations for user page
> tables and then link them under the existing iommu_domain.
>
> This context is based on the 1st option, and as Jason said at the
> bottom, we still need to sketch out whether it works as expected. 😊
>
>> This would come along with some ioctls to invalidate the IOTLB.
>>
>> I'm imagining this step as an iommu_group->op->create_user_domain()
>> driver callback which will create a new kind of domain with
>> domain-unique ops, i.e. the map/unmap related ops should all be NULL
>> as those are impossible operations.
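For my own understanding, a rough sketch of what I think you mean, with
invented names (none of this exists in the tree today):

struct user_domain_ops {
        int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
        void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
        int (*cache_invalidate)(struct iommu_domain *domain,
                                struct iommu_cache_invalidate_info *inv);
        /* deliberately no map/unmap/iova_to_phys: the page table is
         * owned and written by userspace only */
};

struct iommu_domain *(*create_user_domain)(struct iommu_group *group,
                                           unsigned int pgtable_format,
                                           unsigned long pgtable_root);

where cache_invalidate would back the IOTLB invalidation ioctls you
mention above.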
>>
>> From there the usual struct device (ie RID) attach/detach stuff needs
>> to take care of routing DMAs to this iommu_domain.
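And presumably reusing the existing group attach path, i.e. something
like (create_user_domain being the hypothetical callback sketched above):

        domain = ops->create_user_domain(group, format, root);
        ret = iommu_attach_group(domain, group); /* existing API */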
> Usage-wise this covers the guest IOVA requirements, i.e. when the guest
> kernel enables vIOMMU for kernel DMA-API mappings or for device
> assignment to guest userspace.
>
> For Intel this means an optimization of the existing shadow-based
> vIOMMU implementation.
>
> For ARM this actually enables guest IOVA usage for the first time (correct
> me Eric?).
Yes, that's correct. This is the scope of this series (single PASID).
> IIRC SMMU doesn't support caching mode, while write-protecting the
> guest I/O page tables was considered a no-go. So nesting is considered
> the only option to support that.
That's correct too: no 'caching mode' is provisioned in the SMMU spec. I
thought it would just be a matter of adding one bit in an ID reg, though.

>
> And once a 'user PASID table' is installed, this actually means guest
> SVA usage can also partially work for ARM, as long as no I/O page
> fault is incurred.
>
>> Step two would be to add the ability for an iommufd-using driver to
>> request that a RID & PASID be connected to an iommu_domain. This
>> connection can be requested for any kind of iommu_domain, kernel-owned
>> or user-owned.
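On the kernel side I would then picture something like the below
(invented signatures, nothing is posted yet):

int iommu_attach_device_pasid(struct iommu_domain *domain,
                              struct device *dev, ioasid_t pasid);
void iommu_detach_device_pasid(struct iommu_domain *domain,
                               struct device *dev, ioasid_t pasid);

accepted by any domain kind, kernel-owned or user-owned.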
>>
>> I don't quite have an answer how exactly the SMMUv3 vs Intel
>> difference in PASID routing should be resolved.
> For kernel-owned domains the iommufd interface should be generic, as
> the vendor differences are managed by the kernel itself.
>
> For user-owned domains we'll need new uAPIs for userspace to specify
> the PASID. As I replied in another thread, only Intel currently
> requires it, due to mdev. But other vendors could also do so if they
> decide to support mdev one day.
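Something like the below then, I guess (again a hypothetical layout,
just for the discussion):

struct iommu_attach_pasid {
        __u32 size;      /* sizeof(this struct) */
        __u32 dev_id;    /* iommufd device handle */
        __u32 domain_id; /* iommufd domain handle */
        __u32 pasid;     /* PASID value chosen/known by userspace,
                          * e.g. for mdev */
};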
>
>> To get answers I'm hoping to start building some sketch RFCs for these
>> different things on iommufd, hopefully in January. I'm looking at user
>> page tables, PASID, dirty tracking and userspace IO fault handling as
>> the main features iommufd must tackle.
> Makes sense.
>
>> The purpose of the sketches would be to validate that the HW features
>> we want to expose can work well with the choices the base is making.
>>
>> Jason
> Thanks
> Kevin

Thanks

Eric