Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

From: Jacob Pan
Date: Wed May 30 2018 - 15:52:57 EST


On Wed, 30 May 2018 12:53:53 +0100
Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx> wrote:

> On 30/05/18 04:45, Tian, Kevin wrote:
> >>>>>> On SMMUv3 the minimum alignment for base_ptr is 64 bytes, so
> >>>>>> a
> >>>> guest
> >>>>>> under a vSMMU might pass a pointer that's not aligned on 4k.
> >>>>>>
> >>>>> PASID table pointer for VT-d is 4K aligned.
> >>>>>> Maybe this information could be part of the data passed to
> >> userspace
> >>>>>> about IOMMU table formats and features? They're not part of
> >>>>>> this series, but I think we wanted to communicate
> >>>>>> IOMMU-specific
> >> features
> >>>>>> via sysfs.
> >>>>>>
> >>>>> Agreed, I believe Yi Liu is working on a sysfs interface such
> >>>>> that QEMU can match IOMMU model and features.
> >>>>
> >>>> Digging this up again since v5 still has this issue. The IOMMU
> >>>> API is a kernel internal abstraction of the IOMMU. sysfs is a
> >>>> userspace interface. Are we suggesting that the /only/ way to
> >>>> make use of the internal IOMMU API here is to have a user
> >>>> provided opaque pasid table that we can't even do minimal
> >>>> compatibility sanity testing on and we simply hope that hardware
> >>>> covers all the fault conditions without taking the host down
> >>>> with it? I guess we have to assume the latter since the user
> >>>> has full control of the table, but I have a hard time getting
> >>>> past lack of internal ability to use the interface and no
> >>>> ability to provide even the slimmest sanity testing. Thanks,
> >>>
> >>> checking size, alignment, ... is OK, which I think is already
> >>> considered by vendor IOMMU driver. However sanity testing table
> >>> format might be difficult. The initial table provided by guest is
> >>> likely just all ZEROs. whatever format violation may be caught
> >>> only when a PASID entry is updated...
> >>
> >> There's sanity testing the actual contents of the table, which I
> >> agree would be difficult and would likely require some sort of
> >> shadowing at additional overhead, but what about even basic
> >> consistency checking? For example, is it possible that due to
> >> hardware variations a user might generate a table which works on
> >> some systems but not others? What
> >> if two table formats are sufficiently similar that the IOMMU driver
> >> puts an incompatible table in place but it continuously generates
> >> faults, how do we debug that? As an intermediary in this whole
> >> process I'd really rather be able to identify that the user claims
> >> to be providing a TypeA table but the IOMMU only supports TypeB,
> >> so clearly this won't work. I don't see that we have that
> >> capability. Thanks,
> >
> > I remember we ever discussed to define some vendor/model ID,
> > which can be retrieved by user space and then passed back when
> > doing table binding. Then above simple model matching check can
> > be done accordingly. It is actually a basic requirement when using
> > virtio-iommu, same driver expecting to work on all vendor IOMMUs.
> >
> > However I don't remember whether/where that logic is implemented
> > in this series (especially when there are two tracks moving in
> > parallel). I'll leave to Jacob/Jean to further comment.
>
> For Arm we do need some form of sanity checking. As each architecture
> version brings a new set of features that may be supported and enabled
> individually, we need to communicate fine-grained features to users.
> They describes the general capability of the physical IOMMU, and also
> which fields are available in the PASID table (entries are 512-bits
> and leave some space for future extensions).
>
> In the past I briefly tried using a ioctl-based interface through VFIO
> only, but it seemed more complicated to extend than sysfs for this
> kind of probing.
>
> Note that the following is from my own prototype. I'm not sure how
> much Yi Liu's implementation differs but I think this was roughly
> what we agreed on last time. In sysfs an IOMMU device is described
> with:
>
> * A model number, for example intel-vtd=1, arm-smmu-v3=2.
> * Properties and features, describing in detail what the pIOMMU device
> and driver support.
>
> /sys/class/iommu/<iommu-dev>/<model>/<property>
>
> For example an SMMUv3:
>
> The model number is described as a property
> /sys/class/iommu/smmu.0x00000000e0600000/arm-smmu-v3/model = 2
>
> A few feature bits and values:
> .../arm-smmu-v3/asid_bits // max address space ID bits, %d
> .../arm-smmu-v3/ssid_bits // max substream ID (PASID) bits, %d
> .../arm-smmu-v3/input_bits // max input address size, %d
> .../arm-smmu-v3/output_bits // max output address size, %d
> .../arm-smmu-v3/btm // broadcast TLB maintenance,
> enabled/disabled .../arm-smmu-v3/httu // Hardware
> table update,
> access+dirty/access/none .../arm-smmu-v3/stall //
> transaction stalling, enabled/disabled/force
>
> (Note that the base pointer alignment previously discussed could be
> implied by the model number, or added explicitly here.)
>
> Which page table formats are supported:
> .../arm-smmu-v3/pgtable_format/lpae-64
> .../arm-smmu-v3/pgtable_format/v7s
> I'm not sure yet what values these will have, they might simply
> contain arbitrary format numbers because fields available in the page
> tables can be deduced from the above features bits. (Out of laziness,
> in my prototype I just describe a preferred format in a
> pgtable_format file)
>
> As you can imagine I'd rather not pass the fine details back to the
> kernel in bind_pasid_table. The list of features is growing, and
> describing them is a pain. It could be done for debugging purpose, but
> all we'd be achieving is telling the kernel that userspace has read
> the values, not that the guest intends to use them. The guest selects
> features by writing PASID table entries, which aren't read by the
> host.
>
> If the guest writes invalid values in the PASID table then yes, we
> have to rely on the hardware to contain the fault and not bring the
> host down with it. If the IOMMU cannot do that, then the driver
> really shouldn't implement bind_pasid_table... Otherwise, a fault
> while reading the PASID table can be injected into the guest as an
> unrecoverable fault (IOMMU_FAULT_REASON_PASID_INVALID or
> IOMMU_FAULT_REASON_PGD_FETCH in patch 10) or printed by the host when
> debugging.
>
> However I think the model number should be added to
> pasid_table_config. For one thing it gives us a simple sanity-check,
> but it also tells which other fields are valid in pasid_table_config.
> Arm-smmu-v3 needs at least two additional 8-bit fields describing the
> PASID table format (number of levels and PASID0 behaviour), which are
> written to device context tables when installing the PASID table
> pointer.
>
We had model number field in v2 of this patchset. My thought was that
since the config info is meant to be generic, we shouldn't include
model info. But I also think a simple sanity check can be useful,
would that be sufficient to address Alex's concern? Of course we still
need sysfs for more specific IOMMU features.

Would this work?
enum pasid_table_model {
PASID_TABLE_FORMAT_HOST,
PASID_TABLE_FORMAT_ARM_1LVL,
PASID_TABLE_FORMAT_ARM_2LVL,
PASID_TABLE_FORMAT_AMD,
PASID_TABLE_FORMAT_INTEL,
};

/**
* PASID table data used to bind guest PASID table to the host IOMMU. This will
* enable guest managed first level page tables.
* @version: for future extensions and identification of the data format
* @bytes: size of this structure
* @model: PASID table format for different IOMMU models
* @base_ptr: PASID table pointer
* @pasid_bits: number of bits supported in the guest PASID table, must be less
* or equal than the host supported PASID size.
*/
struct pasid_table_config {
__u32 version;
#define PASID_TABLE_CFG_VERSION_1 1
__u32 bytes;
enum pasid_table_model model;
__u64 base_ptr;
__u8 pasid_bits;
};



> Compatibility: new optional features are easy to add to a given model,
> just add a new sysfs file. If in the future, the host describes a new
> feature that is mandatory, or implements a different PASID table
> format, how does it ensure that user understands it? Perhaps use a
> new model number for this, e.g. "arm-smmu-v3-a=3", with similar
> features. I think it would be the same if the host stops supporting a
> feature for a given model, because they are ABI. But we can also
> define default values from the start, for example "if ssid_bits file
> isn't present, default value is 0 - PASID not supported"
>
> Thanks,
> Jean

[Jacob Pan]