Re: [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices

From: Robin Murphy
Date: Thu Jan 23 2020 - 09:55:47 EST


On 22/01/2020 5:39 am, Lu Baolu wrote:
Hi Robin,

On 1/21/20 8:45 PM, Robin Murphy wrote:
On 19/01/2020 6:29 am, Lu Baolu wrote:
Hi Joerg,

On 1/17/20 6:21 PM, Joerg Roedel wrote:
On Wed, Jan 01, 2020 at 01:26:47PM +0800, Lu Baolu wrote:
This splits iommu group allocation from adding devices. This makes
it possible to determine the default domain type for each group as
all devices belonging to the group have been determined.

I think it's better to keep group allocation as it is and just defer
default domain allocation after each device is in its group. But take

I tried deferring default domain allocation, but it does not seem possible.

The call path of adding devices into their groups:

iommu_probe_device
-> ops->add_device(dev)
    -> (iommu vendor driver) iommu_group_get_for_dev(dev)

After doing this, the vendor driver will get the default domain and
apply dma_ops according to the domain type. If we defer the domain
allocation, they will get a NULL default domain and cause a panic in
the vendor driver.
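
A minimal sketch of the failure mode described above, not kernel code. All
toy_* names are hypothetical stand-ins, and the NULL check is only there to
show where a vendor driver would otherwise dereference a default domain
that has not been allocated yet:

```c
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not kernel types. */
enum toy_domain_type { TOY_DOMAIN_IDENTITY, TOY_DOMAIN_DMA };
enum toy_dma_ops { TOY_OPS_NONE, TOY_OPS_DIRECT, TOY_OPS_IOMMU };

struct toy_domain { enum toy_domain_type type; };
struct toy_group { struct toy_domain *default_domain; };

/*
 * Models the tail end of a vendor add_device() callback: choose dma_ops
 * from the type of the group's default domain. If allocation has been
 * deferred, default_domain is NULL and there is nothing to inspect, so
 * the sketch reports TOY_OPS_NONE rather than dereferencing it.
 */
static enum toy_dma_ops toy_pick_dma_ops(const struct toy_group *group)
{
	if (!group->default_domain)
		return TOY_OPS_NONE;

	return group->default_domain->type == TOY_DOMAIN_IDENTITY ?
		TOY_OPS_DIRECT : TOY_OPS_IOMMU;
}
```

Without the check, the deferred case would be a NULL pointer dereference,
which is the panic mentioned above.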

Any suggestions?

https://lore.kernel.org/linux-iommu/6dbbfc10-3247-744c-ae8d-443a336e0c50@xxxxxxxxxxxxxxx/

Haven't we been here before? ;)

Since we can't (safely or reasonably) change a group's default domain after ops->add_device() has returned, and in general it gets impractical to evaluate "all devices in a group" once you look beyond &pci_bus_type (or consider hotplug as mentioned), then AFAICS there's no reasonable way to get away from the default domain type being defined by the first device to attach.

Yes, agreed.

But in practice it's hardly a problem anyway - if every device in a given group requests the same domain type then it doesn't matter which comes first, and if they don't then we ultimately end up with an impossible set of constraints, so are doomed to do the 'wrong' thing regardless.

The third case is, for example, three devices A, B, C in a group. The
first device A is neutral about which type of default domain is used,
so the iommu framework will use a static default domain. But device B
requires a specific type which is different from the default.
Currently, this is handled in the vendor iommu driver, and one
motivation of this patch set is to handle this in the generic layer.
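
The ordering dependence in this third case can be sketched with a toy
model. Everything here is hypothetical (the grp_* names, and the choice of
a DMA domain as the static default); it only illustrates "the first device
to attach fixes the type", not how the iommu core is implemented:

```c
/* What a device asks for; GRP_REQ_ANY means it is neutral. */
enum grp_req { GRP_REQ_ANY, GRP_REQ_IDENTITY, GRP_REQ_DMA };

/* Assumed static default used when the first device is neutral. */
#define GRP_STATIC_DEFAULT GRP_REQ_DMA

struct grp_model {
	int has_domain;			/* default domain already fixed? */
	enum grp_req domain_type;	/* type chosen by the first device */
	int unmet;			/* later devices whose request lost */
};

/* The first device decides the type; later ones can only be checked. */
static void grp_add_device(struct grp_model *g, enum grp_req req)
{
	if (!g->has_domain) {
		g->domain_type = (req == GRP_REQ_ANY) ? GRP_STATIC_DEFAULT : req;
		g->has_domain = 1;
		return;
	}
	if (req != GRP_REQ_ANY && req != g->domain_type)
		g->unmet++;
}
```

Adding A (neutral), then B (identity), then C (neutral) leaves the group
with the static default and one unmet request. Adding B first gives an
identity domain with no unmet requests, so the outcome depends only on
attach order.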

Yes, I wasn't explicitly considering that particular case, but it mostly falls out more or less the same way. Given that multi-device groups *should* be relatively rare, for the user override it seems reasonable to expect the user to see when devices get grouped and specify all of them to achieve the desired result; the trusted/untrusted attribute definitely shouldn't differ within any given group; and opportunistically replacing passthrough domains with translation domains for DMA-limited devices can only ever be a best-effort thing without consistent results, since at best that still comes down to which driver probed and called dma_set_mask() first.

Platform-specific exceptions like in device_def_domain_type() probably do want to stay in the individual drivers, but rolling that up into default domain allocation would be neat, and functionally no worse than the existing process.

In principle we could fairly easily delay allocating a group's default domain until the first driver bind event. It wouldn't help universally - in the absolute worst case, device B might only be created at all by device A's driver probing - and it might need careful coordination in areas like the bus->dma_configure() flow, but it could at least help accommodate the more common PCI case.
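
As a rough illustration of that idea (a toy model only; the late_* names
and the DMA static default are assumptions, and none of this is the
actual iommu core or driver-core API), deferring the decision to the first
bind lets every device already in the group contribute:

```c
#define LATE_MAX_DEVS 8

enum late_req { LATE_REQ_ANY, LATE_REQ_IDENTITY, LATE_REQ_DMA };

/* Assumed static default when no device in the group has a preference. */
#define LATE_STATIC_DEFAULT LATE_REQ_DMA

struct late_group {
	enum late_req reqs[LATE_MAX_DEVS];	/* per-device requests */
	int ndevs;
	int allocated;				/* default domain fixed yet? */
	enum late_req domain_type;
};

/* Adding a device only records its request; nothing is allocated. */
static int late_add_device(struct late_group *g, enum late_req req)
{
	if (g->ndevs == LATE_MAX_DEVS)
		return -1;
	g->reqs[g->ndevs++] = req;
	return 0;
}

/*
 * First driver bind: resolve the type from every device known so far.
 * Returns 0 on success, -1 if two devices need different types.
 * Devices added after this point cannot influence the choice.
 */
static int late_first_bind(struct late_group *g)
{
	enum late_req type = LATE_REQ_ANY;
	int i;

	if (g->allocated)
		return 0;

	for (i = 0; i < g->ndevs; i++) {
		if (g->reqs[i] == LATE_REQ_ANY)
			continue;
		if (type != LATE_REQ_ANY && type != g->reqs[i])
			return -1;
		type = g->reqs[i];
	}

	g->domain_type = (type == LATE_REQ_ANY) ? LATE_STATIC_DEFAULT : type;
	g->allocated = 1;
	return 0;
}
```

With A (neutral) and B (identity) both added before the first bind, the
group ends up with an identity domain regardless of order. A device that
only appears after the bind, as in the worst case above, still arrives too
late to be considered.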

Robin.