Re: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO

From: Jason Gunthorpe
Date: Thu Sep 30 2021 - 18:24:02 EST


On Thu, Sep 30, 2021 at 09:35:45AM +0000, Tian, Kevin wrote:

> > The Intel functional issue is that Intel blocks the cache maintenance
> > ops from the VM and the VM has no way to self-discover that the cache
> > maintenance ops don't work.
>
> the VM doesn't need to know whether the maintenance ops
> actually work.

Which is the whole problem.

Intel has a design where the device driver tells the device to issue
non-cacheable TLPs.

The driver is supposed to know if it can issue the cache maintenance
instructions - only if it can should it ask the device to issue
no-snoop TLPs.

For instance, the same PCI driver on non-x86 should never ask the
device to issue no-snoop TLPs, because it has no way to restore
cache coherency on, e.g., ARM.
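
As a rough sketch of that driver-side decision (the driver and its
function are hypothetical, but PCI_EXP_DEVCTL_NOSNOOP_EN and the
pcie_capability_*_word() helpers are the real kernel interfaces):

#include <linux/pci.h>

/* Hypothetical driver: whether to let the device emit no-snoop TLPs
 * hinges on the CPU having usable cache maintenance instructions,
 * which in practice means x86. */
static int mydev_setup_nosnoop(struct pci_dev *pdev)
{
#ifdef CONFIG_X86
	/* wbinvd/clflush are available to restore coherency */
	return pcie_capability_set_word(pdev, PCI_EXP_DEVCTL,
					PCI_EXP_DEVCTL_NOSNOOP_EN);
#else
	/* no generic way to make no-snoop DMA coherent again,
	 * so keep the enable bit clear */
	return pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
					  PCI_EXP_DEVCTL_NOSNOOP_EN);
#endif
}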

Do you see the issue? This configuration where the hypervisor silently
makes wbinvd a NOP breaks the x86 architecture because the guest has no
idea it can no longer use no-snoop features.
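
To make the breakage concrete, this is roughly what the hypervisor
side looks like - a simplification of KVM's wbinvd emulation in
arch/x86/kvm/x86.c, not the literal code:

/* Simplified from KVM: the guest's wbinvd only reaches real hardware
 * when KVM believes the VM has non-coherent DMA attached. Otherwise
 * it is silently dropped - the NOP described above. */
static int emulate_wbinvd_sketch(struct kvm_vcpu *vcpu)
{
	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
		return X86EMUL_CONTINUE;	/* silent NOP */

	wbinvd_on_all_cpus();			/* real cache flush */
	return X86EMUL_CONTINUE;
}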

Using the IOMMU to forcibly prevent the device from issuing no-snoop
makes this whole issue of the broken wbinvd moot.
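
Roughly, in terms of the in-kernel iommu API of the time (the flags
and functions are real, the wrapper is just illustrative):

#include <linux/iommu.h>
#include <linux/pci.h>

/* Illustrative wrapper: if the IOMMU can enforce coherency
 * (IOMMU_CAP_CACHE_COHERENCY), map with IOMMU_CACHE so that, on
 * Intel, the SNP bit is set in the page table and device no-snoop
 * TLPs are snooped anyway - no guest wbinvd needed. */
static int map_force_coherent(struct iommu_domain *domain,
			      unsigned long iova, phys_addr_t paddr,
			      size_t size)
{
	int prot = IOMMU_READ | IOMMU_WRITE;

	if (iommu_capable(&pci_bus_type, IOMMU_CAP_CACHE_COHERENCY))
		prot |= IOMMU_CACHE;

	return iommu_map(domain, iova, paddr, size, prot);
}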

It is important to be really clear on what this is about - this is not
some idealized nice iommu feature - it is working around a lot of
backwards compatibility baggage that is probably completely unique to
x86.

> > Other arches don't seem to have this specific problem...
>
> I think the key is whether other archs allow the driver to decide DMA
> coherency and, indirectly, the underlying I/O page table format.
> If yes, then I don't see a reason why such a decision should not be
> given to userspace for the passthrough case.

The choice all comes down to whether the other arches have cache
maintenance instructions in the VM that *don't work*.

Jason