Re: [PATCH 1/4] iommu/amd: Introduce Protection-domain flag VFIO

From: Jason Gunthorpe
Date: Fri Jan 20 2023 - 19:09:11 EST


On Fri, Jan 20, 2023 at 04:42:26PM -0600, Tom Lendacky wrote:
> On 1/20/23 13:55, Kalra, Ashish wrote:
> > On 1/20/2023 11:50 AM, Jason Gunthorpe wrote:
> > > On Fri, Jan 20, 2023 at 11:01:21AM -0600, Kalra, Ashish wrote:
> > >
> > > > We basically get the RMP #PF from the IOMMU because there is a page size
> > > > mismatch between the RMP table and the IOMMU page table. The RMP table's
> > > > large page entry has been smashed to 4K PTEs to handle page
> > > > state change to
> > > > shared on 4K mappings, so this change has to be synced up with the IOMMU
> > > > page table, otherwise there is now a page size mismatch between RMP table
> > > > and IOMMU page table which causes the RMP #PF.
> > >
> > > I understand that, you haven't answered my question:
> > >
> > > Why is the IOMMU being programmed with pages it cannot access in the
> > > first place?
> > >
> >
> > I believe the IOMMU page tables are setup as part of device pass-through
> > to be able to do DMA to all of the guest memory, but I am not an IOMMU
> > expert, so I will let Suravee elaborate on this.
>
> Right. And what I believe Jason is saying is that, for SNP, since we know
> we can only DMA into shared pages, there is no need to set up the initial
> IOMMU page tables for all of guest memory. Instead, wait and set up IOMMU
> mappings when we do a page state change to shared and remove any mappings
> when we do a page state change to private.

Correct.

I don't know the details of how the shared/private transitions work on
AMD, e.g. whether the hypervisor even knows about them.

At the very worst, I suppose if you just turn on the vIOMMU it should
start working, since vIOMMU mode makes the paging dynamic. E.g.
virtio-iommu or something general might even do the job.

Pinning pages is expensive, breaks swap and defragmentation, and wastes
a lot of IOMMU memory. Given that the bounce buffers really shouldn't
be reallocated constantly, I'd expect vIOMMU performance to be OK.

This solves the page size mismatch issue because the IOMMU never has a
PFN installed that would generate an RMP #PF. The IOMMU can continue
to use large page sizes whenever possible.

It seems to me the current approach of just stuffing all memory into
the IOMMU is just a shortcut to get something working.

Otherwise, what I said before is still the case: only the VMM knows
what it is doing. It knows if it is using a model where it programs
private memory into the IOMMU, so it knows if it should ask the kernel
to use a 4k IOMMU page size.

Trying to have the kernel guess what userspace is doing based on KVM
is simply architecturally wrong.

Jason