Re: [PATCH] vfio iommu type1: Bypass the vma permission check in vfio_pin_pages_remote()

From: Peter Xu
Date: Thu Dec 03 2020 - 10:45:23 EST


On Thu, Dec 03, 2020 at 11:20:02AM +0000, Stefan Hajnoczi wrote:
> On Wed, Dec 02, 2020 at 10:45:11AM -0500, Peter Xu wrote:
> > On Wed, Dec 02, 2020 at 02:33:56PM +0000, Stefan Hajnoczi wrote:
> > > On Wed, Nov 25, 2020 at 10:57:11AM -0500, Peter Xu wrote:
> > > > On Wed, Nov 25, 2020 at 01:05:25AM +0000, Justin He wrote:
> > > > > > I'd appreciate if you could explain why vfio needs to dma map some
> > > > > > PROT_NONE
> > > > >
> > > > > Virtiofs will map a PROT_NONE cache window region firstly, then remap the sub
> > > > > region of that cache window with read or write permission. I guess this might
> > > > > be a security concern. Just CC virtiofs expert Stefan to answer it more accurately.
> > > >
> > > > Yep. Since my previous sentence was cut off, I'll rephrase: I was thinking
> > > > whether qemu can do vfio maps only after it remaps the PROT_NONE regions into
> > > > PROT_READ|PROT_WRITE ones, rather than trying to map dma pages upon PROT_NONE.
> > >
> > > Userspace processes sometimes use PROT_NONE to reserve virtual address
> > > space. That way future mmap(NULL, ...) calls will not accidentally
> > > allocate an address from the reserved range.
> > >
> > > virtio-fs needs to do this because the DAX window mappings change at
> > > runtime. Initially the entire DAX window is just reserved using
> > > PROT_NONE. When it's time to mmap a portion of a file into the DAX
> > > window an mmap(fixed_addr, ...) call will be made.
> >
> > Yes I can understand the rationale on why the region is reserved. However IMHO
> > the real question is why such reservation behavior should affect qemu memory
> > layout, and even further to VFIO mappings.
> >
> > Note that PROT_NONE should likely mean that there's no backing page at all in
> > this case. Since vfio will pin all the pages before mapping the DMAs, it also
> > means that it's at least inefficient, because when we try to map all the
> > PROT_NONE pages we'll try to fault in every single page of it, even if they may
> > not ever be used.
> >
> > So I still think this patch is not doing the right thing. Instead we should
> > somehow teach qemu that the virtiofs memory region should only be the size of
> > enabled regions (with PROT_READ|PROT_WRITE), rather than the whole reserved
> > PROT_NONE region.
>
> virtio-fs was not implemented with IOMMUs in mind. The idea is just to
> install a kvm.ko memory region that exposes the DAX window.
>
> Perhaps we need to treat the DAX window like an IOMMU? That way the
> virtio-fs code can send map/unmap notifications and hw/vfio/ can
> propagate them to the host kernel.

Sounds right. One more thing to mention is that we may need to avoid tearing
down the whole old DMA region when resizing the PROT_READ|PROT_WRITE region
into e.g. a bigger one to cover some of the previously PROT_NONE part, as long
as the before-resizing region can still be accessed by any hardware. It smells
like something David is working on with virtio-mem; I'm not sure whether
there's any common infrastructure that could be shared.
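
For reference, the reserve-then-remap pattern Stefan describes looks roughly
like the following from userspace. This is only a sketch under assumptions:
the helper names reserve_window() and enable_subrange() are made up for
illustration, anonymous memory stands in for the file-backed mapping that a
real DAX window would install, and it is not the actual QEMU code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve address space only: PROT_NONE means no access is allowed, so
 * later mmap(NULL, ...) calls cannot land inside this range.
 * (Hypothetical helper, not a QEMU function.) */
static void *reserve_window(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Replace one page-aligned sub-range of the reservation with an accessible
 * mapping at a fixed address. Anonymous memory is used here as a stand-in
 * for the file mapping. (Hypothetical helper.) */
static void *enable_subrange(void *window, size_t window_len,
                             size_t off, size_t len)
{
    /* The sub-range must lie entirely inside the reserved window. */
    assert(off <= window_len && len <= window_len - off);

    void *p = mmap((char *)window + off, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Only the range passed to enable_subrange() becomes accessible; the rest of
the window stays PROT_NONE, which is the part that pinning would have to
avoid.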

Thanks,

--
Peter Xu