Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

From: Konrad Rzeszutek Wilk
Date: Mon Jul 30 2012 - 11:20:04 EST


On Mon, Jul 30, 2012 at 03:58:02PM +0100, Stefano Stabellini wrote:
> On Fri, 27 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jul 27, 2012 at 12:06:27PM +0100, Stefano Stabellini wrote:
> > > On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > > > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > > > gets turned on:
> > > > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > > > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at [ffff8800fb43d000-ffff8800ff43cfff]
> > > >
> > > > which is fine if we have PCI devices, but not if we do not. In a PV
> > > > guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> > > > memory, and 64MB of it per guest. On a 32GB machine, this limits the
> > > > number of 4GB guests that can be started, due to lowmem exhaustion.
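To make the trigger concrete, here is a small userspace model of the native check. It is a sketch based on my reading of the x86-64 pci_swiotlb_detect_4gb() logic (enable bouncing when max_pfn is past 4GB and iommu=off was not given), not the kernel code itself; swiotlb_would_be_enabled() and inclusive_range_size() are hypothetical helpers. Note that nothing in the condition asks whether a PCI device exists. The second helper just checks that the range in the boot log above really is 64MB and sits below 4GB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace model; constants assume x86-64 with 4 KiB pages. */
#define PAGE_SHIFT    12
#define MAX_DMA32_PFN (1ULL << (32 - PAGE_SHIFT))   /* first PFN at 4 GiB */

/*
 * Hypothetical model of the native >4GB detection: bounce buffering is
 * switched on when RAM extends past 4 GiB and iommu=off was not given.
 * Whether any PCI device is present is never consulted.
 */
static bool swiotlb_would_be_enabled(uint64_t max_pfn, bool no_iommu)
{
    return !no_iommu && max_pfn > MAX_DMA32_PFN;
}

/* Size in bytes of an inclusive [start, end] range, as printed in the boot log. */
static uint64_t inclusive_range_size(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start + 1;
}
```

With these helpers, a 5GB guest (max_pfn 0x140000) trips the check while a guest with exactly 4GB (max_pfn 0x100000) does not, and 0xfb43d000-0xff43cfff works out to 0x4000000 bytes, i.e. 64MB.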
> > > >
> > > > What we do is detect whether the user supplied the e820_hole=1
> > > > parameter, which is used to construct an E820 that is similar to
> > > > the machine's, so that the PCI regions do not overlap with RAM regions.
> > > > We check for that by looking at the E820 and seeing whether it diverges
> > > > from the standard one; if so (and if iommu=soft was not turned on),
> > > > we disable the pci_swiotlb_detect_4gb check.
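A rough sketch of the E820 heuristic, under two assumptions: that a stock PV guest E820 contains only RAM (and possibly reserved) entries, while an e820_hole=1 map copied from the host also carries ACPI or ACPI NVS entries; and that detection should be kept only when iommu=soft was passed or the map looks host-like, which is how I read the patch title. The struct, e820_looks_host_like() and keep_4gb_detection() are hypothetical names for illustration; only the E820 type codes are the standard ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard E820 type codes from the BIOS memory-map interface. */
enum e820_type {
    E820_RAM      = 1,
    E820_RESERVED = 2,
    E820_ACPI     = 3,
    E820_NVS      = 4,
    E820_UNUSABLE = 5,
};

/* Hypothetical, simplified E820 entry. */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Hypothetical helper: true if the map carries ACPI tables or ACPI NVS,
 * which (by assumption) only a host-like e820_hole=1 map would have.
 */
static bool e820_looks_host_like(const struct e820_entry *map, size_t n)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (map[i].type == E820_ACPI || map[i].type == E820_NVS)
            return true;
    return false;
}

/* Hypothetical policy: keep the >4GB check only if iommu=soft or a host-like E820. */
static bool keep_4gb_detection(const struct e820_entry *map, size_t n,
                               bool iommu_soft)
{
    return iommu_soft || e820_looks_host_like(map, n);
}
```

The point of the design is that the guest never needs to talk to XenBus this early; it only inspects the memory map it was already handed at boot.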
> > >
> > > What kind of parameter is it?
> > > Is it a Linux cmdline parameter? Or maybe a Xen toolstack parameter?
> >
> > It's a guest config option.
>
> Is this option turned on by default if the VM config file contains one
> or more PCI devices statically assigned to the VM?

I think we debated it at some point but never came to an agreement. I did
show that it would not negatively impact older guests, except that
they would lose some big swaths of memory (they don't release the
memory pages backing the E820 I/O regions).
>
> If this option is not specified, is it going to be impossible to
> dynamically pass through a PCI device after the VM is booted?

Well, I thought about this over the weekend and cooked up some new
patches that turn Xen-SWIOTLB on (if it hasn't been turned on already) when
Xen PCI detects that there are devices to be passed in. Testing it now.

>
>
> > > Surely there must be a better way to let Linux know whether this parameter has
> > > been turned on than looking for ACPI entries in the E820.
> >
> > I am open to suggestions. The best way I can think of is to have
> > some early_init variant of XenBus-detect-this-backend-parameter. Can
> > one unhook an "old" XenBus and reset it with the full-fledged XenBus
> > init later on?
>
> Assuming that the Xen swiotlb is only useful for PCI passthrough devices
> in PV guests, we could write a few wrappers for the current xen_swiotlb
> functions like this:
>
> xen_swiotlb_alloc_coherent_new(..)
> {
> 	if (xen_initial_domain() || (xen_pv_domain() && a_pci_device_is_assigned()))
> 		return xen_swiotlb_alloc_coherent(..);
> 	else
> 		return __get_free_pages(..);
> }
>
> Do you think it would work?
> This way it would be far more flexible.

So I had a brain-fart when I wrote these patches. When a PV guest is booted
with more than 4GB, the SWIOTLB that gets turned on is the *native* one,
not Xen-SWIOTLB. The impact is that we don't do any of the swizzling of memory
below 4GB, but instead just end up wasting 64MB in a PV guest.
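Why the native pool is just waste in a PV guest: it is contiguous and below 4GB only in pseudo-physical terms, while the machine frames behind it can be scattered anywhere. Xen-SWIOTLB does the "swizzle" by asking the hypervisor to exchange those pages for machine-contiguous ones below 4GB (xen_create_contiguous_region() in the kernel). The toy model below is only an illustration under that assumption; the p2m array and dma32_safe() are hypothetical stand-ins for the guest's real P2M lookup, and it simply tests whether a run of guest pages is actually usable by a 32-bit DMA device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define MAX_DMA32_PFN (1ULL << (32 - PAGE_SHIFT))   /* first frame at 4 GiB */

/*
 * Toy pseudo-physical-to-machine table: p2m[pfn] is the machine frame
 * backing guest frame pfn. Returns true only if the npages guest frames
 * starting at first_pfn are machine-contiguous and all below 4 GiB.
 */
static bool dma32_safe(const uint64_t *p2m, size_t first_pfn, size_t npages)
{
    assert(npages > 0);
    for (size_t i = 0; i < npages; i++) {
        uint64_t mfn = p2m[first_pfn + i];
        if (mfn >= MAX_DMA32_PFN)
            return false;                   /* machine page above 4 GiB */
        if (i > 0 && mfn != p2m[first_pfn + i - 1] + 1)
            return false;                   /* not machine-contiguous */
    }
    return true;
}
```

A pool whose guest frames map to {0x123456, 0x10, 0x11, 0x200000} fails the check even though its pfns are consecutive, whereas an exchanged region backed by {0x8000 .. 0x8003} passes.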

The fix for that is actually pretty simple: