Re: [RFC PATCH kernel vfio] mm: vfio: Move pages out of CMA before pinning

From: Benjamin Herrenschmidt
Date: Mon Aug 17 2015 - 05:58:58 EST


On Mon, 2015-08-17 at 19:11 +1000, Alexey Kardashevskiy wrote:
> On 08/17/2015 05:45 PM, Vlastimil Babka wrote:
> > On 08/05/2015 10:08 AM, Alexey Kardashevskiy wrote:
> > > This is about VFIO aka PCI passthrough used from QEMU.
> > > KVM is irrelevant here.
> > >
> > > QEMU is a machine emulator. It allocates guest RAM from anonymous
> > > memory, and these pages are movable, which is OK. They may happen to
> > > be allocated from the contiguous memory allocation zone (CMA), which
> > > is also OK as long as they are movable.
> > >
> > > However, if the guest starts using VFIO (which can be hotplugged into
> > > the guest), in most cases it involves DMA, which requires guest RAM
> > > pages to be pinned so that they do not move once their addresses are
> > > programmed into the hardware for DMA.
> > >
> > > So we end up in a situation where quite a lot of pages in CMA are no
> > > longer movable, and we get a bunch of these:
> > >
> > > [77306.513966] alloc_contig_range: [1f3800, 1f78c4) PFNs busy
> > > [77306.514448] alloc_contig_range: [1f3800, 1f78c8) PFNs busy
> > > [77306.514927] alloc_contig_range: [1f3800, 1f78cc) PFNs busy
> >
> > IIRC CMA was for mobile devices and their camera/codec drivers, and you
> > don't use QEMU on those? What do you need CMA for in your case?
>
> I do not want QEMU to get memory from CMA; that is my point. It just
> sometimes happens that the kernel allocates movable pages from there.

You may want to explain why we have a CMA in the first place. Our KVM
implementation needs to allocate a large chunk of physically contiguous
memory for each guest to hold that guest's MMU hash table.

We allocate these from a CMA whose size can be specified at boot but is
generally a percentage of the total system memory.

However, we don't want normal allocations that we *know* are going to be
pinned to sit in that CMA, otherwise they would defeat its purpose. So
this patch is about moving pages that we are about to pin out of the
CMA first.
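
To make the idea concrete, here is a rough userspace toy model, not the
code from the patch. Everything in it (the frame array, toy_pin(),
alloc_non_cma(), the region bounds) is a hypothetical name invented for
illustration. It only shows the ordering: if a page sits in the CMA
range, move it to a frame outside the range, and only then mark it
pinned, so the CMA range never ends up holding an unmovable page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: frames [CMA_START, CMA_END) stand in for the CMA region. */
#define NR_PAGES  16
#define CMA_START 8
#define CMA_END   16

struct toy_page {
	bool in_use;	/* frame currently backs some data */
	bool pinned;	/* frame must not move (address given to a device) */
};

static struct toy_page frames[NR_PAGES];

static bool pfn_in_cma(size_t pfn)
{
	return pfn >= CMA_START && pfn < CMA_END;
}

/* Grab a free frame outside the CMA range; NR_PAGES means "none left". */
static size_t alloc_non_cma(void)
{
	for (size_t pfn = 0; pfn < CMA_START; pfn++) {
		if (!frames[pfn].in_use) {
			frames[pfn].in_use = true;
			return pfn;
		}
	}
	return NR_PAGES;
}

/*
 * Pin the data held in frame 'pfn'. If the frame is inside the CMA
 * range, migrate to a non-CMA frame first, then pin the new frame.
 * Returns the frame that ended up pinned, or NR_PAGES on failure.
 */
static size_t toy_pin(size_t pfn)
{
	assert(pfn < NR_PAGES && frames[pfn].in_use);

	if (pfn_in_cma(pfn)) {
		size_t dst = alloc_non_cma();

		if (dst == NR_PAGES)
			return NR_PAGES;	/* nowhere to migrate to */
		frames[pfn].in_use = false;	/* CMA frame is free again */
		pfn = dst;
	}
	frames[pfn].pinned = true;
	return pfn;
}

/* The CMA range stays usable as long as nothing inside it is pinned. */
static bool toy_cma_range_available(void)
{
	for (size_t pfn = CMA_START; pfn < CMA_END; pfn++) {
		if (frames[pfn].pinned)
			return false;
	}
	return true;
}
```

In this model, pinning a page that started out in the CMA range leaves
the range free of pinned frames, which is the property the hash table
allocation depends on.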

Cheers,
Ben.
