Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization

From: Roman Kagan
Date: Fri Mar 04 2016 - 05:24:18 EST


On Fri, Mar 04, 2016 at 09:08:44AM +0000, Li, Liang Z wrote:
> > On Fri, Mar 04, 2016 at 01:52:53AM +0000, Li, Liang Z wrote:
> > > > I wonder if it would be possible to avoid the kernel changes by
> > > > parsing /proc/self/pagemap - if that can be used to detect
> > > > unmapped/zero mapped pages in the guest ram, would it achieve the
> > > > same result?
> > >
> > > Only detecting the unmapped/zero mapped pages is not enough. Consider the
> > > situation like case 2; it can't achieve the same result.
> >
> > Your case 2 doesn't exist in the real world. If people could stop their main
> > memory consumer in the guest prior to migration they wouldn't need live
> > migration at all.
>
> Case 2 is just a simplified scenario, not a real case.
> As long as the guest's memory usage does not keep increasing, or the guest does not always run out of memory,
> it can be covered by case 2.

The memory usage will keep increasing due to ever-growing caches, etc.,
so you'll be left with very little free memory fairly soon.

> > I tend to think you can safely assume there's no free memory in the guest, so
> > there's little point optimizing for it.
>
> If this is true, we should not inflate the balloon either.

We certainly should if there's "available" memory, i.e. not free but
cheap to reclaim.

> > OTOH it makes perfect sense optimizing for the unmapped memory that's
> > made up, in particular, by the balloon, and consider inflating the balloon right
> > before migration unless you already maintain it at the optimal size for other
> > reasons (like e.g. a global resource manager optimizing the VM density).
> >
>
> Yes, I believe the current balloon works and it's simple. Have you taken the performance impact into consideration?
> For an 8G guest, it takes about 5s to inflate the balloon. But it only takes 20ms to traverse the free_list and
> construct the free pages bitmap.

I don't have a feel for how important the difference is. And if the
limiting factor for balloon inflation speed is the granularity of
communication it may be worth optimizing that, because quick balloon
reaction may be important in certain resource management scenarios.

> By inflating the balloon, all the guest's pages are still processed (zero page checking).

Not sure what you mean. If you are describing the current state of affairs,
that's exactly the suggested optimization point: skip unmapped pages.
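
To make "skip unmapped pages" concrete, here is a rough sketch of the
/proc/self/pagemap idea mentioned earlier in the thread. It is only an
illustration and not QEMU code: the helper names is_unmapped() and
unmapped_pages() are made up for this example, the start address is assumed
to be page-aligned, and only the "present" (bit 63) and "swapped" (bit 62)
flags of each 64-bit pagemap entry are examined. Detecting zero-mapped
pages would need more than this.

```python
import mmap
import struct

PAGE_SIZE = mmap.PAGESIZE
PM_PRESENT = 1 << 63  # pagemap bit 63: page is present in RAM
PM_SWAPPED = 1 << 62  # pagemap bit 62: page is in swap


def is_unmapped(entry: int) -> bool:
    """Return True if a pagemap entry is neither present nor swapped."""
    return not (entry & (PM_PRESENT | PM_SWAPPED))


def unmapped_pages(start: int, length: int, pagemap_path: str = "/proc/self/pagemap"):
    """Yield addresses of pages in [start, start + length) with no backing.

    Hypothetical helper: assumes `start` is page-aligned and that the
    caller is running on Linux with access to its own pagemap.
    """
    first = start // PAGE_SIZE
    count = (length + PAGE_SIZE - 1) // PAGE_SIZE
    with open(pagemap_path, "rb") as f:
        f.seek(first * 8)          # one 64-bit entry per virtual page
        data = f.read(count * 8)
    for i in range(len(data) // 8):
        (entry,) = struct.unpack_from("=Q", data, i * 8)  # native byte order
        if is_unmapped(entry):
            yield (first + i) * PAGE_SIZE
```

A migration loop could then skip every address yielded by unmapped_pages()
for a guest RAM block instead of reading the page and checking it for zeros.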

> The only advantage of 'inflating the balloon before live migration' is that it is simple, nothing more.

That's a big advantage. Another one is that it does something useful in
real-world scenarios.

Roman.