Re: [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY

From: Alex Williamson
Date: Thu Mar 03 2022 - 10:20:53 EST


On Thu, 3 Mar 2022 09:01:24 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Wed, Mar 02, 2022 at 08:47:52PM -0700, Alex Williamson wrote:
> > On Wed, 2 Mar 2022 20:05:28 -0400
> > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> >
> > > On Wed, Mar 02, 2022 at 01:31:59PM -0700, Alex Williamson wrote:
> > > > > + * initial_bytes reflects the estimated remaining size of any initial mandatory
> > > > > + * precopy data transfer. When initial_bytes returns as zero then the initial
> > > > > + * phase of the precopy data is completed. Generally initial_bytes should start
> > > > > + * out as approximately the entire device state.
> > > >
> > > > What is "mandatory" intended to mean here? The user isn't required to
> > > > collect any data from the device in the PRE_COPY states.
> > >
> > > If the data is split into initial,dirty,trailer then mandatory means
> > > that first chunk.
> >
> > But there's no requirement to read anything in PRE_COPY, so initial
> > becomes indistinguishable from trailer and dirty doesn't exist.
>
> It is still mandatory to read that data out, it doesn't matter if it
> is read during PRE_COPY or STOP_COPY.

Not really, PRE_COPY -> RUNNING is a valid arc.

> > > > "The vfio_precopy_info data structure returned by this ioctl provides
> > > > estimates of data available from the device during the PRE_COPY states.
> > > > This estimate is split into two categories, initial_bytes and
> > > > dirty_bytes.
> > > >
> > > > The initial_bytes field indicates the amount of static data available
> > > > from the device. This field should have a non-zero initial value and
> > > > decrease as migration data is read from the device.
> > >
> > > static isn't great either, how about just say 'minimum data available'
> >
> > 'initial precopy data-set'?
>
> Sure
>
> > We have no basis to make that assertion. We've agreed that precopy can
> > be used for nothing more than a compatibility test, so we could have a
> > vGPU with a massive framebuffer and no ability to provide dirty
> > tracking implement precopy only to include the entire framebuffer in
> > the trailing STOP_COPY data set. Per my understanding and the fact
> > that we cannot enforce any heuristics regarding the size of the tailer
> > relative to the pre-copy data set, I think the above strongly phrased
> > sentence is necessary to understand the limitations of what this ioctl
> > is meant to convey. Thanks,
>
> This is why abusing precopy for compatability is not a great idea. It
> is OK for acc because its total state is tiny, but I would not agree
> to a vGPU driver being merged working like you describe. It distorts
> the entire purpose of PRE_COPY and this whole estimation mechanism.
>
> The ioctl is intended to convey when to switch to STOP_COPY, and the
> driver should provide a semantic where the closer the reported length
> is to 0 then the faster the STOP_COPY will go.

If it's an abuse, then let's not do it. It was never my impression or
intention that this was ok for acc only due to the minimal trailing
data size. My statement was that use of PRE_COPY for compatibility
testing only had been a previously agreed valid use case of the
original migration interface.

Furthermore the acc driver was explicitly directed not to indicate any
degree of trailing data size in dirty_bytes, so while trailing data may
be small for acc, this interface is explicitly not intended to provide
any indication of trailing data size. Thanks,

Alex