Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity

From: Max Gurtovoy
Date: Thu Sep 30 2021 - 05:34:33 EST



On 9/30/2021 2:21 AM, Jason Gunthorpe wrote:
On Thu, Sep 30, 2021 at 12:48:55AM +0300, Max Gurtovoy wrote:
On 9/29/2021 7:14 PM, Jason Gunthorpe wrote:
On Wed, Sep 29, 2021 at 06:28:44PM +0300, Max Gurtovoy wrote:

So you have a device that's actively modifying its internal state,
performing I/O, including DMA (thereby dirtying VM memory), all while
in the _STOP state? And you don't see this as a problem?
I don't see how it is different from the vfio-pci situation.
vfio-pci provides no way to observe the migration state. It isn't
"000b".
Alex said that there is a compatibility problem.
Yes, when a vfio_device first opens it must be running - ie able to do
DMA and otherwise operational.

How can a non-resumed device do DMA?

Also, the bus master bit is not set.


When we add the migration extension this cannot change, so after
open_device() the device should be operational.

If it's waiting for an incoming migration blob, it is not running.


The reported state in the migration region should accurately reflect
what the device is currently doing. If the device is operational then
it must report running, not stopped.

STOP in the migration sense.


Thus a driver cannot just zero-initialize the migration "registers";
they have to be accurate.
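To make the point concrete, here is a minimal userspace sketch, assuming a hypothetical variant driver that keeps the reported state in a `device_state` field. The `SKETCH_STATE_*` bits mirror the v1 migration layout, in which STOP is the all-zero value, which is why zero-initialization reads back as STOP. The struct and the `sketch_open_device()` hook are illustrative names, not existing kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit layout mirroring the v1 migration state (STOP is all-zero). */
#define SKETCH_STATE_STOP     0u
#define SKETCH_STATE_RUNNING  (1u << 0)
#define SKETCH_STATE_SAVING   (1u << 1)
#define SKETCH_STATE_RESUMING (1u << 2)

/* Hypothetical per-device migration bookkeeping in a variant driver. */
struct sketch_mig_dev {
	uint32_t device_state; /* value reported through the migration region */
};

/*
 * Hypothetical open_device() hook: the device is operational right after
 * open, so the reported state must say RUNNING. Stopping after the
 * memset() would leave 0 behind, which reads back as STOP.
 */
static void sketch_open_device(struct sketch_mig_dev *mdev)
{
	memset(mdev, 0, sizeof(*mdev));
	mdev->device_state = SKETCH_STATE_RUNNING;
}
```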

Maybe we need to rename the STOP state. We could call it READY, LIVE or
NON_MIGRATION_STATE.
It was a poor choice to use 000b as STOP, but it doesn't really
matter. The mlx5 driver should just pre-initialize this readable state to running.
I guess we can do that for this reason. There is neither a functional problem
nor a compatibility issue here, as had been suggested.

But we still need the kernel to track transitions. For example, we don't want
to allow moving from the RESUMING state to the SAVING state. How can this
transition be allowed?
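A minimal sketch of that kind of kernel-side check, assuming the same RUNNING/SAVING/RESUMING bit layout and hypothetical helper names. The rule that RESUMING may not be combined with other bits follows the v1 uAPI notion of a valid state as I understand it; rejecting RESUMING followed by SAVING is the policy being argued for here, not existing behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_STATE_RUNNING  (1u << 0)
#define SKETCH_STATE_SAVING   (1u << 1)
#define SKETCH_STATE_RESUMING (1u << 2)
#define SKETCH_STATE_MASK \
	(SKETCH_STATE_RUNNING | SKETCH_STATE_SAVING | SKETCH_STATE_RESUMING)

/* A state is valid if it uses known bits and RESUMING stands alone. */
static bool sketch_state_valid(uint32_t state)
{
	if (state & ~SKETCH_STATE_MASK)
		return false;
	if (state & SKETCH_STATE_RESUMING)
		return state == SKETCH_STATE_RESUMING;
	return true;
}

/*
 * Hypothetical transition check: both endpoints must be valid and, as a
 * policy choice, a RESUMING device may not go straight to any state with
 * SAVING set.
 */
static bool sketch_transition_valid(uint32_t old_state, uint32_t new_state)
{
	if (!sketch_state_valid(old_state) || !sketch_state_valid(new_state))
		return false;
	if ((old_state & SKETCH_STATE_RESUMING) &&
	    (new_state & SKETCH_STATE_SAVING))
		return false;
	return true;
}
```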
It seems semantically fine to me; as per Alex's note, what will happen
is defined:

The driver will see RESUMING toggle off, so it will trigger a
de-serialization.

You mean stop serialization?


The driver will see SAVING toggle on, so it will serialize the new state
(either the pre-copy state or the post-copy state, depending on the
running bit).
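The SAVING rule in the two points above can be sketched as a small helper that decides what to serialize when SAVING toggles on. This assumes the same RUNNING/SAVING bit layout; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_STATE_RUNNING (1u << 0)
#define SKETCH_STATE_SAVING  (1u << 1)

enum sketch_serialize_mode {
	SKETCH_SERIALIZE_NONE,      /* SAVING did not toggle on */
	SKETCH_SERIALIZE_PRE_COPY,  /* SAVING toggled on while RUNNING */
	SKETCH_SERIALIZE_POST_COPY, /* SAVING toggled on with the device stopped */
};

static enum sketch_serialize_mode
sketch_pick_serialize_mode(uint32_t old_state, uint32_t new_state)
{
	uint32_t toggled = old_state ^ new_state;

	/* Only act when SAVING goes from clear to set. */
	if (!(toggled & SKETCH_STATE_SAVING) || !(new_state & SKETCH_STATE_SAVING))
		return SKETCH_SERIALIZE_NONE;

	/* The running bit selects which snapshot gets serialized. */
	return (new_state & SKETCH_STATE_RUNNING) ? SKETCH_SERIALIZE_PRE_COPY
						  : SKETCH_SERIALIZE_POST_COPY;
}
```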

Let's leave aside the bits and how you implement the state numbering.

If you finish resuming, you can move to a new state (that we should add): RESUMED.

Now you are suggesting moving from RESUMED to SAVING to get the state again from the dst device and send it back to the src, before starting the VM and moving to RUNNING?

Where is this coming from?
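In logical terms, the RESUMED proposal can be sketched as an explicit state table rather than bit combinations. The states and the few enabled arcs below are hypothetical and only cover what is discussed here (finish resuming, then run; save from a running source); in particular, RESUMED to SAVING is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Logical states in the spirit of the RESUMED proposal (hypothetical). */
enum sketch_mig_state {
	SKETCH_RUNNING,
	SKETCH_SAVING,
	SKETCH_RESUMING,
	SKETCH_RESUMED,
	SKETCH_NUM_STATES,
};

/* sketch_allowed[from][to]: only the arcs discussed in this thread. */
static const bool sketch_allowed[SKETCH_NUM_STATES][SKETCH_NUM_STATES] = {
	[SKETCH_RUNNING]  = { [SKETCH_SAVING]  = true },
	[SKETCH_RESUMING] = { [SKETCH_RESUMED] = true },
	[SKETCH_RESUMED]  = { [SKETCH_RUNNING] = true },
	/* SKETCH_SAVING: no outgoing arcs modeled in this sketch. */
};

static bool sketch_can_move(enum sketch_mig_state from,
			    enum sketch_mig_state to)
{
	return from < SKETCH_NUM_STATES && to < SKETCH_NUM_STATES &&
	       sketch_allowed[from][to];
}
```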


Depending on the running bit the device may or may not be woken up.

Let's talk about logic here, not bits.


If de-serialization fails then the state goes to error and SAVING is
ignored.
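A minimal sketch of that error path, assuming a hypothetical `sketch_dev` that records whether the incoming blob parses and how many times serialization ran. `SKETCH_STATE_ERROR` is an illustrative marker, not a value taken from the uAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_STATE_SAVING   (1u << 1)
#define SKETCH_STATE_RESUMING (1u << 2)
/* Hypothetical error marker for this sketch only. */
#define SKETCH_STATE_ERROR    (1u << 31)

struct sketch_dev {
	uint32_t state;
	bool blob_ok;        /* whether the received migration blob parses */
	int serialize_calls; /* how many times serialization was triggered */
};

/* Placeholder for device-specific de-serialization; 0 on success. */
static int sketch_deserialize(struct sketch_dev *d)
{
	return d->blob_ok ? 0 : -1;
}

/* Leave RESUMING for new_state, which may have SAVING set. */
static void sketch_leave_resuming(struct sketch_dev *d, uint32_t new_state)
{
	if (sketch_deserialize(d)) {
		/* De-serialization failed: enter error, ignore SAVING. */
		d->state = SKETCH_STATE_ERROR;
		return;
	}
	if (new_state & SKETCH_STATE_SAVING)
		d->serialize_calls++;
	d->state = new_state;
}
```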

The driver logic probably looks something like this:

// quiesce()/freeze()/... stand in for the device-specific steps.
// Each masked term is parenthesized: in C, != binds tighter than &.

// Running toggles off
if ((oldstate & RUNNING) != (newstate & RUNNING) && (oldstate & RUNNING)) {
	quiesce();
	freeze();
}

// Resuming toggles off
if ((oldstate & RESUMING) != (newstate & RESUMING) && (oldstate & RESUMING))
	deserialize();

// Saving toggles on
if ((oldstate & SAVING) != (newstate & SAVING) && (newstate & SAVING)) {
	if (!(newstate & RUNNING))
		serialize_post_copy();
}

// Running toggles on
if ((oldstate & RUNNING) != (newstate & RUNNING) && (newstate & RUNNING)) {
	unfreeze();
	unquiesce();
}
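The outline can be exercised as a small userspace model that records which placeholder actions fire for a given old/new state pair. The action names and the `sketch_*` helpers are hypothetical; the toggle tests are written as helpers because in C `!=` binds tighter than `&`, so an unparenthesized `a & B != c & B` does not compare the masked bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_STATE_RUNNING  (1u << 0)
#define SKETCH_STATE_SAVING   (1u << 1)
#define SKETCH_STATE_RESUMING (1u << 2)

/* Comma-separated log of the placeholder actions taken. */
static char sketch_trace[128];

static void sketch_do(const char *action)
{
	size_t len = strlen(sketch_trace);

	snprintf(sketch_trace + len, sizeof(sketch_trace) - len, "%s%s",
		 len ? "," : "", action);
}

static bool sketch_toggled_off(uint32_t old_state, uint32_t new_state, uint32_t bit)
{
	return (old_state & bit) && !(new_state & bit);
}

static bool sketch_toggled_on(uint32_t old_state, uint32_t new_state, uint32_t bit)
{
	return !(old_state & bit) && (new_state & bit);
}

/* Same ordering as the outline: stop, deserialize, serialize, restart. */
static void sketch_set_state(uint32_t old_state, uint32_t new_state)
{
	sketch_trace[0] = '\0';
	if (sketch_toggled_off(old_state, new_state, SKETCH_STATE_RUNNING)) {
		sketch_do("quiesce");
		sketch_do("freeze");
	}
	if (sketch_toggled_off(old_state, new_state, SKETCH_STATE_RESUMING))
		sketch_do("deserialize");
	if (sketch_toggled_on(old_state, new_state, SKETCH_STATE_SAVING) &&
	    !(new_state & SKETCH_STATE_RUNNING))
		sketch_do("serialize_post_copy");
	if (sketch_toggled_on(old_state, new_state, SKETCH_STATE_RUNNING)) {
		sketch_do("unfreeze");
		sketch_do("unquiesce");
	}
}
```

For example, RUNNING to SAVING logs quiesce, freeze, serialize_post_copy, and RESUMING to RUNNING logs deserialize, unfreeze, unquiesce. RUNNING to RUNNING|SAVING logs nothing, because the outline only covers the post-copy case.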

I'd have to check that carefully against the state chart from my last
email, though.

And I'd need to check how the "Stop Active Transactions" bit fits in there.

Jason