Re: [PATCH] vfio/pci: Support error recovery

From: Michael S. Tsirkin
Date: Mon Dec 05 2016 - 22:55:37 EST


On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:
> If you're going to take the lead for these AER patches, I would
> certainly suggest that understanding the reasoning behind the bus reset
> behavior is a central aspect to this series. This effort has dragged
> out for nearly two years and I apologize, but I don't really have a lot
> of patience for rehashing some of these issues if you're not going to
> read the previous discussions or consult with your colleagues to
> understand how we got to this point. If you want to challenge some of
> the design points, that's great, it could use some new eyes, but please
> understand how we got here first.

Well, I'm guessing Cao jin isn't the only one unwilling to plough
through every historical version of the patchset just to figure out
the motivation for some of the code.

Including a summary of the high-level architecture couldn't hurt.

Any chance of writing one? Alternatively, we can try to build it as
part of this thread. It shouldn't be hard, as it seems fairly
straightforward on the surface:

- detect link error on the host, don't reset link as we would normally do
- report link error to guest
- detect link reset request from guest
- reset link on host

Since a link reset will reset all devices behind it, for this to work we
need the same set of devices behind the link in host and guest. Enforcing
this would be nice to have.

- as the link might now end up in a bad state, reset it when the
  device is unassigned
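
To make the flow above concrete, here is a rough sketch in C of the
steps as a tiny state machine, plus the "same set of devices" check.
This is only an illustration of the summary, not the vfio code: every
name in it (struct link, handle_event, same_device_set, the EV_* events)
is made up, and device IDs are assumed to be plain integers with no
duplicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none come from the vfio code. */
enum link_state { LINK_OK, LINK_ERR_REPORTED };
enum link_event { EV_HOST_LINK_ERROR, EV_GUEST_RESET_REQ, EV_UNASSIGN };

struct link {
	enum link_state state;
	unsigned int host_resets;	/* link resets performed on the host */
	bool guest_notified;		/* error was forwarded to the guest */
};

/* Order-independent comparison; assumes no duplicate IDs in either array. */
static bool same_device_set(const unsigned int *host, size_t nhost,
			    const unsigned int *guest, size_t nguest)
{
	if (nhost != nguest)
		return false;
	for (size_t i = 0; i < nhost; i++) {
		bool found = false;

		for (size_t j = 0; j < nguest; j++)
			if (host[i] == guest[j])
				found = true;
		if (!found)
			return false;
	}
	return true;
}

static void handle_event(struct link *l, enum link_event ev)
{
	assert(l != NULL);

	switch (ev) {
	case EV_HOST_LINK_ERROR:
		/* Skip the usual host-side reset; only report to the guest. */
		l->state = LINK_ERR_REPORTED;
		l->guest_notified = true;
		break;
	case EV_GUEST_RESET_REQ:
		/* Guest asked for a link reset: perform it on the host. */
		l->host_resets++;
		l->state = LINK_OK;
		break;
	case EV_UNASSIGN:
		/* Guest never recovered: reset before the device is reused. */
		if (l->state != LINK_OK) {
			l->host_resets++;
			l->state = LINK_OK;
		}
		break;
	}
}
```

In this sketch the host only resets the link in two places: when the
guest asks for it, or at unassign time if an error was reported and the
guest never got around to recovering.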

Any details I missed?

--
MST