Re: [PATCH] vfio/pci: Support error recovery

From: Alex Williamson
Date: Mon Dec 05 2016 - 11:17:36 EST


On Mon, 5 Dec 2016 13:52:03 +0800
Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:

> On 12/04/2016 11:30 PM, Alex Williamson wrote:
> > On Sun, 4 Dec 2016 20:16:42 +0800
> > Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:
> >
> >> On 12/01/2016 10:55 PM, Alex Williamson wrote:
> >>> On Thu, 1 Dec 2016 21:40:00 +0800
> >>
> >>>>> If an AER fault occurs and the user doesn't do a reset, what
> >>>>> happens when that device is released and a host driver tries to make
> >>>>> use of it? The user makes no commitment to do a reset and there are
> >>>>> only limited configurations where we even allow the user to perform a
> >>>>> reset.
> >>>>>
> >>>>
> >>>> Limited? Do you mean the things __pci_dev_reset() can do?
> >>>
> >>> I mean that there are significant device and guest configuration
> >>> restrictions in order to support AER. For instance, all the functions
> >>> of the slot need to appear in a PCI-e topology in the guest with all
> >>> the functions in the right place such that a guest bus reset translates
> >>> into a host bus reset. The physical functions cannot be split between
> >>> guests even if IOMMU isolation would otherwise allow it. The user
> >>> needs to explicitly enable AER support for the devices. A VM need to
> >>> be specifically configured for AER support in order to set any sort of
> >>> expectations of a guest directed bus reset, let alone a guarantee that
> >>> it will happen. So all the existing VMs, where functions are split
> >>> between guests, or the topology isn't exactly right, or AER isn't
> >>> enabled see a regression from the above change as the device is no
> >>> longer reset.
> >>>
> >>
> >> I am not clear why set these restrictions in the current design. I take
> >> a glance at older versions of qemu's patchset, their thoughts is:
> >> translate a guest bus reset into a host bus reset(Which is
> >> unreasonable[*] to me). And I guess, that's the *cause* of these
> >> restrictions? Is there any other stories behind these restrictions?
> >>
> >> [*] In physical world, set bridge's secondary bus reset would send
> >> hot-reset TLP to all functions below, trigger every device's reset
> >> separately. Emulated device should behave the same, means just using
> >> each device's DeviceClass->reset method.
> >
> > Are you trying to say that an FLR is equivalent to a link reset?
>
> No. Look at old versions patchset, there is one names "vote the
> function 0 to do host bus reset when aer occurred"[1], that is what I
> called "translate guest link reset to host link reset", and what I think
> unreasonable(and I think it also does it wrongly). So in v10 version of
> mine, I dropped it.
>
> [1]https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg02987.html
>
> If "translate guest link reset to host link reset" is right, I can
> understand these restrictions[2][3].
>
> [2]. All physical functions in a single card must be assigned to the VM
> with AER enabled on each and configured on the same virtual bus.
> [3]. Don't place other devices under the virtual bus in [2], no matter
> physical, emulated, or paravirtual, even if other device
> supporting AER signaling
>
> Certain device's FLR calls its DeviceClass->reset method; link reset
> calls DeviceClass->reset of each device which on the bus. So, apparently
> they have difference. But if there is only 1 vfio-pci device under the
> virtual pci bus, I think FLR can be equivalent to a link reset, right?

No. An FLR resets the device while a secondary bus reset does a reset
of the link and the device. AER errors are sometimes issues with the
link, not the device. If we were to perform only an FLR, we're not
performing the same reset as would be done on bare metal.

> > Please go read the previous discussions, especially if you're sending
> > patches you don't believe in. Thanks,
> >
>
> I does not read ALL version's discussion thoroughly, but these
> restrictions exist for a long time, so I guess it is a result of
> previous discussions. If it is not, I am thinking of the possibility of
> dropping these restrictions[2][3], and drop the "aer" property,
> automatically enable this functionality or not according to device's
> capability.

If you're going to take the lead for these AER patches, I would
certainly suggest that understanding the reasoning behind the bus reset
behavior is a central aspect to this series. This effort has dragged
out for nearly two years and I apologize, but I don't really have a lot
of patience for rehashing some of these issues if you're not going to
read the previous discussions or consult with your colleagues to
understand how we got to this point. If you want to challenge some of
the design points, that's great, it could use some new eyes, but please
understand how we got here first. If we could replace a bus reset with
an FLR that would make things far more simple, but I don't see that as
a valid option. Thanks,

Alex