Re: [PATCH] vfio/pci: Support error recovery

From: Alex Williamson
Date: Tue Dec 06 2016 - 10:25:57 EST


On Tue, 6 Dec 2016 14:11:03 +0800
Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:

> On 12/06/2016 12:17 AM, Alex Williamson wrote:
> > On Mon, 5 Dec 2016 13:52:03 +0800
> > Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:
> >
> >> On 12/04/2016 11:30 PM, Alex Williamson wrote:
> >>> On Sun, 4 Dec 2016 20:16:42 +0800
> >>> Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:
> >>>
> >>>> On 12/01/2016 10:55 PM, Alex Williamson wrote:
> >>>>> On Thu, 1 Dec 2016 21:40:00 +0800
> >>>>
> >>>>>>> If an AER fault occurs and the user doesn't do a reset, what
> >>>>>>> happens when that device is released and a host driver tries to make
> >>>>>>> use of it? The user makes no commitment to do a reset and there are
> >>>>>>> only limited configurations where we even allow the user to perform a
> >>>>>>> reset.
> >>>>>>>
> >>>>>>
> >>>>>> Limited? Do you mean the things __pci_dev_reset() can do?
> >>>>>
> >>>>> I mean that there are significant device and guest configuration
> >>>>> restrictions in order to support AER. For instance, all the functions
> >>>>> of the slot need to appear in a PCI-e topology in the guest with all
> >>>>> the functions in the right place such that a guest bus reset translates
> >>>>> into a host bus reset. The physical functions cannot be split between
> >>>>> guests even if IOMMU isolation would otherwise allow it. The user
> >>>>> needs to explicitly enable AER support for the devices. A VM need to
> >>>>> be specifically configured for AER support in order to set any sort of
> >>>>> expectations of a guest directed bus reset, let alone a guarantee that
> >>>>> it will happen. So all the existing VMs, where functions are split
> >>>>> between guests, or the topology isn't exactly right, or AER isn't
> >>>>> enabled see a regression from the above change as the device is no
> >>>>> longer reset.
> >>>>>
> >>>>
> >>>> I am not clear why set these restrictions in the current design. I take
> >>>> a glance at older versions of qemu's patchset, their thoughts is:
> >>>> translate a guest bus reset into a host bus reset(Which is
> >>>> unreasonable[*] to me). And I guess, that's the *cause* of these
> >>>> restrictions? Is there any other stories behind these restrictions?
> >>>>
> >>>> [*] In physical world, set bridge's secondary bus reset would send
> >>>> hot-reset TLP to all functions below, trigger every device's reset
> >>>> separately. Emulated device should behave the same, means just using
> >>>> each device's DeviceClass->reset method.
> >>>
> >>> Are you trying to say that an FLR is equivalent to a link reset?
> >>
> >> No. Look at old versions patchset, there is one names "vote the
> >> function 0 to do host bus reset when aer occurred"[1], that is what I
> >> called "translate guest link reset to host link reset", and what I think
> >> unreasonable(and I think it also does it wrongly). So in v10 version of
> >> mine, I dropped it.
> >>
> >> [1]https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg02987.html
> >>
> >> If "translate guest link reset to host link reset" is right, I can
> >> understand these restrictions[2][3].
> >>
> >> [2]. All physical functions in a single card must be assigned to the VM
> >> with AER enabled on each and configured on the same virtual bus.
> >> [3]. Don't place other devices under the virtual bus in [2], no matter
> >> physical, emulated, or paravirtual, even if other device
> >> supporting AER signaling
> >>
> >> Certain device's FLR calls its DeviceClass->reset method; link reset
> >> calls DeviceClass->reset of each device which on the bus. So, apparently
> >> they have difference. But if there is only 1 vfio-pci device under the
> >> virtual pci bus, I think FLR can be equivalent to a link reset, right?
> >
> > No. An FLR resets the device while a secondary bus reset does a reset
> > of the link and the device. AER errors are sometimes issues with the
> > link, not the device. If we were to perform only an FLR, we're not
> > performing the same reset as would be done on bare metal.
> >
>
> Thanks for you explanation, it does helps, except the last sentence, I
> think I understand it now: fatal error implies there may be link issue
> exists(pci express spec: 6.2.2.2.1), so, should do link reset for fatal
> error(that is what and why aer core does). And so, in patch[1] above,
> qemu does a link reset when seeing secondary bus reset bit of virtual
> bus got set. is it right?

QEMU is only going to do a bus reset if the guest is participating in
AER recovery AND QEMU supports AER recovery AND the guest topology
configuration allows the guest bus reset to induce a host bus reset.
vfio does not know that QEMU is the user and cannot assume the user
will perform a bus reset. We need to give the user the ability to
recover from an AER, but not rely on the user to do so. We cannot
assume the user's intention or capabilities. Thanks,

Alex