Re: [PATCH] vfio/pci: Support error recovery

From: Cao jin
Date: Tue Dec 06 2016 - 05:42:20 EST




On 12/06/2016 12:59 PM, Alex Williamson wrote:
> On Tue, 6 Dec 2016 05:55:28 +0200
> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>
>> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:
>>> If you're going to take the lead for these AER patches, I would
>>> certainly suggest that understanding the reasoning behind the bus reset
>>> behavior is a central aspect to this series. This effort has dragged
>>> out for nearly two years and I apologize, but I don't really have a lot
>>> of patience for rehashing some of these issues if you're not going to
>>> read the previous discussions or consult with your colleagues to
>>> understand how we got to this point. If you want to challenge some of
>>> the design points, that's great, it could use some new eyes, but please
>>> understand how we got here first.
>>
>> Well I'm guessing Cao jin here isn't the only one not
>> willing to plough through all historical versions of the patchset
>> just to figure out the motivation for some code.
>>
>> Including a summary of a high level architecture couldn't hurt.
>>
>> Any chance of writing such? Alternatively, we can try to build it as
>> part of this thread. Shouldn't be hard as it seems somewhat
>> straight-forward on the surface:
>>
>> - detect link error on the host, don't reset link as we would normally do
>
> This is actually a new approach that I'm not sure I agree with. By
> skipping the host directed link reset, vfio is taking responsibility
> for doing this, but then we just assume the user will do it. I have
> issues with this.
>
> The previous approach was to use the error detected notifier to block
> access to the device, allowing the host to perform the link reset. A
> subsequent notification in the AER process released the user access
> which allowed the user AER process to proceed. This did result in both
> a host directed and a guest directed link reset, but other than
> coordinating the blocking of the user process during host reset, that
> hasn't been brought up as an issue previously.
>

Tests on previous versions didn't bring up issues as I find, I think
that is because we didn't test it completely. As I know, before August
of this year, we didn't have cable connected to NIC, let alone
connecting NIC to gateway.

Even if I fixed the guest oops issue in igb driver that Alex found in
v9, v9 still cannot work in my test. And in my test, disable link
reset(in host) in aer core for vfio-pci is the most significant step to
get my test passed.

>> - report link error to guest
>> - detect link reset request from guest
>> - reset link on host
>>
>> Since link reset will reset all devices behind it, for this to work we
>> need same set of devices behind the link in host and guest. Enforcing
>> this would be nice to have.
>
> This is a pretty significant complication and I think it's a
> requirement. This is the heart of why we have an AER vfio-pci device
> option and why we require that QEMU should fail to initialize the
> device if AER is enabled in an incompatible configuration. If a user
> cannot be sure that AER is enabled on a device, it's pointless to
> bother implementing it, though honestly I question the value of it in
> the VM altogether given configuration requirements (ie. are users
> going to accept the reason that all the ports of a device need to be
> assigned to a single guest for guest-based AER recovery when they were
> previously able to assign each port to a separate VM?).
>
>> - as link now might end up in bad state, reset
>> it when device is unassigned
>
> This is also a new aspect for the approach here, previously we allowed
> the host directed link reset so we could assume the device was
> sufficiently recovered. In the proposal here, the AER core skips any
> devices bound to vfio-pci, but vfio can't guarantee that we can do a
> bus reset on them. PCI bus isolation is not accounted for in DMA
> isolation, which is the basis for IOMMU groups. A bus can host
> multiple IOMMU groups, each of which may have a different user. Only
> in a very specific configuration can vfio do a bus reset.
>
>> Any details I missed?
>
> AIUI, the critical feature is that the guest needs to be able to reset
> the device link, all the other design elements play out from the
> logical expansion of that feature. It means that a guest bus reset
> needs to translate to a host bus reset, which means that all of the
> affected host devices need to be accounted for and those that are
> assigned to the guest need to be affected in the guest in the same
> way. QEMU must enforce this configuration or else a user cannot know
> the result of a AER fault, ie. will it cause a VMSTOP condition or is
> the fault forwarded to the guest. The required configuration
> restrictions are quite involved, therefore we can't simply require this
> of all configurations, so a vfio-pci device option is added. The
> coordination of the host directed reset with the guest directed reset
> was only a complications discovered within the last few revisions of
> the series. As noted above, the previous solution to this was to
> attempt to block access to the device while the host reset proceeds.
>
> Clearly it's a little disconcerting if we throw all of that away and
> simply assume that an FLR is sufficient to reset the device when it
> seems like link issues might be a nontrivial source of AER faults. If
> FLR is sufficient, then why does the core AER handling code in the
> kernel not simply do this? Thanks,
>

I agree with the points here. Now I understand why translate a guest
link reset to host link reset[*], and FLR shouldn't be equivalent to
link reset, but the PITY is, we can't trigger a real fatal error to see
if a FLR is sufficient to reset the device.

[*]https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04100.html
--
Sincerely,
Cao jin