RE: [PATCH v3] PCI: Introduce flag for detached virtual functions

From: Tian, Kevin
Date: Tue Sep 01 2020 - 05:48:35 EST


> From: kvm-owner@xxxxxxxxxxxxxxx <kvm-owner@xxxxxxxxxxxxxxx> On Behalf
> Of Niklas Schnelle
> Sent: Friday, August 28, 2020 5:10 PM
> To: Bjorn Helgaas <helgaas@xxxxxxxxxx>; Alex Williamson
> <alex.williamson@xxxxxxxxxx>
>
[...]
> >>
> >> FWIW, pci_physfn() never returns NULL, it returns the provided pdev if
> >> is_virtfn is not set. This proposal wouldn't change that return value.
> >> AIUI pci_physfn(), the caller needs to test that the returned device is
> >> different from the provided device if there's really code that wants to
> >> traverse to the PF.
> >
> > Oh, so this VF has is_virtfn==0. That seems weird. There are lots of
> > other ways that a VF is different: Vendor/Device IDs are 0xffff, BARs
> > are zeroes, etc.
> >
> > It sounds like you're sweeping those under the rug by avoiding the
> > normal enumeration path (e.g., you don't have to size the BARs), but
> > if it actually is a VF, it seems like there might be fewer surprises
> > if we treat it as one.
> >
> > Why don't you just set is_virtfn=1 since it *is* a VF, and then deal
> > with the special cases where you want to touch the PF?
> >
> > Bjorn
> >
>
> As we are always running under at least a machine level hypervisor
> we're somewhat in the same situation as e.g. a KVM guest in
> that the VFs we see have some emulation that makes them act more like
> normal PCI functions. It just so happens that the machine level hypervisor
> does not emulate the PCI_COMMAND_MEMORY, it does emulate BARs and
> Vendor/Device IDs
> though.
> So is_virtfn is 0 for some VF for the same reason it is 0 on
> KVM/ESXi/HyperV/Jailhouse…
> guests on other architectures.

I wonder whether it's a good idea to also find a way to set is_virtfn
for normal KVM guest which get a vf assigned. There are other cases
where faithful emulation of certain PCI capabilities is difficult, e.g.
when enabling guest SVA related features (PASID/ATS/PRS). Per PCIe
spec, some or all fields of those capabilities are shared between PF
and VF. Among them:

1) Some could be emulated properly and indirectly reflected in hardware,
e.g. Intel VT-d allows additional control per VF about whether to accept
page request, execute/privileged permission, etc. thus allowing VF-specific
control even when device-side setting is shared;

2) Some could be purely emulated in software and it's harmless to leave
the hardware following PF setting, e.g. ATS enable, STU(?), outstanding
page request allocation, etc.;

3) However, I didn’t see a clean way of emulating page_request_ctrl.reset
and page_request_status.stopped. Those two have clear definition about
outstanding page requests. They are shared thus we cannot issue physical
action just due to request on one VF, while pure software emulation
cannot guarantee the desired expectation. Of course this issue also exists
even on bare metal - pci_enable/disable/reset_pri just do nothing for
vf. But there is chance to mitigate (e.g. timeout), but not possible in guest
if the guest doesn't know it's actually a VF.

Setting is_virtfn=1 allows guest to be cooperative like running together
with PF driver. But there is an ordering issue. The guest knows whether
a device is VF only when the VF driver is loaded (based on PCI_ID), but
related capabilities might be already enabled when attaching the device
to IOMMU (at least for intel_iommu). But suppose it's not a hard fix.
Last, detached vf is not a PCISIG definition. So the host still needs to
do proper emulation (even not faithful) of those capabilities for guests
who don't recognize detached vf.

Thoughts?

Thanks
Kevin