Re: [PATCH mlx5-next 1/7] PCI/IOV: Provide internal VF index

From: Bjorn Helgaas
Date: Sat Sep 25 2021 - 13:41:29 EST


On Sat, Sep 25, 2021 at 01:10:39PM +0300, Leon Romanovsky wrote:
> On Fri, Sep 24, 2021 at 08:08:45AM -0500, Bjorn Helgaas wrote:
> > On Thu, Sep 23, 2021 at 09:35:32AM +0300, Leon Romanovsky wrote:
> > > On Wed, Sep 22, 2021 at 04:59:30PM -0500, Bjorn Helgaas wrote:
> > > > On Wed, Sep 22, 2021 at 01:38:50PM +0300, Leon Romanovsky wrote:
> > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > >
> > > > > The PCI core uses the VF index internally, often called the vf_id,
> > > > > during the setup of the VF, eg pci_iov_add_virtfn().
> > > > >
> > > > > This index is needed for device drivers that implement live migration
> > > > > for their internal operations that configure/control their VFs.
> > > > >
> > > > > Specifically, mlx5_vfio_pci driver that is introduced in coming patches
> > > > > from this series needs it and not the bus/device/function which is
> > > > > exposed today.
> > > > >
> > > > > Add pci_iov_vf_id() which computes the vf_id by reversing the math that
> > > > > was used to create the bus/device/function.
> > > > >
> > > > > Signed-off-by: Yishai Hadas <yishaih@xxxxxxxxxx>
> > > > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx>
> > > >
> > > > Acked-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > > >
> > > > mlx5_core_sriov_set_msix_vec_count() looks like it does basically the
> > > > same thing as pci_iov_vf_id() by iterating through VFs until it finds
> > > > one with a matching devfn (although it *doesn't* check for a matching
> > > > bus number, which seems like a bug).
> ...

> > And it still looks like the existing code is buggy. This is called
> > via sysfs, so if the PF is on bus X and the user writes to
> > sriov_vf_msix_count for a VF on bus X+1, it looks like
> > mlx5_core_sriov_set_msix_vec_count() will set the count for the wrong
> > VF.
>
> In mlx5_core_sriov_set_msix_vec_count(), we receive a VF that is connected
> to a PF which has "struct mlx5_core_dev". My expectation is that they share
> the same bus, as that PF was the one that created the VFs. The mlx5 devices
> support up to 256 VFs, which is far below the bus split mentioned in the
> PCI spec.
>
> How can VF and their respective PF have different bus numbers?

See PCIe r5.0, sec 9.2.1.2. For example,

  PF 0 on bus 20
    First VF Offset   1
    VF Stride         1
    NumVFs            511
    VF 0,1 through VF 0,255 on bus 20
    VF 0,256 through VF 0,511 on bus 21

This is implemented in pci_iov_add_virtfn(), which computes the bus
number and devfn from the VF ID.

pci_iov_virtfn_devfn(VF 0,1) == pci_iov_virtfn_devfn(VF 0,257) (both
are devfn 1, on bus 20 and bus 21 respectively), so if the user writes
to sriov_vf_msix_count for VF 0,257, it looks like we'll call
mlx5_set_msix_vec_count() for VF 0,1 instead of VF 0,257.
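
To make the arithmetic concrete, here is a small user-space sketch of
the Routing ID math from the example above. It is not the kernel code:
struct pf_model and the vf_*() helpers are hypothetical names invented
for illustration, and VFs are counted from 1 as in the spec (the kernel
helpers take a 0-based vf_id instead).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of an SR-IOV PF; not a kernel structure. */
struct pf_model {
	uint8_t  bus;              /* PF bus number */
	uint8_t  devfn;            /* PF device/function number */
	uint16_t first_vf_offset;  /* First VF Offset from the SR-IOV capability */
	uint16_t vf_stride;        /* VF Stride from the SR-IOV capability */
};

/* Routing ID of VF n, with n counted from 1 as in the spec. */
static uint16_t vf_rid(const struct pf_model *pf, unsigned int vf_num)
{
	assert(vf_num >= 1);

	uint16_t pf_rid = (uint16_t)((pf->bus << 8) | pf->devfn);

	return (uint16_t)(pf_rid + pf->first_vf_offset +
			  (vf_num - 1) * pf->vf_stride);
}

/* Upper 8 bits of the Routing ID: the bus number the VF lands on. */
static uint8_t vf_bus(const struct pf_model *pf, unsigned int vf_num)
{
	return (uint8_t)(vf_rid(pf, vf_num) >> 8);
}

/* Lower 8 bits of the Routing ID: the devfn, which wraps every 256 IDs. */
static uint8_t vf_devfn(const struct pf_model *pf, unsigned int vf_num)
{
	return (uint8_t)(vf_rid(pf, vf_num) & 0xff);
}
```

With bus 20, devfn 0, offset 1 and stride 1, this puts VF 0,255 at
devfn 255 on bus 20 and VF 0,256 at devfn 0 on bus 21, so a devfn value
alone no longer identifies a VF once the bus number rolls over.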

The spec encourages devices that require no more than 256 functions to
locate them all on the same bus number (PCIe r5.0, sec 9.1), so if you
only have 255 VFs, you may avoid the problem.

But in mlx5_core_sriov_set_msix_vec_count(), it's not obvious that it
is safe to assume the bus number is the same.
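
As a rough illustration of the difference, the sketch below contrasts a
lookup that reverses the Routing ID math using both bus and devfn with
one that matches on devfn alone. This is stand-alone user-space code
under my own assumptions, not the mlx5 or PCI core implementation:
struct sriov_pf and both functions are hypothetical names, and the
index returned is 0-based.

```c
#include <stdint.h>

/* Hypothetical user-space model of a PF's SR-IOV parameters. */
struct sriov_pf {
	uint8_t  bus;
	uint8_t  devfn;
	uint16_t first_vf_offset;
	uint16_t vf_stride;
	uint16_t num_vfs;
};

/* Routing ID of the first VF (0-based index 0). */
static int first_vf_rid(const struct sriov_pf *pf)
{
	return ((pf->bus << 8) | pf->devfn) + pf->first_vf_offset;
}

/*
 * Reverse the Routing ID math using bus and devfn together.
 * Returns the 0-based VF index, or -1 if the address is not one of
 * this PF's VFs.
 */
static int vf_index_from_bdf(const struct sriov_pf *pf,
			     uint8_t bus, uint8_t devfn)
{
	int delta = ((bus << 8) | devfn) - first_vf_rid(pf);

	if (pf->vf_stride == 0 || delta < 0 || delta % pf->vf_stride != 0)
		return -1;

	int idx = delta / pf->vf_stride;

	return idx < pf->num_vfs ? idx : -1;
}

/*
 * Match on devfn only, ignoring the bus number. Returns the first
 * 0-based VF index whose devfn matches, which is ambiguous as soon as
 * the VFs span more than one bus.
 */
static int vf_index_from_devfn_only(const struct sriov_pf *pf, uint8_t devfn)
{
	for (int i = 0; i < pf->num_vfs; i++) {
		int rid = first_vf_rid(pf) + i * pf->vf_stride;

		if ((rid & 0xff) == devfn)
			return i;
	}
	return -1;
}
```

For a PF on bus 20 with offset 1, stride 1 and 511 VFs, looking up bus
21, devfn 1 with the first function gives index 256, while the
devfn-only loop stops at index 0, i.e. the VF on bus 20.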

Bjorn