Re: oops in pci_acs_path_enabled

From: Alex Williamson
Date: Fri Aug 03 2012 - 17:52:08 EST


On Fri, 2012-08-03 at 15:12 -0600, David Ahern wrote:
> On 8/3/12 2:21 PM, Alex Williamson wrote:
> > On Fri, 2012-08-03 at 11:39 -0600, David Ahern wrote:
> >> Hi Alex:
> >>
> >> Hitting an oops with 3.6-rc1. Backtrace from console attached. git blame
> >> for the top function points to ad805758.
> >
> > Hey David,
> >
> > Hmm, what's special about your system? I've got an 82576 here and the
> > same path works fine. Any way you can get the top of the oops message?
> > Thanks,
> >
> > Alex
> >
>
> Dell R410 I believe. pair of 5620 processors. 3 overlapping screen shots
> attached. objdump on pci.o suggests the pdev is NULL:
>
> /opt/sw/ahern/kernels/kernel.git/drivers/pci/pci.c:2454
>
> ret = pci_dev_specific_acs_enabled(pdev, acs_flags);
> if (ret >= 0)
> return ret > 0;
>
> if (!pci_is_pcie(pdev))
> 408a: 41 80 7c 24 4a 00 cmpb $0x0,0x4a(%r12)
> 4090: 74 e8 je 407a <pci_acs_enabled+0x2a>
>
>
> Perhaps this bug explains the larger the issue which is that device
> passthrough in 3.6-rc1 (0d7614f) is broken for me -- config field for
> the PCI device does not exist. e.g.,
>
> pcilib: Cannot open /sys/bus/pci/devices/0000:06:10.0/config
> lspci: Unable to read the standard configuration space header of device
> 0000:06:10.0
> pcilib: Cannot open /sys/bus/pci/devices/0000:06:10.0/config
> lspci: Unable to read the standard configuration space header of device
> 0000:06:10.0
> failed to find vendor-product id for PCI id "06:10.0"
> Failed to claim PCI device 06:10.0
>
> git bisect points to:
>
> 783f157bc5a7fa30ee17b4099b27146bd1b68af4 is the first bad commit
> commit 783f157bc5a7fa30ee17b4099b27146bd1b68af4
> Author: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Date: Wed May 30 14:19:43 2012 -0600
>
> intel-iommu: Make use of DMA quirks and ACS checks in IOMMU groups
>
> Work around broken devices and adhere to ACS support when determining
> IOMMU grouping.
>
> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Signed-off-by: Joerg Roedel <joerg.roedel@xxxxxxx>
>
> :040000 040000 83890398dabbf225fd0f5b3c8c3713a75b3fb5e1
> b674ce2ecb315393a8c6c1ac98b3796d5ba09708 M drivers
>
> I triggered the oops in a number of the bisect points as well -- in
> those cases the machine had to be power cycled.

Is this the chunk that's causing the oops?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 7469b53..27d8c97 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4133,6 +4133,7 @@ static int intel_iommu_add_device(struct device *dev)
PCI_DEVFN(PCI_SLOT(dma_pdev->devfn),
0)));

+#if 0
while (!pci_is_root_bus(dma_pdev->bus)) {
if (pci_acs_path_enabled(dma_pdev->bus->self,
NULL, REQ_ACS_FLAGS))
@@ -4140,6 +4141,7 @@ static int intel_iommu_add_device(struct device *dev)

swap_pci_ref(&dma_pdev, pci_dev_get(dma_pdev->bus->self));
}
+#endif

group = iommu_group_get(&dma_pdev->dev);
pci_dev_put(dma_pdev);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/