Re: oops in pci_acs_path_enabled

From: David Ahern
Date: Fri Aug 03 2012 - 17:12:51 EST


On 8/3/12 2:21 PM, Alex Williamson wrote:
> On Fri, 2012-08-03 at 11:39 -0600, David Ahern wrote:
>> Hi Alex:
>>
>> Hitting an oops with 3.6-rc1. Backtrace from console attached. git blame
>> for the top function points to ad805758.
>
> Hey David,
>
> Hmm, what's special about your system? I've got an 82576 here and the
> same path works fine. Any way you can get the top of the oops message?
> Thanks,
>
> Alex


Dell R410, I believe, with a pair of 5620 processors. Three overlapping screenshots are attached. objdump on pci.o suggests that pdev is NULL:

/opt/sw/ahern/kernels/kernel.git/drivers/pci/pci.c:2454

	ret = pci_dev_specific_acs_enabled(pdev, acs_flags);
	if (ret >= 0)
		return ret > 0;

	if (!pci_is_pcie(pdev))
    408a:	41 80 7c 24 4a 00	cmpb   $0x0,0x4a(%r12)
    4090:	74 e8			je     407a <pci_acs_enabled+0x2a>
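
To spell out the reasoning behind the NULL guess: the cmpb reads one byte at 0x4a(%r12), so if %r12 holds pdev and pdev is NULL, the access lands at address 0x4a, which is what the top of the oops would be expected to report. Below is a minimal userspace sketch of that arithmetic and of an upstream walk that stops on a missing bridge instead of dereferencing it. Everything named mock_* is a hypothetical stand-in invented for illustration; it is not the kernel's struct pci_dev, struct pci_bus, or the actual code in pci.c, and it is not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real struct pci_dev / struct pci_bus. */
struct mock_bus;

struct mock_dev {
    struct mock_bus *bus;   /* bus the device sits on */
    int acs_ok;             /* pretend outcome of the per-device ACS check */
};

struct mock_bus {
    struct mock_dev *self;  /* bridge leading to this bus; NULL if none */
};

/* Address touched by an access of the form disp(%reg): base + disp. */
uintptr_t mock_effective_address(uintptr_t base, uintptr_t disp)
{
    return base + disp;
}

/*
 * Walk upstream from start. Returns 1 if every device visited passes the
 * mock ACS check and end was reached (end == NULL means "walk until no
 * bridge is left"); returns 0 otherwise. A missing bridge ends the walk
 * rather than being dereferenced.
 */
int mock_acs_path_enabled(struct mock_dev *start, struct mock_dev *end)
{
    struct mock_dev *pdev;

    for (pdev = start; pdev != NULL;
         pdev = pdev->bus ? pdev->bus->self : NULL) {
        if (!pdev->acs_ok)
            return 0;
        if (pdev == end)
            return 1;
    }
    return end == NULL;
}

/* Usage example on a tiny made-up topology. */
void mock_self_check(void)
{
    struct mock_bus root = { NULL };
    struct mock_dev bridge = { &root, 1 };
    struct mock_bus child = { &bridge };
    struct mock_dev dev = { &child, 1 };
    struct mock_bus virt = { NULL };      /* bus with no bridge behind it */
    struct mock_dev orphan = { &virt, 1 };

    /* NULL base plus displacement 0x4a gives fault address 0x4a. */
    assert(mock_effective_address(0, 0x4a) == 0x4a);
    assert(mock_acs_path_enabled(&dev, &bridge) == 1);
    /* The walk never reaches bridge, and must not crash on the way. */
    assert(mock_acs_path_enabled(&orphan, &bridge) == 0);
}
```

Whether the real walk hits a bus without a bridge device on this machine is an assumption; the top of the oops message should confirm or rule out the 0x4a address.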


Perhaps this bug explains the larger issue, which is that device passthrough in 3.6-rc1 (0d7614f) is broken for me: the config file for the PCI device does not exist, e.g.,

pcilib: Cannot open /sys/bus/pci/devices/0000:06:10.0/config
lspci: Unable to read the standard configuration space header of device 0000:06:10.0
pcilib: Cannot open /sys/bus/pci/devices/0000:06:10.0/config
lspci: Unable to read the standard configuration space header of device 0000:06:10.0
failed to find vendor-product id for PCI id "06:10.0"
Failed to claim PCI device 06:10.0

git bisect points to:

783f157bc5a7fa30ee17b4099b27146bd1b68af4 is the first bad commit
commit 783f157bc5a7fa30ee17b4099b27146bd1b68af4
Author: Alex Williamson <alex.williamson@xxxxxxxxxx>
Date: Wed May 30 14:19:43 2012 -0600

intel-iommu: Make use of DMA quirks and ACS checks in IOMMU groups

Work around broken devices and adhere to ACS support when determining
IOMMU grouping.

Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
Signed-off-by: Joerg Roedel <joerg.roedel@xxxxxxx>

:040000 040000 83890398dabbf225fd0f5b3c8c3713a75b3fb5e1 b674ce2ecb315393a8c6c1ac98b3796d5ba09708 M drivers

I triggered the oops in a number of the bisect points as well -- in those cases the machine had to be power cycled.

David

Attachment: oops1.png
Description: PNG image

Attachment: oops2.png
Description: PNG image

Attachment: oops3.png
Description: PNG image