Seeing DMAR errors after multiple load/unload with SR-IOV

From: padmanabh ratnakar
Date: Mon Jun 06 2011 - 05:09:37 EST


Hi,
I am using Linux kernel 2.6.39 on an IBM x3650 M3 system.
I booted with the following options:
intel_iommu=on iommu=pt

I was repeatedly loading and unloading my NIC driver (be2net) with num_vfs=7.

After some iterations I get the following DMAR errors:
Jun 4 03:50:20 rhel6 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jun 4 03:50:20 rhel6 kernel: Do you have a strange power saving mode enabled?
Jun 4 03:50:20 rhel6 kernel: Dazed and confused, but trying to continue
Jun 4 03:50:20 rhel6 kernel: DRHD: handling fault status reg 2
Jun 4 03:50:20 rhel6 kernel: DMAR:[DMA Read] Request device [1a:00.2] fault addr 78077000
Jun 4 03:50:20 rhel6 kernel: DMAR:[fault reason 02] Present bit in context entry is clear

I have been trying to debug this, but I don't understand the IOMMU code well.
The faulting physical address belongs to the PCI function printed in the
log, so there should not have been an error.

I cannot see the pci_dev (pdev) of the VFs being removed from the
si_domain->devices list (intel-iommu.c) when the driver is unloaded,
which calls pci_disable_sriov() and frees the VF pdevs.
It looks like the issue happens when a freed pdev is allocated again:
since it is already in the list, the required initialization does not
happen.
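
To make the suspicion above concrete, here is a minimal userspace sketch
in C. It is a toy model, not the intel-iommu.c code: struct toy_pdev,
struct toy_domain, toy_attach(), toy_detach() and toy_reload_cycle() are
all hypothetical names, and the "skip setup if the pointer is already in
the list" behaviour is only my assumption about what might be happening.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; NOT the kernel's struct pci_dev or domain types. */
struct toy_pdev {
    int devfn;
    bool context_present;  /* models the Present bit of the context entry */
};

#define TOY_MAX_DEVS 8

struct toy_domain {
    const struct toy_pdev *devices[TOY_MAX_DEVS];  /* tracked by pointer only */
    size_t count;
};

static bool toy_domain_has(const struct toy_domain *dom,
                           const struct toy_pdev *pdev)
{
    for (size_t i = 0; i < dom->count; i++)
        if (dom->devices[i] == pdev)
            return true;
    return false;
}

/* Assumed behaviour: setup is skipped when the pointer is already listed. */
static void toy_attach(struct toy_domain *dom, struct toy_pdev *pdev)
{
    assert(dom != NULL && pdev != NULL);
    if (toy_domain_has(dom, pdev))
        return;  /* treated as "already set up"; context entry untouched */
    assert(dom->count < TOY_MAX_DEVS);
    dom->devices[dom->count++] = pdev;
    pdev->context_present = true;
}

/* Hypothetical cleanup step: drop the pointer when the VF goes away. */
static void toy_detach(struct toy_domain *dom, const struct toy_pdev *pdev)
{
    for (size_t i = 0; i < dom->count; i++) {
        if (dom->devices[i] == pdev) {
            dom->devices[i] = dom->devices[--dom->count];
            return;
        }
    }
}

/*
 * One unload/reload cycle in which the "allocator" hands back the same
 * storage for the new VF. Returns the Present bit after the second attach.
 */
static bool toy_reload_cycle(bool detach_on_unload)
{
    struct toy_domain dom = { .count = 0 };
    struct toy_pdev slot = { .devfn = 2, .context_present = false };

    toy_attach(&dom, &slot);               /* first load: VF is set up */
    if (detach_on_unload)
        toy_detach(&dom, &slot);           /* list entry removed on unload */

    /* VF freed and context cleared; same address reused for the new VF. */
    slot = (struct toy_pdev){ .devfn = 2, .context_present = false };

    toy_attach(&dom, &slot);               /* second load */
    return slot.context_present;
}
```

In this model, toy_reload_cycle(false) leaves the Present bit clear after
the second load because the stale list entry makes toy_attach() return
early, while toy_reload_cycle(true) sets it again. Whether the real
driver behaves like this is exactly what I am asking about.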

I don't know if my understanding is correct. Can anyone point me to
what the issue may be?

Thanks,
Padmanabh