Re: Seeing DMAR errors after multiple load/unload with SR-IOV

From: Alex Williamson
Date: Mon Jun 06 2011 - 18:18:25 EST


On Mon, 2011-06-06 at 14:39 +0530, padmanabh ratnakar wrote:
> Hi,
> I am using Linux kernel 2.6.39 on an IBM x3650 M3 system.
> I used the following boot options:
> intel_iommu=on iommu=pt
>
> I was repeatedly loading and unloading my NIC driver (be2net) with num_vfs=7.
>
> After some iterations I get the following DMAR errors:
> Jun 4 03:50:20 rhel6 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
> Jun 4 03:50:20 rhel6 kernel: Do you have a strange power saving mode enabled?
> Jun 4 03:50:20 rhel6 kernel: Dazed and confused, but trying to continue
> Jun 4 03:50:20 rhel6 kernel: DRHD: handling fault status reg 2
> Jun 4 03:50:20 rhel6 kernel: DMAR:[DMA Read] Request device [1a:00.2] fault addr 78077000
> Jun 4 03:50:20 rhel6 kernel: DMAR:[fault reason 02] Present bit in context entry is clear
>
> I was trying to debug this, but I don't understand the iommu code well.
> The physical address belongs to the printed PCI function, so there should
> not have been an error.
>
> I do not see the pci_dev (pdev) of the VFs being removed from the
> si_domain->devices list (intel-iommu.c)
> when the driver is unloaded and calls pci_disable_sriov(), which frees the VF pdevs.
> It looks like the issue happens when a freed pdev is allocated again:
> because it is already in the list,
> the required initialization doesn't happen.
>
> I don't know if my understanding is correct. Can anyone point me to
> what the issue may be?

Typically devices are removed from the domain via
drivers/pci/intel-iommu.c:device_notifier(), which is called as the
device is unbound from the driver. However, this seems to get skipped
when running in passthrough mode, so I'm not sure where the removal is
supposed to occur in that case. Does it happen without passthrough
(i.e., without iommu=pt)? Also note that some intel-iommu fixes have
rolled into 3.0.0-rc2; you might want to update and see if anything is
better there. Thanks,

Alex
