Re: git-latest: kernel oops in IOMMU setup

From: Grant Grundler
Date: Thu Jan 08 2009 - 16:41:39 EST


On Thu, Jan 08, 2009 at 12:05:38PM -0800, Dirk Hohndel wrote:
>
> latest git from Linus. On a Thinkpad x200s with VT-d enabled (if I
> disable VT-d, this of course goes away).
>
> The oops happens very early during boot in device_to_iommu (called
> from domain_context_mapping_one).
>
> Looking at the code dump and the disassembled function here's where
> the error happens:
>
> static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
> {
> struct dmar_drhd_unit *drhd = NULL;
> int i;
>
> for_each_drhd_unit(drhd) {
> if (drhd->ignored)
> continue;
>
> for (i = 0; i < drhd->devices_cnt; i++)
> if (drhd->devices[i]->bus->number == bus &&
> --> drhd->devices[0] is NULL
> drhd->devices[i]->devfn == devfn)
> return drhd->iommu;
>
>
> Given how early this happens it's a little hard to provide logs, etc. I
> literally used delay_boot=100 and wrote things down by hand (forgot my
> digital camera) and then added printk's to verify).
>
> please let me know what other data I should collect.

If you can, a back trace. Basically just need to know which caller
is tripping over this. But there can't be that many callers and they
are all in this file:
0 intel-iommu.c device_to_iommu 431 static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
1 intel-iommu.c domain_context_mapping_on 1471 iommu = device_to_iommu(bus, devfn);
2 intel-iommu.c domain_context_mapped 1593 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);
3 intel-iommu.c domain_remove_dev_info 1684 iommu = device_to_iommu(info->bus, info->devfn);
4 intel-iommu.c vm_domain_remove_one_dev_ 2773 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);
5 intel-iommu.c vm_domain_remove_one_dev_ 2803 if (device_to_iommu(info->bus, info->devfn) == iommu)
6 intel-iommu.c vm_domain_remove_all_dev_ 2836 iommu = device_to_iommu(info->bus, info->devfn);
7 intel-iommu.c intel_iommu_attach_device 3023 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);

so it should be possible to figure out which one is called
before the dev is setup. It's unlikely to be anything with
"remove" in the name. :)

My guess is it's intel_iommu_attach_device being called "too early".

hth,
grant


hth,
grant

>
> The system ran fine with the 2.6.28 release kernel.
>
> /D
>
> --
> Dirk Hohndel
> Intel Open Source Technology Center
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/