Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

From: Takao Indoh
Date: Mon Apr 08 2013 - 04:57:39 EST


(2013/04/04 23:24), David Woodhouse wrote:
> On Thu, 2013-04-04 at 14:48 +0900, Takao Indoh wrote:
>>
>> - DMAR fault messages flood the log and the second kernel does not boot.
>> I recently saw a similar report: https://lkml.org/lkml/2013/3/8/120
>
> Right. So the fix for that is to make the subsequent errors silent,
> until/unless we actually get a request to create a mapping for the given
> device.
>
>> - The igb driver detects an error on linkup and kdump via network fails.
>
> That's a driver bug, IIRC. It was failing to completely reset the
> hardware. It's fixed now, isn't it?

No, it can still be reproduced with the latest kernel (3.9.0-rc6).

>
>> - On a certain platform, though kdump itself works, a PCIe error such as
>> Unexpected Completion is detected and the hardware gets degraded.
>
> More information required.

When I tested intel_iommu on a certain machine, the following error
message was logged by its firmware, and the I/O board went into an
abnormal state. 05:00.0 is igb, so I think this was caused by a DMA
error on igb. It occurs before the igb driver is loaded, so it cannot
be fixed in the driver.

PCI: Unexpected Completion Bus: 5 Device: 0x00 Function: 0x00

Anyway, I think we should introduce some framework that cleans up all
devices and stops their DMA at boot time, rather than dealing with the
problem in each driver. One way I found is to reset the devices in the
PCIe layer. If DMAR is disabled in init_dmars(), we get a chance to
stop DMA from the devices in the PCI layer, like qci-quirk. This is one
of the reasons why I propose this patch.
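
To illustrate the idea of stopping DMA centrally instead of per driver,
here is a minimal user-space sketch. It is not the patch and not kernel
code. It assumes a simplified, hypothetical model (struct fake_pci_dev,
cfg_read16(), cfg_write16(), quiesce_all() are names made up for this
example), and it uses a simpler technique than the device reset
proposed above: it only clears the Bus Master Enable bit in each
device's PCI command register. The register offset and bit value are
the standard ones from the PCI specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard PCI config space values (same as in linux/pci_regs.h). */
#define PCI_COMMAND        0x04    /* 16-bit command register offset */
#define PCI_COMMAND_MASTER 0x0004  /* Bus Master Enable bit */

/* Hypothetical stand-in for a device: 256 bytes of config space only. */
struct fake_pci_dev {
	uint8_t config[256];
};

/* Read a little-endian 16-bit value from the modelled config space. */
static uint16_t cfg_read16(const struct fake_pci_dev *dev, size_t off)
{
	return (uint16_t)(dev->config[off] | (dev->config[off + 1] << 8));
}

/* Write a little-endian 16-bit value to the modelled config space. */
static void cfg_write16(struct fake_pci_dev *dev, size_t off, uint16_t val)
{
	dev->config[off] = (uint8_t)(val & 0xff);
	dev->config[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Walk all devices once and clear Bus Master Enable where it is set,
 * so the device can no longer initiate DMA. Returns the number of
 * devices whose command register was changed.
 */
static size_t quiesce_all(struct fake_pci_dev *devs, size_t n)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		uint16_t cmd = cfg_read16(&devs[i], PCI_COMMAND);

		if (cmd & PCI_COMMAND_MASTER) {
			cmd = (uint16_t)(cmd & ~PCI_COMMAND_MASTER);
			cfg_write16(&devs[i], PCI_COMMAND, cmd);
			changed++;
		}
	}
	return changed;
}
```

The point of the sketch is only the shape: one loop in a common layer
that runs before any driver is loaded, so a device left active by the
first kernel is handled even if its driver never loads.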

Thanks,
Takao Indoh
