Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

From: David Woodhouse
Date: Mon Sep 09 2013 - 05:08:00 EST


On Wed, 2013-08-21 at 16:15 +0900, Takao Indoh wrote:
>
> This causes problem on kdump. Devices are working in first kernel, and
> after switching to second kernel and initializing IOMMU, many DMAR faults
> occur and it causes problems like driver error or PCI SERR, at last
> kdump fails. This patch fixes this problem.

I'm not sure I'd call this a fix.

If the driver is so broken that it cannot get the device working again
after a fault, surely the driver needs to be fixed?

If the system is suffering an IRQ storm because a device doesn't give up
after the first few faults, then we should switch off the fault
*reporting* for that device so that its faults get ignored (until it
next actually sets up a DMA mapping, or something).
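
To make that concrete, here is a rough userspace model of the policy,
not kernel code and not what intel-iommu does today. Every name in it
(dev_fault_state, FAULT_REPORT_LIMIT, fault_should_report,
fault_rearm_on_map) is hypothetical, and the threshold of 3 is an
arbitrary stand-in for "the first few faults":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: faults logged per device before going quiet. */
#define FAULT_REPORT_LIMIT 3

/* Hypothetical per-device state; not a real intel-iommu structure. */
struct dev_fault_state {
	unsigned int reported;	/* faults logged since the last mapping */
	unsigned long ignored;	/* faults dropped while suppressed */
	bool suppressed;	/* true once reporting is switched off */
};

/* Called for each fault; returns true if the fault should be logged. */
static bool fault_should_report(struct dev_fault_state *dev)
{
	assert(dev != NULL);
	if (dev->suppressed) {
		dev->ignored++;
		return false;
	}
	/* The fault that reaches the limit is still logged; later ones are not. */
	if (++dev->reported >= FAULT_REPORT_LIMIT)
		dev->suppressed = true;
	return true;
}

/* Called when the driver next sets up a DMA mapping: re-arm reporting. */
static void fault_rearm_on_map(struct dev_fault_state *dev)
{
	assert(dev != NULL);
	dev->reported = 0;
	dev->suppressed = false;
}
```

In a real implementation the suppression would presumably be done in
hardware rather than by filtering in the fault handler, so that the
interrupts themselves stop; the model above only shows the bookkeeping.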

For the IOMMU code to reset individual devices, just because they still
have an active DMA mapping even if they're not *doing* DMA, seems wrong.
You'll even end up resetting devices just because they have an RMRR,
won't you? (Although I wouldn't lose any sleep over that, I suppose. In
fact it might be a *feature*... :)

--
David Woodhouse Open Source Technology Centre
David.Woodhouse@xxxxxxxxx Intel Corporation
