Re: [GIT PULL] AMD IOMMU updates for 2.6.28-rc5

From: Joerg Roedel
Date: Wed Nov 19 2008 - 04:26:00 EST


On Wed, Nov 19, 2008 at 03:05:24PM +0900, FUJITA Tomonori wrote:
> On Tue, 18 Nov 2008 16:43:22 +0100
> Joerg Roedel <joerg.roedel@xxxxxxx> wrote:
>
> > Joerg Roedel (4):
> > AMD IOMMU: add parameter to disable device isolation
> > AMD IOMMU: enable device isolation per default
> > AMD IOMMU: fix fullflush comparison length
> > AMD IOMMU: check for next_bit also in unmapped area
> >
> > Documentation/kernel-parameters.txt | 4 +++-
> > arch/x86/kernel/amd_iommu.c | 2 +-
> > arch/x86/kernel/amd_iommu_init.c | 6 ++++--
> > 3 files changed, 8 insertions(+), 4 deletions(-)
> >
> > As the most important change these patches enable device isolation per
> > default. Tests have shown that there are drivers which have bugs and do
> > double-freeing of DMA memory.
>
> What drivers? We need to fix them if they are mainline drivers.

So far I have found issues only in network drivers. With the in-kernel
ixgbe driver I see IO_PAGE_FAULT events, and the ixgbe version from the
Intel website has a double-free bug when the driver is unloaded or the
device MTU is changed. The same double-free problem was found with the
Broadcom NetXtreme II (bnx2) driver.

> > This can lead to data corruption with a
> > hardware IOMMU when multiple devices share the same protection domain.
> > Therefore device isolation should be enabled by default.
>
> Hmm, is the change just a workaround for these bugs? If so, I'm not
> sure it's a good idea. We need to fix the buggy drivers anyway. And
> device isolation is not free; e.g. it uses more memory than sharing
> a protection domain. I guess that more people prefer sharing a
> protection domain by default. It had been the default option for AMD
> IOMMU until you hit the bugs. IIRC, VT-d also shares a protection
> domain by default. It would be nice to avoid surprising users by
> having the two virtualization IOMMUs work in a similar way.

We can't test all drivers for these bugs before 2.6.28 is released.
And these bugs can corrupt data, for example when a driver frees DMA
addresses allocated by another driver and those addresses are then
reallocated.
The only way to protect the drivers from each other is to isolate them
in different protection domains. The AMD IOMMU driver triggers a
WARN_ON() if a driver frees DMA addresses that are not mapped. This
warning triggered with the bnx2 and ixgbe drivers.
And the data corruption is real: it ate the root filesystem of my test
box once.
I agree that we need to fix the drivers. I plan to implement some debug
code that lets driver developers detect these bugs even without an
IOMMU in the system.

Joerg
