Re: [PATCH] amd iommu: force flush of iommu prior during shutdown

From: Neil Horman
Date: Thu Apr 01 2010 - 09:31:05 EST


On Thu, Apr 01, 2010 at 12:10:40AM -0700, Chris Wright wrote:
> * Vivek Goyal (vgoyal@xxxxxxxxxx) wrote:
> > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote:
> > > * Neil Horman (nhorman@xxxxxxxxxxxxx) wrote:
> > > > Flush iommu during shutdown
> > > >
> > > > When using an iommu, it's possible, if a kdump kernel boot follows a primary
> > > > kernel crash, that dma operations might still be in flight from the previous
> > > > kernel during the kdump kernel boot. This can lead to memory corruption,
> > > > crashes, and other erroneous behavior. Specifically, I've seen it manifest
> > > > during a kdump boot as endless iommu error log entries of the form:
> > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d
> > > > address=0x000000000245a0c0 flags=0x0070]
> > >
> > > We've already fixed this problem once before, so some code shift must
> > > have brought it back. Personally, I prefer to do this on the bringup
> > > path rather than the teardown path. Besides keeping the teardown path as
> > > simple as possible (goal is to get to kdump kernel asap), there's also
> > > reason to completely flush on startup in general, in case the BIOS has
> > > done anything unsavory.
> >
> > Can we flush domains (all the I/O TLBs associated with each domain) during
> > initialization? I think all the domain data built by the previous kernel
> > will be lost and the new kernel will have no idea about it.
>
> We first invalidate the device table entry, so new translation requests
> will see the new domainid for a given BDF. Then we invalidate the
> whole set of page tables associated w/ the new domainid. Now all dma
> transactions will need a page table walk (page tables will be empty except
> for any 1:1 mappings). Any old domainids from the previous kernel that
> aren't found in new device table entries are effectively moot. It just so
> happens that in the kexec/kdump case, they'll be the same domainids, but
> that doesn't matter.
>
> thanks,
> -chris
>
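To check my reading of the sequence above, here is a toy userspace model of it: invalidate the cached device table entry for a BDF, then flush every cached translation tagged with that domainid. This is only a sketch under my own assumptions. The `toy_*` names, the table sizes, and the idea of modelling the IOTLB as a small array are all made up for illustration; none of this is the actual amd_iommu driver code or the hardware command interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_DEVICES 4   /* hypothetical: number of BDFs modelled */
#define TOY_TLB_SLOTS   8   /* hypothetical: IOTLB capacity */

struct toy_tlb_entry {
    bool     valid;
    uint16_t domid;  /* domainid the translation is tagged with */
    uint64_t iova;   /* dma address handed to the device */
    uint64_t phys;   /* physical address it translated to */
};

struct toy_iommu {
    uint16_t devtab[TOY_NUM_DEVICES];        /* in-memory table: BDF -> domainid */
    uint16_t cached_domid[TOY_NUM_DEVICES];  /* hardware's cached copy of each entry */
    bool     cached_valid[TOY_NUM_DEVICES];
    struct toy_tlb_entry tlb[TOY_TLB_SLOTS]; /* cached translations */
};

/* Step 1: drop the cached device table entry so the next request re-reads it. */
static void toy_invalidate_dte(struct toy_iommu *iommu, size_t bdf)
{
    assert(bdf < TOY_NUM_DEVICES);
    iommu->cached_valid[bdf] = false;
}

/* Step 2: drop every cached translation tagged with the given domainid. */
static void toy_flush_domain(struct toy_iommu *iommu, uint16_t domid)
{
    for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
        if (iommu->tlb[i].domid == domid)
            iommu->tlb[i].valid = false;
    }
}

/*
 * Returns true on a cached hit. A miss stands in for a page table walk
 * against the new kernel's (empty) page tables, i.e. an IO_PAGE_FAULT.
 */
static bool toy_translate(struct toy_iommu *iommu, size_t bdf,
                          uint64_t iova, uint64_t *phys)
{
    assert(bdf < TOY_NUM_DEVICES);

    /* Re-read the device table entry if the cached copy was invalidated. */
    if (!iommu->cached_valid[bdf]) {
        iommu->cached_domid[bdf] = iommu->devtab[bdf];
        iommu->cached_valid[bdf] = true;
    }

    uint16_t domid = iommu->cached_domid[bdf];
    for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
        const struct toy_tlb_entry *e = &iommu->tlb[i];
        if (e->valid && e->domid == domid && e->iova == iova) {
            *phys = e->phys;
            return true;
        }
    }
    return false;
}
```

In this model a stale translation left over from the previous kernel keeps hitting until both steps have run. After that, the same access misses and would have to walk the new page tables, which matches the behavior described above even when the old and new domainids happen to be identical.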
Additionally, Chris (this is just for my own education here), what happens when
we disable the iommu while DMAs are in flight? I ask because, from what I read,
my assumption is that the iommu effectively enters a passive mode where bus
accesses from devices holding dma addresses that were previously provided by an
iommu translation will just get strobed onto the bus without being translated
back to physical addresses. Won't that result in bus errors causing master
aborts? If so, that would seem to be further cause to leave the iommu on during
a crash/kdump boot.

Neil
