Re: [PATCH] x86: avoid unnecessary low zone allocation in AMDIOMMU's alloc_coherent

From: FUJITA Tomonori
Date: Wed Sep 10 2008 - 09:04:16 EST

On Wed, 10 Sep 2008 14:48:22 +0200
Joerg Roedel <joerg.roedel@xxxxxxx> wrote:

> On Wed, Sep 10, 2008 at 09:38:11PM +0900, FUJITA Tomonori wrote:
> > On Wed, 10 Sep 2008 14:03:10 +0200
> > Joerg Roedel <joerg.roedel@xxxxxxx> wrote:
> > > It needs a fix anyway and the
> > > right solution here is to fall back to one of the software iommu
> > > implementations. The stackable dma_ops patches I have currently in work
> > > will do exactly that.
> >
> > I'm not sure you need the stackable dma_ops support. Calgary IOMMU had
> > the same problem and already solved it with the dma_ops-per-device option.
> We need stackable dma_ops anyway for paravirt IOMMU support in KVM.

I know. We discussed it when adding dma_ops-per-device support.

> And they will fix this issue too.

Ok, I'll wait until I see how the patches solve the problem cleanly.

> > > These flags are already removed in the dma_alloc_coherent function which
> > > calls this one. Further I think in the case of a remapping IOMMU like
> >
> > Not true about tip/x86/iommu. dma_alloc_coherent in dma-mapping.h does
> > that so that swiotlb and pci-nommu don't need the gfp hack. Clearing
> > the gfp flags is much simpler than setting them up correctly, mainly
> > because of the fallback device; setting up the flags is really
> > difficult.
> Yes, dma_alloc_coherent in dma-mapping.h clears the flags. And this
> function also calls ops->alloc_coherent, which points to the AMD IOMMU's
> alloc_coherent function if the driver is in place.

Hmm, I'm not sure which code you're looking at. Here's dma_alloc_coherent()
in tip/x86/iommu:

dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
		gfp_t gfp)
{
	struct dma_mapping_ops *ops = get_dma_ops(dev);
	void *memory;

	gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);

Sure, we clear the flags here, but...

	if (dma_alloc_from_coherent(dev, size, dma_handle, &memory))
		return memory;

	if (!dev) {
		dev = &x86_dma_fallback_dev;
		gfp |= GFP_DMA;
	}

we modify them again here (this doesn't happen with PCI devices),

	if (!dev->dma_mask)
		return NULL;

	if (!ops->alloc_coherent)
		return NULL;

Then dma_alloc_coherent_gfp_flags() sets them again, according to
dev->coherent_dma_mask and gfp, before the ops->alloc_coherent hook is
called:

	return ops->alloc_coherent(dev, size, dma_handle,
				   dma_alloc_coherent_gfp_flags(dev, gfp));
}
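The clear-then-recompute flow above can be modelled in user space to see why the zone flags a caller passes in do not matter. This is a simplified sketch, not kernel code: the MODEL_* flag values, struct model_device and the fallback device's 32-bit mask are hypothetical, and the zone selection in model_gfp_flags() is an assumption about how dma_alloc_coherent_gfp_flags() behaves on x86_64 (a mask of at most 24 bits gives GFP_DMA, at most 32 bits gives GFP_DMA32).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's zone modifiers. The bit values
 * are illustrative only; they are NOT the kernel's real gfp_t layout. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_DMA      0x01u
#define MODEL_GFP_HIGHMEM  0x02u
#define MODEL_GFP_DMA32    0x04u
#define MODEL_GFP_KERNEL   0x10u /* hypothetical "ordinary allocation" bit */

#define MODEL_MASK_24BIT 0x00ffffffULL
#define MODEL_MASK_32BIT 0xffffffffULL
#define MODEL_MASK_64BIT 0xffffffffffffffffULL

/* Hypothetical device: only the field this model needs. */
struct model_device {
	unsigned long long coherent_dma_mask; /* 0 means "not set" */
};

/* Hypothetical fallback device used for dev == NULL callers; its 32-bit
 * mask is an assumption made for this sketch. */
static const struct model_device model_fallback_dev = { MODEL_MASK_32BIT };

/* Pick the mask to honour: the device's coherent mask if set, otherwise a
 * default derived from whether GFP_DMA is already requested. */
static unsigned long long model_coherent_mask(const struct model_device *dev,
					      model_gfp_t gfp)
{
	assert(dev != NULL); /* the caller substitutes the fallback device */
	if (dev->coherent_dma_mask)
		return dev->coherent_dma_mask;
	return (gfp & MODEL_GFP_DMA) ? MODEL_MASK_24BIT : MODEL_MASK_32BIT;
}

/* Re-derive the zone flags from the mask (x86_64 behaviour assumed). */
static model_gfp_t model_gfp_flags(const struct model_device *dev,
				   model_gfp_t gfp)
{
	unsigned long long mask = model_coherent_mask(dev, gfp);

	if (mask <= MODEL_MASK_24BIT)
		gfp |= MODEL_GFP_DMA;
	if (mask <= MODEL_MASK_32BIT && !(gfp & MODEL_GFP_DMA))
		gfp |= MODEL_GFP_DMA32;
	return gfp;
}

/* Whole flow: clear caller-supplied zone flags, handle the fallback
 * device, then compute the flags handed to ops->alloc_coherent. */
static model_gfp_t model_dma_alloc_gfp(const struct model_device *dev,
				       model_gfp_t gfp)
{
	gfp &= ~(MODEL_GFP_DMA | MODEL_GFP_HIGHMEM | MODEL_GFP_DMA32);
	if (dev == NULL) {
		dev = &model_fallback_dev;
		gfp |= MODEL_GFP_DMA;
	}
	return model_gfp_flags(dev, gfp);
}
```

In this model a device with a 32-bit coherent mask ends up with only MODEL_GFP_DMA32 whether the caller passed MODEL_GFP_HIGHMEM, MODEL_GFP_DMA or nothing, because the first step discards the caller's zone bits and the mask alone decides the zone. Only the dev == NULL path keeps a forced MODEL_GFP_DMA.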

This code can set up exactly the same gfp flags for swiotlb and nommu as
