Re: [PATCH v2 13/29] nios2: DMA mapping API

From: Arnd Bergmann
Date: Thu Jul 24 2014 - 08:05:33 EST


On Thursday 24 July 2014 19:37:11 Ley Foon Tan wrote:
> On Tue, Jul 15, 2014 at 5:38 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > On Tuesday 15 July 2014 16:45:40 Ley Foon Tan wrote:
> >> +#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
> >> +#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h)
> >> +
> > ...
> >> +static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
> >> + enum dma_data_direction direction)
> >> +{
> >> + __dma_sync(vaddr, size, direction);
> >> +}
> >
> > IIRC dma_cache_sync should be empty if you define dma_alloc_noncoherent
> > to be the same as dma_alloc_coherent: It's already coherent, so no sync
> > should be needed. What does the CPU do if you try to invalidate the cache
> > on a coherent mapping?
> Okay, I got what you mean here. I will leave this dma_cache_sync()
> function empty.
> The CPU just does nothing if we try to invalidate the cache on a coherent region.
> BTW, I found that many other architectures still provide dma_cache_sync()
> even though they define dma_alloc_noncoherent the same as
> dma_alloc_coherent, e.g. blackfin, x86 or xtensa.

They are probably all wrong ;-)

It's not a big issue though, since the x86 operation is cheap and the
other ones don't support any of the drivers that use dma_cache_sync.
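
To illustrate the empty variant discussed above, here is a minimal sketch.
It assumes dma_alloc_noncoherent is simply an alias for dma_alloc_coherent,
as in the patch. The enum and the forward declaration of struct device are
stand-ins so that the fragment compiles outside the kernel tree; they are
not the real headers.

```c
#include <stddef.h>

/* Stand-ins for the kernel definitions, for illustration only. */
struct device;

enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/*
 * Memory from dma_alloc_noncoherent is already coherent in this
 * configuration, so there is nothing to write back or invalidate.
 */
static inline void dma_cache_sync(struct device *dev, void *vaddr,
				  size_t size,
				  enum dma_data_direction direction)
{
	(void)dev;
	(void)vaddr;
	(void)size;
	(void)direction;
}
```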

> >> +void dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
> >> + size_t size, enum dma_data_direction direction)
> >> +{
> >> + BUG_ON(!valid_dma_direction(direction));
> >> +
> >> + __dma_sync(phys_to_virt(dma_handle), size, direction);
> >> +}
> >> +EXPORT_SYMBOL(dma_sync_single_for_cpu);
> >> +
> >> +void dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle,
> >> + size_t size, enum dma_data_direction direction)
> >> +{
> >> + BUG_ON(!valid_dma_direction(direction));
> >> +
> >> + __dma_sync(phys_to_virt(dma_handle), size, direction);
> >> +}
> >> +EXPORT_SYMBOL(dma_sync_single_for_device);
> >
> > More importantly: you do the same operation for both _for_cpu and _for_device.
> > I assume your CPU can never do speculative cache prefetches, so it's not
> > incorrect, but you do twice the number of invalidations and flushes that
> > you need.
> >
> > Why would you do anything for _for_cpu here?
> I am a bit confused about _for_cpu and _for_device here. I found that some
> architectures like c6x and hexagon have the same operation for both
> _for_cpu and _for_device as well.

(adding their maintainers to cc)

Yes, you are right, they seem to have the same bug and could see a noticeable
DMA performance improvement if they change it as well.

> I have spent some time looking at other architectures and below is what
> I found. Please correct me if I am wrong, especially for
> _for_device() with DMA_FROM_DEVICE.
>
> _for_cpu():
> case DMA_BIDIRECTIONAL:
> case DMA_FROM_DEVICE:
> /* invalidate cache */
> break;
> case DMA_TO_DEVICE:
> /* do nothing */
> break;

This seems fine: for a FROM_DEVICE mapping, we have flushed all
dirty entries during the _for_device or the map operation,
so if any clean entries are around, they need to be invalidated
in order to read the data from the device.

For TO_DEVICE, we don't care about the cache, because we are
going to overwrite the data, so we don't need to do anything.
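
The _for_cpu rules above could be sketched roughly as follows. This is a
userspace illustration rather than the nios2 implementation:
sketch_invalidate_dcache() and sketch_sync_for_cpu() are hypothetical names,
the enum is a stand-in for the kernel definition, and the counter exists
only so the behaviour can be observed. assert() takes the place of BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel enum, for illustration only. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Counts invalidate calls so the sketch can be checked. */
static unsigned int nr_invalidate;

/* Hypothetical helper: would discard the cache lines covering the buffer. */
static void sketch_invalidate_dcache(void *vaddr, size_t size)
{
	(void)vaddr;
	(void)size;
	nr_invalidate++;
}

static void sketch_sync_for_cpu(void *vaddr, size_t size,
				enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_BIDIRECTIONAL:
	case DMA_FROM_DEVICE:
		/* Drop clean lines so the CPU reads what the device wrote. */
		sketch_invalidate_dcache(vaddr, size);
		break;
	case DMA_TO_DEVICE:
		/* The CPU is about to overwrite the buffer: nothing to do. */
		break;
	default:
		assert(0 && "invalid DMA direction");
	}
}
```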

> -------------------------
> _for_device():
> case DMA_BIDIRECTIONAL:
> case DMA_TO_DEVICE:
> /* flush and invalidate cache */
> break;
> case DMA_FROM_DEVICE:
> /* should we invalidate cache or do nothing? */
> break;

You actually don't need to invalidate the TO_DEVICE mappings
in both _for_device and _for_cpu. You have to flush them
in for_device, and you have to invalidate them at least once,
but don't need to invalidate them again in for_cpu if you have
done that already in for_device and your CPU does not do any
speculative prefetches that might populate the dcache.

In case of for_device FROM_DEVICE, you have to invalidate or
flush the caches to ensure that no dirty cache lines are
written to memory, but only if your CPU has a write-back
cache rather than write-through.

For bidirectional mappings, you may have to flush and invalidate.
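
Put together, the _for_device side might look something like the sketch
below. It keeps the quoted table for TO_DEVICE and BIDIRECTIONAL (flush and
invalidate, so the invalidation happens exactly once) and fills the open
FROM_DEVICE case with an invalidate, assuming a write-back cache and a CPU
without speculative prefetches. All sketch_* names and the counters are
hypothetical, the enum is a stand-in for the kernel definition, and "flush"
means writing dirty lines back to memory.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel enum, for illustration only. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Call counters so the sketch can be checked. */
static unsigned int nr_flush;
static unsigned int nr_invalidate;

/* Hypothetical helper: would write dirty cache lines back to memory. */
static void sketch_flush_dcache(void *vaddr, size_t size)
{
	(void)vaddr;
	(void)size;
	nr_flush++;
}

/* Hypothetical helper: would discard the cache lines covering the buffer. */
static void sketch_invalidate_dcache(void *vaddr, size_t size)
{
	(void)vaddr;
	(void)size;
	nr_invalidate++;
}

static void sketch_sync_for_device(void *vaddr, size_t size,
				   enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_BIDIRECTIONAL:
	case DMA_TO_DEVICE:
		/* Make CPU writes visible to the device, then drop the lines. */
		sketch_flush_dcache(vaddr, size);
		sketch_invalidate_dcache(vaddr, size);
		break;
	case DMA_FROM_DEVICE:
		/* Ensure no dirty line is written back over the device's data. */
		sketch_invalidate_dcache(vaddr, size);
		break;
	default:
		assert(0 && "invalid DMA direction");
	}
}
```

With a write-through cache the FROM_DEVICE case could presumably be left
empty, since there are no dirty lines to worry about.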

Arnd