Re: [RFC PATCH v6 1/6] riscv: mm: dma-noncoherent: Switch using function pointers for cache management

From: Arnd Bergmann
Date: Fri Jan 20 2023 - 12:05:08 EST


On Fri, Jan 13, 2023, at 06:48, Christoph Hellwig wrote:
> On Tue, Jan 10, 2023 at 04:03:06PM +0100, Arnd Bergmann wrote:
>> I looked at all the implementations now and put them in a table[1]
>> to see what the differences are. The only bit that I think needs
>> discussion is the dma_sync_single_for_device(DMA_FROM_DEVICE) op
>> that I mentioned above. I see that arm64, csky, powerpc, riscv
>> and parisc all write out at least partial cache lines first to
>> avoid losing dirty data in the part that is not written by the
>> device[2][3], while the other ones don't[4].
>
> I'm tempted to declare [4] buggy until proof of the inverse.

Having looked at this some more, I see that the powerpc
version is a bit problematic here as well: this one
flushes the partial cache lines before and after the
DMA transfer, while only invalidating the full
cache lines. If a partial cache line gets written
to by the CPU while the buffer is owned by the device,
this means that the received data from the device is
immediately overwritten by the second flush.

The arm64 and riscv behavior of doing a flush before
and an invalidate after the DMA seems a bit more
appropriate, as that ends up keeping the DMA
data but discarding anything written by the CPU.

Obviously there is no winning either way if the same
cache line gets written by both CPU and device, I'm
just trying to figure out what behavior we actually
want here.

The best I can think of so far is:

- flush the partial cache lines before the DMA,
as powerpc does, and just invalidate the full
cache lines

- only invalidate but not clean/flush after the
DMA. This is the arm64 behavior

- warn when flushing partial cache lines
if dma_debug is enabled.
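
Roughly, I'd expect that to come out as something like the below
(untested sketch; arch_wback()/arch_inv() and CLSZ are only
placeholders for the per-architecture clean/invalidate primitives
and cache line size, not existing interfaces):

static void dma_from_device_for_device(unsigned long start, size_t size)
{
	unsigned long end = start + size;
	unsigned long aligned_start = round_up(start, CLSZ);
	unsigned long aligned_end = round_down(end, CLSZ);

	/*
	 * clean the partial lines at either end so no dirty data is
	 * lost; this is also where a dma_debug warning could go
	 */
	if (start != aligned_start)
		arch_wback(round_down(start, CLSZ), CLSZ);
	if (end != aligned_end)
		arch_wback(aligned_end, CLSZ);

	/* full cache lines only need to be invalidated */
	if (aligned_end > aligned_start)
		arch_inv(aligned_start, aligned_end - aligned_start);
}

static void dma_from_device_for_cpu(unsigned long start, size_t size)
{
	/*
	 * invalidate only, never clean: keep the data written by the
	 * device, discard anything the CPU wrote while the device
	 * owned the buffer
	 */
	arch_inv(round_down(start, CLSZ),
		 round_up(start + size, CLSZ) - round_down(start, CLSZ));
}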

>> I also see that at least arc, arm, mips and riscv all want
>> CPU specific cache management operations to be registered
>> at boot time. While Russell had some concerns about your
>> suggestion to generalize the arm version, we could start
>> by moving the abstracted riscv version into
>> kernel/dma/direct.c and make sure it can be shared with
>> at least mips and arc, provided that we can agree on the
>> DMA_FROM_DEVICE behavior.
>
> Yes, I'd really like to start out with a common version and then
> move bits over. There are also some interesting bits about handling
> highmem for architectures that need virtual addresses for flushing
> that might be worth sharing, too.
>
> This should be a new file in kernel/dma/ as it's not only used by
> dma-direct but also by dma-iommu, and to keep the code nicely
> separated.
>
> Can you give it a go?

I started on this at the beginning of the week but couldn't
finish it; I still plan to get back to it next week.

Aside from the question of how to handle flush vs. invalidate
on DMA_FROM_DEVICE, I'm still trying to figure out how to
best handle highmem with architecture specific cache management
operations. The easy approach would be to leave that up
to the architecture, passing only a physical address to
the flush function. A nicer interface might be to move the
loop over highmem pages out into common code, flush
lowmem pages by virtual address, and have a separate
callback for highmem pages that takes a page pointer,
like

struct dma_cache_ops {
	void (*dma_cache_wback_inv)(void *start, unsigned long sz);
	void (*dma_cache_inv)(void *start, unsigned long sz);
	void (*dma_cache_wback)(void *start, unsigned long sz);
#ifdef CONFIG_HIGHMEM
	void (*dma_cache_wback_inv_high_page)(struct page *, size_t start, unsigned long sz);
	void (*dma_cache_inv_high_page)(struct page *, size_t start, unsigned long sz);
	void (*dma_cache_wback_high_page)(struct page *, size_t start, unsigned long sz);
#endif
};
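
For illustration, the common code built on top of that could then
look roughly like the below (again only a rough, untested sketch,
showing just the wback_inv case):

static void cache_wback_inv_page(const struct dma_cache_ops *ops,
				 struct page *page, size_t offset,
				 size_t len)
{
#ifdef CONFIG_HIGHMEM
	if (PageHighMem(page)) {
		ops->dma_cache_wback_inv_high_page(page, offset, len);
		return;
	}
#endif
	/* lowmem pages can be flushed by virtual address directly */
	ops->dma_cache_wback_inv(page_address(page) + offset, len);
}

static void dma_cache_wback_inv_range(const struct dma_cache_ops *ops,
				      phys_addr_t paddr, size_t size)
{
	/* walk the buffer page by page so highmem pages can be mapped */
	while (size) {
		struct page *page = pfn_to_page(PHYS_PFN(paddr));
		size_t offset = offset_in_page(paddr);
		size_t len = min_t(size_t, size, PAGE_SIZE - offset);

		cache_wback_inv_page(ops, page, offset, len);
		paddr += len;
		size -= len;
	}
}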

Let me know if you have a preference here, before I spend
too much time on something we don't want in the end.

Arnd