Re: [PATCH] riscv: Support non-coherency memory model

From: Guo Ren
Date: Fri Apr 26 2019 - 12:06:13 EST


Hi Arnd,

On Thu, Apr 25, 2019 at 11:50:11AM +0200, Arnd Bergmann wrote:
> On Wed, Apr 24, 2019 at 4:23 PM Christoph Hellwig <hch@xxxxxx> wrote:
> >
> > On Wed, Apr 24, 2019 at 12:45:56PM +0000, Gary Guo wrote:
> > > The RISC-V privileged spec is explicitly designed to allow the
> > > techniques described above (this is the sole purpose of MSTATUS.TVM). It
> > > might be as high performance as a hardware with H-extension, but is
> > > definitely a legit use case. In fact, it is vital for use cases like
> > > recursive virtualization.
> > >
> > > Also, I believe the PTE format of RISC-V is already frozen -- therefore
> > > it is impossible now to merge GLOBAL and USER bit, nor to replace RSW
> > > bit with another bit.
> >
> > Yes, I do not think we can just repurpose a bit. Even using a currently
> > unused one would require some gymnastics.
> >
> > That being said IFF we want to support non-coherent DMA (and I think we
> > do as people glue together their SOCs using shoestring and paper clips,
> > as already demonstrated by Andes and C-SKY in RISC-V space, and most
> > arm, mips and ppc SOCs) we need something like this flag. The current
> > RISC-V method that only allows M-mode to set up such attributes on
> > a small number or PMP regions just doesn't work well with the way how
> > Linux and most non-trivial OSes implement DMA memory allocations.
> >
> > Note that I said well - in theory we can have a firmware provided
> > uncached pool - that is what Linux does on most nommu (that is without
> > pagetables) ports, but the fixed sized pool really does suck and will
> > make users very unhappy.
>
> You could probably get away with allowing uncached mappings only
> for huge pages, and using one or two of the bits the PMD for it.
> This should cover most use cases, since in practice coherent allocations
> tend to be either small and rare (device descriptors) or very big
> (frame buffer etc), and both cases can be handled with hugepages
> and gen_pool_alloc, possibly CMA added in since there will likely
> not be an IOMMU either on the systems that lack cache coherent DMA.
Generally, the attributes in a huge TLB entry and a leaf TLB entry should
be the same. Putting the _PAGE_CACHE and _PAGE_BUF bits only in huge TLB
entries sounds a bit strange.

The gen_pool_alloc pool is only 256KB by default, but a huge TLB entry is
4MB. Hardware can't set up a TLB mapping from a 4MB virtual range onto a
256KB physical range.
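
To make the size mismatch concrete, here is a minimal sketch in plain C (not
kernel code). POOL_SIZE, HUGE_PAGE_SIZE and can_back_huge_entry() are
hypothetical names; the two sizes are simply the figures quoted above, and
the alignment rule is the usual assumption that a huge mapping must be
naturally aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes from the discussion; both macro names are hypothetical. */
#define POOL_SIZE      (256u * 1024u)        /* default pool: 256KB */
#define HUGE_PAGE_SIZE (4u * 1024u * 1024u)  /* one huge TLB entry: 4MB */

/* The pool is smaller than what a single huge entry has to cover. */
static_assert(POOL_SIZE < HUGE_PAGE_SIZE, "pool cannot fill a huge entry");

/*
 * A single huge TLB entry maps exactly HUGE_PAGE_SIZE bytes, so the
 * physical region behind it must be at least that large and (assumed here)
 * aligned to that size.
 */
static bool can_back_huge_entry(uint64_t phys_base, size_t len)
{
    return len >= HUGE_PAGE_SIZE && (phys_base % HUGE_PAGE_SIZE) == 0;
}
```

With these numbers, a 256KB pool fails the check no matter where it is
placed, so an uncached attribute that exists only at huge-page granularity
would force the pool to grow to at least 4MB.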

>
> One downside is that you need a little more care for drivers that
> use dma_mmap_coherent() to expose coherent buffers to user space.
>
> Two other points about the proposal:
> - Aside from completely uncached/unbuffered mappings, you typically
> want uncached/buffered mappings to cover dma_alloc_wc() that is
> typically used for frame buffers etc that need write-combining to get
> acceptable performance
I agree that dma_alloc_wc is necessary, and we need to add one more
attribute bit to the PTE: _PAGE_BUF.
Perhaps using _PAGE_BUF + _PAGE_CACHE is better than _PAGE_CONHENCY.
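
A rough sketch of how the two bits could combine, in plain C. The bit
positions are made up for illustration (nothing in this thread fixes them),
and the prot_*() helper names are hypothetical, not existing kernel
functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for illustration. */
#define _PAGE_CACHE (UINT64_C(1) << 61) /* cacheable */
#define _PAGE_BUF   (UINT64_C(1) << 62) /* bufferable (write-combining) */

static_assert((_PAGE_CACHE & _PAGE_BUF) == 0, "attribute bits must not overlap");

/* Normal memory: cached and buffered. */
static inline uint64_t prot_normal(uint64_t prot)
{
    return prot | _PAGE_CACHE | _PAGE_BUF;
}

/* dma_alloc_wc() style mapping: uncached but buffered. */
static inline uint64_t prot_writecombine(uint64_t prot)
{
    return (prot & ~_PAGE_CACHE) | _PAGE_BUF;
}

/* Completely uncached/unbuffered mapping. */
static inline uint64_t prot_noncached(uint64_t prot)
{
    return prot & ~(_PAGE_CACHE | _PAGE_BUF);
}
```

Two independent bits give the uncached/buffered case its own encoding, which
a single coherency bit could not express.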

> - you need to decide what is supposed to happen when there are
> multiple conflicting mappings for the same physical address.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What do you mean by multiple conflicting mappings?

Best Regards
Guo Ren