Re: [PATCH] riscv: Support non-coherency memory model

From: Gary Guo
Date: Wed Apr 24 2019 - 08:46:03 EST

On 24/04/2019 06:57, Guo Ren wrote:
> Hi Gary,
>
> On Wed, Apr 24, 2019 at 03:21:14AM +0000, Gary Guo wrote:
>>> Look:
>>> linux-next git:(riscv_asid_allocator_v2)$ grep GLOBAL arch/riscv -r
>>> arch/riscv/include/asm/pgtable-bits.h:#define _PAGE_GLOBAL (1 << 5) /* Global */
>>> arch/riscv/include/asm/pgtable-bits.h: _PAGE_USER | _PAGE_GLOBAL))
>>>
>>> Your patch tells us _PAGE_USER and _PAGE_GLOBAL are duplicates, so why
>>> couldn't we make _PAGE_USER imply _PAGE_GLOBAL? Can you give an example
>>> of a real scenario with a PTE that has:
>>> _PAGE_USER:0 + _PAGE_GLOBAL:1
>>> or
>>> _PAGE_USER:1 + _PAGE_GLOBAL:0
>>>
>>> Of course I know USER & GLOBAL are conceptually very different, but
>>> there are only 10 attribute bits for riscv (in fact we've wasted two bits
>>> to support huge RV32 pfns :P). So I think it is time to merge these two bits
>>> before hardware supports GLOBAL. Reserve them for the future!
>>
>> Two cases I can think of:
>> * vdso-like things. They're user pages that can really be shared across
>> address spaces (i.e. global). Kernels like L4 implement most system calls
>> in a way similar to the vDSO, so USER + GLOBAL is useful.
> The vdso is a user space mapping in Linux. See fs/binfmt_elf.c:
>
> static int load_elf_binary(struct linux_binprm *bprm) {
> ...
> #ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
> retval = arch_setup_additional_pages(bprm, !!elf_interpreter);
> if (retval < 0)
> goto out;
> #endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
>
> All Linux archs use arch_setup_additional_pages() for the vdso mapping, and
> every process has its own vdso mapping to the same pages.

But we shouldn't prevent a kernel from mapping a USER page globally. As
I said, the fact that Linux doesn't do it isn't a valid reason for
omitting the possibility.

>
> I don't think the vdso is a real use case for GLOBAL in a PTE.
>
>> * hypervisor without H-extension: This requires shadow page tables. Supervisor
>> pages are mapped to supervisor shadow pages. However these shadow pages cannot
>> be GLOBAL because they can't be shared between VMs. So !USER + !GLOBAL is useful.
> A hypervisor uses 2-stage TLB translation in hardware, and shadow page
> tables are for stage 2 translation. Shadow page tables care about the VMID,
> not the ASID.

When the H-extension is present, stage 2 translation uses the VMID and is
performed by hardware. When the H-extension is not present, there is no
VMID at all: both the hypervisor and the guest supervisor run in
supervisor mode, and the hypervisor uses MSTATUS.TVM to trap the guest
supervisor's virtual memory operations. The shadow page table is
populated by doing the 2-stage page walk in software. The hypervisor then
likely needs to use some bits of the ASID to emulate the VMID. A GLOBAL
page cannot be used here, because GLOBAL means that the mapping exists in
all physical ASIDs (which encode both the emulated VMID and the guest
ASID). Making supervisor pages GLOBAL would therefore give incorrect
semantics!
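
To make the problem concrete, here is a minimal sketch of the idea. The
4/12 bit split of a 16-bit hardware ASID, the struct, and the function
names are all made up for illustration; a real hypervisor could choose any
split, and a real TLB is of course more involved than this match function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical split of a 16-bit hardware ASID: the high bits carry an
 * emulated VMID, the low bits carry the guest's own ASID. */
#define EMU_VMID_BITS 4u
#define EMU_ASID_BITS 12u

/* Pack the emulated VMID and the guest ASID into one physical ASID. */
static uint16_t phys_asid(uint16_t vmid, uint16_t guest_asid)
{
    uint16_t v = vmid & ((1u << EMU_VMID_BITS) - 1u);
    uint16_t a = guest_asid & ((1u << EMU_ASID_BITS) - 1u);

    return (uint16_t)((v << EMU_ASID_BITS) | a);
}

/* Simplified model of a TLB entry: only the fields needed for matching. */
struct tlb_entry {
    uint64_t vpn;    /* virtual page number */
    uint16_t asid;   /* physical ASID the entry was filled under */
    bool     global; /* G bit copied from the PTE */
};

/* A global entry matches regardless of the current ASID, so it also
 * matches across emulated VMIDs. */
static bool tlb_hit(const struct tlb_entry *e, uint64_t vpn, uint16_t asid)
{
    return e->vpn == vpn && (e->global || e->asid == asid);
}
```

With this model, a guest kernel mapping filled under emulated VMID 1 with
global set still hits when the current physical ASID belongs to emulated
VMID 2, which is exactly the leak between VMs. With global clear, the same
entry only hits under the physical ASID it was filled with.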

> If the hardware doesn't support the H-extension (MMU 2-stage translation),
> the virtualization performance is hard to accept.

The RISC-V privileged spec is explicitly designed to allow the
techniques described above (this is the sole purpose of MSTATUS.TVM). It
might not be as high performance as hardware with the H-extension, but it
is definitely a legitimate use case. In fact, it is vital for use cases
like recursive virtualization.

Also, I believe the PTE format of RISC-V is already frozen, so it is no
longer possible to merge the GLOBAL and USER bits, or to replace an RSW
bit with another bit.
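
For reference, a small sketch of the low ten PTE bits as I read them from
the privileged spec, with U and G as independent bits. The macro and
function names below are illustrative only (the kernel uses _PAGE_USER and
_PAGE_GLOBAL); the point is just that all four U/G combinations are
distinct encodings.

```c
#include <stdint.h>

/* Low PTE bits per the RISC-V privileged spec (names are illustrative). */
#define PTE_V   (1u << 0)  /* valid */
#define PTE_R   (1u << 1)  /* readable */
#define PTE_W   (1u << 2)  /* writable */
#define PTE_X   (1u << 3)  /* executable */
#define PTE_U   (1u << 4)  /* accessible from user mode */
#define PTE_G   (1u << 5)  /* global: present in all address spaces */
#define PTE_A   (1u << 6)  /* accessed */
#define PTE_D   (1u << 7)  /* dirty */
#define PTE_RSW (3u << 8)  /* two bits reserved for supervisor software */

/* The four U/G combinations discussed in this thread. */
enum pte_scope {
    SUPERVISOR_PER_ASID, /* !U + !G: e.g. shadow pages of one guest */
    SUPERVISOR_GLOBAL,   /* !U +  G: ordinary kernel mappings */
    USER_PER_ASID,       /*  U + !G: ordinary process mappings */
    USER_GLOBAL,         /*  U +  G: user pages shared by every space */
};

static enum pte_scope classify_pte(uint64_t pte)
{
    if (pte & PTE_U)
        return (pte & PTE_G) ? USER_GLOBAL : USER_PER_ASID;
    return (pte & PTE_G) ? SUPERVISOR_GLOBAL : SUPERVISOR_PER_ASID;
}
```

If USER implied GLOBAL (or the two bits were merged), USER_PER_ASID and
SUPERVISOR_PER_ASID could no longer both be expressed.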

>
> I don't think a hypervisor is a real use case for GLOBAL in a PTE.
>
> Are there other use cases for GLOBAL in a PTE?
>
> Best Regards
> Guo Ren
>