On Fri, Jul 21, 2023 at 4:01 PM Alexandre Ghiti <alex@xxxxxxxx> wrote:
On 21/07/2023 18:08, Guo Ren wrote:
On Fri, Jul 21, 2023 at 11:19 PM Alexandre Ghiti <alex@xxxxxxxx> wrote:
On 21/07/2023 16:51, guoren@xxxxxxxxxx wrote:
From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
RISC-V specification permits the caching of PTEs whose V (Valid)
bit is clear. Operating systems must be written to cope with this
possibility, but implementers are reminded that eagerly caching
invalid PTEs will reduce performance by causing additional page
faults.
So we must keep vmalloc_fault to handle spurious page faults on kernel
virtual addresses from an OoO machine.
Signed-off-by: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
Signed-off-by: Guo Ren <guoren@xxxxxxxxxx>
---
arch/riscv/mm/fault.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 85165fe438d8..f662c9eae7d4 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -258,8 +258,7 @@ void handle_page_fault(struct pt_regs *regs)
* only copy the information from the master page table,
* nothing more.
*/
- if ((!IS_ENABLED(CONFIG_MMU) || !IS_ENABLED(CONFIG_64BIT)) &&
- unlikely(addr >= VMALLOC_START && addr < VMALLOC_END)) {
+ if (unlikely(addr >= TASK_SIZE)) {
vmalloc_fault(regs, code, addr);
return;
}
Can you share what you are trying to fix here?
We met a spurious page fault panic on an OoO machine:
1. The processor's speculative execution brings V=0 entries for kernel
virtual addresses into the TLB.
2. The Linux kernel then installs the page at that kernel virtual address
and sets V=1.
3. When kernel code accesses that kernel virtual address, it raises a
page fault because of the stale V=0 entry in the TLB.
4. With no vmalloc_fault to replay the access, the kernel panics (a rough
sketch of that replay follows below).
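To make the replay concrete, here is a rough sketch only, not the actual
arch/riscv/mm/fault.c handler; the helper name kernel_fault_is_spurious()
is made up for illustration. It assumes the usual coping strategy: confirm
that init_mm already holds a valid mapping for the faulting kernel address
and, if so, drop the stale V=0 entry from the local TLB only.

#include <linux/mm.h>
#include <linux/pgtable.h>
#include <asm/tlbflush.h>

/*
 * Illustrative only: walk the kernel page table for a faulting kernel
 * virtual address.  If a valid mapping is already installed, the fault
 * came from a cached V=0 entry, so one local sfence.vma on that address
 * is enough (each hart that trips on the stale entry fixes only itself).
 * Huge/leaf mappings are omitted for brevity.
 */
static bool kernel_fault_is_spurious(unsigned long addr)
{
        pgd_t *pgd = pgd_offset_k(addr);
        p4d_t *p4d;
        pud_t *pud;
        pmd_t *pmd;
        pte_t *pte;

        if (pgd_none(*pgd))
                return false;
        p4d = p4d_offset(pgd, addr);
        if (p4d_none(*p4d))
                return false;
        pud = pud_offset(p4d, addr);
        if (pud_none(*pud))
                return false;
        pmd = pmd_offset(pud, addr);
        if (pmd_none(*pmd))
                return false;
        pte = pte_offset_kernel(pmd, addr);
        if (!pte_present(*pte))
                return false;

        /* A valid mapping exists: only the TLB copy is stale. */
        local_flush_tlb_page(addr);     /* local "sfence.vma addr" */
        return true;
}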
I have a fix (that's currently running our CI) for commit 7d3332be011e
("riscv: mm: Pre-allocate PGD entries for vmalloc/modules area") that
implements flush_cache_vmap() since we lost the vmalloc_fault.
Could you share that patch?
Here we go:
Author: Alexandre Ghiti <alexghiti@xxxxxxxxxxxx>
Date: Fri Jul 21 08:43:44 2023 +0000
riscv: Implement flush_cache_vmap()
The RISC-V kernel needs a sfence.vma after a page table modification: we
used to rely on the vmalloc fault handling to emit an sfence.vma, but
commit 7d3332be011e ("riscv: mm: Pre-allocate PGD entries for
vmalloc/modules area") got rid of this path for 64-bit kernels, so now we
need to explicitly emit a sfence.vma in flush_cache_vmap().
Note that we don't need to implement flush_cache_vunmap() as the generic
code should emit a flush tlb after unmapping a vmalloc region.
Fixes: 7d3332be011e ("riscv: mm: Pre-allocate PGD entries for
vmalloc/modules area")
Signed-off-by: Alexandre Ghiti <alexghiti@xxxxxxxxxxxx>
diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 8091b8bf4883..b93ffddf8a61 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -37,6 +37,10 @@ static inline void flush_dcache_page(struct page *page)
#define flush_icache_user_page(vma, pg, addr, len) \
flush_icache_mm(vma->vm_mm, 0)
+#ifdef CONFIG_64BIT
+#define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end)
+#endif
I don't want that, and flush_tlb_kernel_range is flush_tlb_all. In
addition, it would trigger IPIs, which is a performance killer.

What's the problem with a spurious fault replay? It only costs a
local TLB flush of the vaddr.
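To spell out the cost argument, here is an illustrative comparison; the two
function names are invented for the example, and it assumes, per the
statement above, that flush_tlb_kernel_range() resolves to flush_tlb_all()
on this code base.

#include <linux/mm.h>
#include <asm/tlbflush.h>

/*
 * Eager approach (flush_cache_vmap() defined as flush_tlb_kernel_range()):
 * every vmap/ioremap pays a global flush, reaching all harts, even when
 * no hart ever cached a stale V=0 entry for the new mapping.
 */
static void eager_vmap_flush(unsigned long start, unsigned long end)
{
        flush_tlb_kernel_range(start, end);
}

/*
 * Lazy approach (keep the vmalloc_fault replay): nothing happens at vmap
 * time; only a hart that actually takes a spurious fault pays a single
 * local sfence.vma before replaying the access.
 */
static void lazy_spurious_fault_fixup(unsigned long addr)
{
        local_flush_tlb_page(addr);
}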
+
#ifndef CONFIG_SMP
#define flush_icache_all() local_flush_icache_all()
Let me know if that works for you!
Best Regards
Guo Ren