Re: [PATCH v2 3/3] riscv: Fix crash when flushing executable ioremap regions

From: Jan Kiszka
Date: Thu Feb 20 2020 - 01:38:52 EST


On 20.02.20 06:49, Alex Ghiti wrote:
Hi Jan,

On 2/16/20 2:56 PM, Alex Ghiti wrote:
On 2/16/20 11:05 AM, Jan Kiszka wrote:
On 16.02.20 15:41, Alex Ghiti wrote:
Hi Jan,

On 2/15/20 6:49 AM, Jan Kiszka wrote:
From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>

Those are not backed by page structs, and pte_page is returning an
invalid pointer.

Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
---
  arch/riscv/mm/cacheflush.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 8930ab7278e6..9ee2c1a387cc 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte)
  {
      struct page *page = pte_page(pte);

-    if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+    if (!pfn_valid(pte_pfn(pte)) ||
+        !test_and_set_bit(PG_dcache_clean, &page->flags))
          flush_icache_all();
  }
  #endif /* CONFIG_MMU */
--
2.16.4



When did you encounter such a situation, i.e. executable code that is
not backed by a struct page?

RISC-V uses the generic implementation of ioremap, and the way
_PAGE_IOREMAP is defined does not allow mapping an executable memory
region with ioremap, so I'd like to understand how we end up in
flush_icache_pte for an executable region not backed by any struct
page.

You can create executable mappings of memory that Linux does not
initially consider as RAM via ioremap_prot or ioremap_page_range. We are
using that in Jailhouse to load the hypervisor code into reserved memory
that is ioremapped for the purpose. Works fine on x86, arm and arm64.

Jan

Ok thanks, I had missed this API.

Regarding your patch, I find it odd to do anything at all if the pfn is
invalid: we could have garbage in the pte pointing to an invalid region,
for example (I admit that flushing the icache would not be catastrophic
in that situation).

I'm not saying I will come up with a better solution, but I'll take a
deeper look tomorrow.

Alex


I took a look at the Jailhouse driver. After loading the hypervisor into
the ioremapped region, it explicitly ensures icache/dcache consistency
by calling flush_icache_range here:

https://github.com/siemens/jailhouse/blob/master/driver/main.c#L505


Yeah, the arm64 port needed this.

There seems to be an implicit (?) rule that whoever modifies code from
within the kernel must handle icache/dcache consistency itself:

In the arm64 definition of set_pte_at, icache/dcache are not synced when
the pte is a kernel mapping:

https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/pgtable.h#L271


In mips, they do the same:

https://elixir.bootlin.com/linux/latest/source/arch/mips/mm/cache.c#L137

So, funnily enough, I'd do the opposite of what you have done and follow
the mips way:

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 8930ab7278e6..c90c8bb49109 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -84,6 +84,9 @@ void flush_icache_pte(pte_t pte)
 {
        struct page *page = pte_page(pte);

+       if (unlikely(!pfn_valid(pte_pfn(pte))))
+               return;
+
        if (!test_and_set_bit(PG_dcache_clean, &page->flags))
                flush_icache_all();
 }

What do you think?


I wouldn't mind doing it like above. I suspect that became the common
simple pattern because no one expected a use case like Jailhouse's.
But I'm far from an expert in kernel mm topics.

Jan
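
For readers comparing the two proposals in this thread, here is a
minimal user-space sketch of how they differ. It is not kernel code:
every model_* name, the MODEL_MAX_PFN limit and the flush counter are
assumptions invented for illustration, and a pfn is passed directly
where the kernel takes a pte_t. It only models the control flow of the
two diffs quoted above.

```c
#include <stdbool.h>

/* Hypothetical model: pfns below this limit have a "struct page". */
#define MODEL_MAX_PFN 16UL

/* Stand-in for struct page, keeping only the PG_dcache_clean bit. */
struct model_page {
	bool dcache_clean;
};

static struct model_page model_mem_map[MODEL_MAX_PFN];

/* Counts how often the (modelled) global icache flush would run. */
static int model_flush_count;

/* Stand-in for pfn_valid(): true only if a model_page exists. */
static bool model_pfn_valid(unsigned long pfn)
{
	return pfn < MODEL_MAX_PFN;
}

/* Stand-in for flush_icache_all(): just record the call. */
static void model_flush_icache_all(void)
{
	model_flush_count++;
}

/* Stand-in for test_and_set_bit(): set the flag, return the old value. */
static bool model_test_and_set_clean(struct model_page *page)
{
	bool old = page->dcache_clean;

	page->dcache_clean = true;
	return old;
}

/* Control flow of the v2 patch: always flush when there is no page. */
static void flush_icache_pte_v2(unsigned long pfn)
{
	if (!model_pfn_valid(pfn) ||
	    !model_test_and_set_clean(&model_mem_map[pfn]))
		model_flush_icache_all();
}

/* Control flow of the counter-proposal: do nothing when there is no page. */
static void flush_icache_pte_skip(unsigned long pfn)
{
	if (!model_pfn_valid(pfn))
		return;

	if (!model_test_and_set_clean(&model_mem_map[pfn]))
		model_flush_icache_all();
}
```

In this model, calling flush_icache_pte_v2() with a pfn at or above
MODEL_MAX_PFN increments the counter on every call, while
flush_icache_pte_skip() leaves it untouched and so relies on the caller
to flush explicitly, as the Jailhouse driver does with
flush_icache_range. For a pfn that does have a page, both variants
flush once and then skip further flushes.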