Re: [PATCH 21/35] arm64: mte: Add in-kernel tag fault handler

From: Andrey Konovalov
Date: Thu Aug 27 2020 - 08:59:18 EST


On Thu, Aug 27, 2020 at 11:54 AM Catalin Marinas
<catalin.marinas@xxxxxxx> wrote:
>
> On Fri, Aug 14, 2020 at 07:27:03PM +0200, Andrey Konovalov wrote:
> > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > index 5e832b3387f1..c62c8ba85c0e 100644
> > --- a/arch/arm64/mm/fault.c
> > +++ b/arch/arm64/mm/fault.c
> > @@ -33,6 +33,7 @@
> > #include <asm/debug-monitors.h>
> > #include <asm/esr.h>
> > #include <asm/kprobes.h>
> > +#include <asm/mte.h>
> > #include <asm/processor.h>
> > #include <asm/sysreg.h>
> > #include <asm/system_misc.h>
> > @@ -222,6 +223,20 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
> > return 1;
> > }
> >
> > +static bool is_el1_mte_sync_tag_check_fault(unsigned int esr)
> > +{
> > + unsigned int ec = ESR_ELx_EC(esr);
> > + unsigned int fsc = esr & ESR_ELx_FSC;
> > +
> > + if (ec != ESR_ELx_EC_DABT_CUR)
> > + return false;
> > +
> > + if (fsc == ESR_ELx_FSC_MTE)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > static bool is_el1_instruction_abort(unsigned int esr)
> > {
> > return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_CUR;
> > @@ -294,6 +309,18 @@ static void die_kernel_fault(const char *msg, unsigned long addr,
> > do_exit(SIGKILL);
> > }
> >
> > +static void report_tag_fault(unsigned long addr, unsigned int esr,
> > + struct pt_regs *regs)
> > +{
> > + bool is_write = ((esr & ESR_ELx_WNR) >> ESR_ELx_WNR_SHIFT) != 0;
> > +
> > + pr_alert("Memory Tagging Extension Fault in %pS\n", (void *)regs->pc);
> > + pr_alert(" %s at address %lx\n", is_write ? "Write" : "Read", addr);
> > + pr_alert(" Pointer tag: [%02x], memory tag: [%02x]\n",
> > + mte_get_ptr_tag(addr),
> > + mte_get_mem_tag((void *)addr));
> > +}
> > +
> > static void __do_kernel_fault(unsigned long addr, unsigned int esr,
> > struct pt_regs *regs)
> > {
> > @@ -317,12 +344,16 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
> > msg = "execute from non-executable memory";
> > else
> > msg = "read from unreadable memory";
> > + } else if (is_el1_mte_sync_tag_check_fault(esr)) {
> > + report_tag_fault(addr, esr, regs);
> > + msg = "memory tagging extension fault";
>
> IIUC, that's dead code. See my comment below on do_tag_check_fault().
>
> > } else if (addr < PAGE_SIZE) {
> > msg = "NULL pointer dereference";
> > } else {
> > msg = "paging request";
> > }
> >
> > +
>
> Unnecessary empty line.
>
> > die_kernel_fault(msg, addr, esr, regs);
> > }
> >
> > @@ -658,10 +689,27 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
> > return 0;
> > }
> >
> > +static int do_tag_recovery(unsigned long addr, unsigned int esr,
> > + struct pt_regs *regs)
> > +{
> > + report_tag_fault(addr, esr, regs);
> > +
> > + /* Skip over the faulting instruction and continue: */
> > + arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
>
> Ooooh, do we expect the kernel to still behave correctly after this? I
> thought the recovery means disabling tag checking altogether and
> restarting the instruction rather than skipping over it.

The intention is to be able to catch multiple MTE faults without
panicking or disabling MTE when executing KASAN tests (those do
multiple bad accesses one after another). We do
arm64_skip_faulting_instruction() for software tag-based KASAN too,
it's not ideal, but works for testing purposes.

Can we disable MTE, reexecute the instruction, and then reenable MTE,
or something like that?

When running in-kernel MTE in production, we'll either panic or
disable MTE after the first fault. This was controlled by the
panic_on_mte_fault option Vincenzo initially had.

> We only skip if we emulated it.

I'm not sure I understand this part, what do you mean by emulating?

>
> > +
> > + return 0;
> > +}
> > +
> > +
> > static int do_tag_check_fault(unsigned long addr, unsigned int esr,
> > struct pt_regs *regs)
> > {
> > - do_bad_area(addr, esr, regs);
> > + /* The tag check fault (TCF) is per TTBR */
> > + if (is_ttbr0_addr(addr))
> > + do_bad_area(addr, esr, regs);
> > + else
> > + do_tag_recovery(addr, esr, regs);
>
> So we never invoke __do_kernel_fault() for a synchronous tag check in
> the kernel. What's with all the is_el1_mte_sync_tag_check_fault() check
> above?
>
> --
> Catalin
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@xxxxxxxxxxxxxxxx.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20200827095429.GC29264%40gaia.