Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

From: Ard Biesheuvel
Date: Mon Oct 30 2023 - 04:15:15 EST


On Mon, 30 Oct 2023 at 09:07, Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> wrote:
>
> On Sat, 28 Oct 2023 at 13:12, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> >
> > On Fri, 27 Oct 2023 at 12:57, Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 26 Oct 2023 at 21:09, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, 26 Oct 2023 at 17:30, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > > >
> > > > > On Thu, Oct 26, 2023 at 08:11:26PM +0530, Naresh Kamboju wrote:
> > > > > > Following kernel crash noticed on qemu-arm64 while running LTP syscalls
> > > > > > set_robust_list test case running Linux next 6.6.0-rc7-next-20231026 ...
> > > > > It looks like this is fallout from the LPA2 enablement.
> > > > >
> > > > > According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
> > > > > 43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:
> > > > >
> > > > > 0b101011 When FEAT_LPA2 is implemented:
> > > > > Translation fault, level -1.
> > > > >
> > > > > It's triggered here by an LDTR in a get_user() on a bogus userspace address.
> > > > > The exception is expected, and it's supposed to be handled via the exception
> > > > > fixups, but the LPA2 patches didn't update the fault_info table entries for all
> > > > > the level -1 faults, and so those all get handled by do_bad() and don't call
> > > > > fixup_exception(), causing them to be fatal.
> > > > >
> > > > > It should be relatively simple to update the fault_info table for the level -1
> > > > > faults, but given the other issues we're seeing I think it's probably worth
> > > > > dropping the LPA2 patches for the moment.
> > > > >
> > > >
> > > > Thanks for the analysis Mark.
> > > >
> > > > I agree that this should not be difficult to fix, but given the other
> > > > CI problems and identified loose ends, I am not going to object to
> > > > dropping this partially or entirely at this point. I'm sure everybody
> > > > will be thrilled to go over those 60 patches again after I rebase them
> > > > onto v6.7-rc1 :-)
> > >
> > > I am happy to test any proposed fix patch.
> > >
> >
> > Thanks Naresh. Patch attached.
>
> This patch did not solve the reported problem.
> Test log links,
> - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/naresh/tests/2XTP1lXcUUscT357YaAm2G1AhpS
>

Oops, sorry about that.

Fixed patch attached.

From 97dea432bceadfcece84484609374c277afc2c81 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb@xxxxxxxxxx>
Date: Sat, 28 Oct 2023 09:40:29 +0200
Subject: [PATCH v2] Add missing ESR decoding for level -1 translation faults

Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
---
 arch/arm64/mm/fault.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 2e5d1e238af9..13f192691060 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -780,18 +780,18 @@ static const struct fault_info fault_info[] = {
 	{ do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 1 translation fault" },
 	{ do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 2 translation fault" },
 	{ do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 3 translation fault" },
-	{ do_bad, SIGKILL, SI_KERNEL, "unknown 8" },
+	{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 0 access flag fault" },
 	{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 access flag fault" },
 	{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
 	{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 access flag fault" },
-	{ do_bad, SIGKILL, SI_KERNEL, "unknown 12" },
+	{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 0 permission fault" },
 	{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" },
 	{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 permission fault" },
 	{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 permission fault" },
 	{ do_sea, SIGBUS, BUS_OBJERR, "synchronous external abort" },
 	{ do_tag_check_fault, SIGSEGV, SEGV_MTESERR, "synchronous tag check fault" },
 	{ do_bad, SIGKILL, SI_KERNEL, "unknown 18" },
-	{ do_bad, SIGKILL, SI_KERNEL, "unknown 19" },
+	{ do_sea, SIGKILL, SI_KERNEL, "level -1 (translation table walk)" },
 	{ do_sea, SIGKILL, SI_KERNEL, "level 0 (translation table walk)" },
 	{ do_sea, SIGKILL, SI_KERNEL, "level 1 (translation table walk)" },
 	{ do_sea, SIGKILL, SI_KERNEL, "level 2 (translation table walk)" },
@@ -799,7 +799,7 @@ static const struct fault_info fault_info[] = {
 	{ do_sea, SIGBUS, BUS_OBJERR, "synchronous parity or ECC error" }, // Reserved when RAS is implemented
 	{ do_bad, SIGKILL, SI_KERNEL, "unknown 25" },
 	{ do_bad, SIGKILL, SI_KERNEL, "unknown 26" },
-	{ do_bad, SIGKILL, SI_KERNEL, "unknown 27" },
+	{ do_sea, SIGKILL, SI_KERNEL, "level -1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented
 	{ do_sea, SIGKILL, SI_KERNEL, "level 0 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented
 	{ do_sea, SIGKILL, SI_KERNEL, "level 1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented
 	{ do_sea, SIGKILL, SI_KERNEL, "level 2 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented
@@ -813,9 +813,9 @@ static const struct fault_info fault_info[] = {
 	{ do_bad, SIGKILL, SI_KERNEL, "unknown 38" },
 	{ do_bad, SIGKILL, SI_KERNEL, "unknown 39" },
 	{ do_bad, SIGKILL, SI_KERNEL, "unknown 40" },
-	{ do_bad, SIGKILL, SI_KERNEL, "unknown 41" },
+	{ do_bad, SIGKILL, SI_KERNEL, "level -1 address size fault" },
 	{ do_bad, SIGKILL, SI_KERNEL, "unknown 42" },
-	{ do_bad, SIGKILL, SI_KERNEL, "unknown 43" },
+	{ do_translation_fault, SIGSEGV, SEGV_MAPERR, "level -1 translation fault" },
 	{ do_bad, SIGKILL, SI_KERNEL, "unknown 44" },
 	{ do_bad, SIGKILL, SI_KERNEL, "unknown 45" },
 	{ do_bad, SIGKILL, SI_KERNEL, "unknown 46" },
--
2.42.0.820.g83a721a137-goog