Re: boot flooded with unwind: Index not found

From: Ard Biesheuvel
Date: Wed Mar 02 2022 - 04:46:02 EST


On Wed, 2 Mar 2022 at 09:55, Corentin Labbe <clabbe.montjoie@xxxxxxxxx> wrote:
>
> Le Wed, Mar 02, 2022 at 09:44:52AM +0100, Ard Biesheuvel a écrit :
> > On Wed, 2 Mar 2022 at 09:40, Corentin Labbe <clabbe.montjoie@xxxxxxxxx> wrote:
> > >
> > > Le Tue, Mar 01, 2022 at 05:52:30PM +0100, Ard Biesheuvel a écrit :
> > > > On Tue, 1 Mar 2022 at 17:37, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, 1 Mar 2022 at 16:52, Russell King (Oracle)
> > > > > <linux@xxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Tue, Mar 01, 2022 at 04:48:25PM +0100, Corentin Labbe wrote:
> > > > > > > Hello
> > > > > > >
> > > > > > > I booted today linux-next (20220301) and my boot is flooded with:
> > > > > > > [ 0.000000] unwind: Index not found c0f0c440
> > > > > > > [ 0.000000] unwind: Index not found 00000000
> > > > > > > [ 0.000000] unwind: Index not found c0f0c440
> > > > > > > [ 0.000000] unwind: Index not found 00000000
> > > > > > >
> > > > > > > This happen on a sun8i-a83t-bananapi-m3
> > > > > >
> > > > > > Have you enabled vmapped stacks?
> > > > > >
> > > > >
> > > > > This is probably related to
> > > > >
> > > > > 538b9265c063 ARM: unwind: track location of LR value in stack frame
> > > > >
> > > > > which removes a kernel_text_address() check on frame->pc as it is
> > > > > essentially redundant, given that we won't find unwind data otherwise.
> > > > > Unfortunately, I failed to realise that the other check carries a
> > > > > pr_warn(), which may apparently fire spuriously in some cases.
> > > > >
> > > > > The 0x0 value can easily be filtered out, but i would be interesting
> > > > > where the other value originates from. We might be able to solve this
> > > > > with a simple .nounwind directive in a asm routine somewhere.
> > > > >
> > > > > I'll prepare a patch that disregards the 0x0 value - could you check
> > > > > in the mean time what the address 0xcf0c440 coincides with in your
> > > > > build?
> > > >
> > > > Something like the below should restore the previous behavior, while
> > > > taking the kernel_text_address() check out of the hot path.
> > > >
> > > > --- a/arch/arm/kernel/unwind.c
> > > > +++ b/arch/arm/kernel/unwind.c
> > > > @@ -400,7 +400,8 @@ int unwind_frame(struct stackframe *frame)
> > > >
> > > > idx = unwind_find_idx(frame->pc);
> > > > if (!idx) {
> > > > - pr_warn("unwind: Index not found %08lx\n", frame->pc);
> > > > + if (frame->pc && kernel_text_address(frame->pc))
> > > > + pr_warn("unwind: Index not found %08lx\n", frame->pc);
> > > > return -URC_FAILURE;
> > > > }
> > >
> > > Hello
> > >
> > > This is a more detailed trace from my follow up after your patch:
> >
> > So the log below is from a kernel that has the above patch applied?
> > Could you please share the .config?
> >
>
> Yes this is a kernel with above patch applied (this board do not boot without it).

It's not entirely clear to me how (or whether) the recent changes to
unwind.c cause this issue, but one thing that stands out in the
current code is the unguarded dereference of a value pulled of the
stack as a memory address.

It is worth noting that the only unwind entries in vmlinux that load
SP from the stack directly (as opposed to unwinding it by moving from
the frame pointer or by addition/subtraction) are the
__irq_svc/__pabt_svc/__dabt_svc entry routines, and given that the
bogus address 60000013 looks suspiciously like a PSR value (which is
stored in the vicinity of SP on the exception stack), my suspicion is
that some unwinder annotations are out of sync with the actual code.

So while the below does not fix the root cause, i.e., that the
unwinder unwinds SP incorrectly causing us to dereference a bogus
pointer, it should avoid the subsequent crash. Please give it a go.

--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -27,6 +27,7 @@
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
+#include <linux/uaccess.h>
#include <linux/list.h>

#include <asm/sections.h>
@@ -236,10 +237,11 @@ static int unwind_pop_register(struct
unwind_ctrl_block *ctrl,
if (*vsp >= (unsigned long *)ctrl->sp_high)
return -URC_FAILURE;

- /* Use READ_ONCE_NOCHECK here to avoid this memory access
- * from being tracked by KASAN.
+ /* Use get_kernel_nofault() here to avoid this memory access
+ * from causing a fatal fault, and from being tracked by KASAN.
*/
- ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
+ if (get_kernel_nofault(ctrl->vrs[reg], *vsp))
+ return -URC_FAILURE;
if (reg == 14)
ctrl->lr_addr = *vsp;
(*vsp)++;