Re: boot flooded with unwind: Index not found

From: Corentin Labbe
Date: Wed Mar 02 2022 - 05:09:59 EST


Le Wed, Mar 02, 2022 at 10:45:46AM +0100, Ard Biesheuvel a écrit :
> On Wed, 2 Mar 2022 at 09:55, Corentin Labbe <clabbe.montjoie@xxxxxxxxx> wrote:
> >
> > Le Wed, Mar 02, 2022 at 09:44:52AM +0100, Ard Biesheuvel a écrit :
> > > On Wed, 2 Mar 2022 at 09:40, Corentin Labbe <clabbe.montjoie@xxxxxxxxx> wrote:
> > > >
> > > > Le Tue, Mar 01, 2022 at 05:52:30PM +0100, Ard Biesheuvel a écrit :
> > > > > On Tue, 1 Mar 2022 at 17:37, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Tue, 1 Mar 2022 at 16:52, Russell King (Oracle)
> > > > > > <linux@xxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Tue, Mar 01, 2022 at 04:48:25PM +0100, Corentin Labbe wrote:
> > > > > > > > Hello
> > > > > > > >
> > > > > > > > I booted today linux-next (20220301) and my boot is flooded with:
> > > > > > > > [ 0.000000] unwind: Index not found c0f0c440
> > > > > > > > [ 0.000000] unwind: Index not found 00000000
> > > > > > > > [ 0.000000] unwind: Index not found c0f0c440
> > > > > > > > [ 0.000000] unwind: Index not found 00000000
> > > > > > > >
> > > > > > > > This happen on a sun8i-a83t-bananapi-m3
> > > > > > >
> > > > > > > Have you enabled vmapped stacks?
> > > > > > >
> > > > > >
> > > > > > This is probably related to
> > > > > >
> > > > > > 538b9265c063 ARM: unwind: track location of LR value in stack frame
> > > > > >
> > > > > > which removes a kernel_text_address() check on frame->pc as it is
> > > > > > essentially redundant, given that we won't find unwind data otherwise.
> > > > > > Unfortunately, I failed to realise that the other check carries a
> > > > > > pr_warn(), which may apparently fire spuriously in some cases.
> > > > > >
> > > > > > The 0x0 value can easily be filtered out, but i would be interesting
> > > > > > where the other value originates from. We might be able to solve this
> > > > > > with a simple .nounwind directive in a asm routine somewhere.
> > > > > >
> > > > > > I'll prepare a patch that disregards the 0x0 value - could you check
> > > > > > in the mean time what the address 0xcf0c440 coincides with in your
> > > > > > build?
> > > > >
> > > > > Something like the below should restore the previous behavior, while
> > > > > taking the kernel_text_address() check out of the hot path.
> > > > >
> > > > > --- a/arch/arm/kernel/unwind.c
> > > > > +++ b/arch/arm/kernel/unwind.c
> > > > > @@ -400,7 +400,8 @@ int unwind_frame(struct stackframe *frame)
> > > > >
> > > > > idx = unwind_find_idx(frame->pc);
> > > > > if (!idx) {
> > > > > - pr_warn("unwind: Index not found %08lx\n", frame->pc);
> > > > > + if (frame->pc && kernel_text_address(frame->pc))
> > > > > + pr_warn("unwind: Index not found %08lx\n", frame->pc);
> > > > > return -URC_FAILURE;
> > > > > }
> > > >
> > > > Hello
> > > >
> > > > This is a more detailed trace from my follow up after your patch:
> > >
> > > So the log below is from a kernel that has the above patch applied?
> > > Could you please share the .config?
> > >
> >
> > Yes this is a kernel with above patch applied (this board do not boot without it).
>
> It's not entirely clear to me how (or whether) the recent changes to
> unwind.c cause this issue, but one thing that stands out in the
> current code is the unguarded dereference of a value pulled of the
> stack as a memory address.
>
> It is worth noting that the only unwind entries in vmlinux that load
> SP from the stack directly (as opposed to unwinding it by moving from
> the frame pointer or by addition/subtraction) are the
> __irq_svc/__pabt_svc/__dabt_svc entry routines, and given that the
> bogus address 60000013 looks suspiciously like a PSR value (which is
> stored in the vicinity of SP on the exception stack), my suspicion is
> that some unwinder annotations are out of sync with the actual code.
>
> So while the below does not fix the root cause, i.e., that the
> unwinder unwinds SP incorrectly causing us to dereference a bogus
> pointer, it should avoid the subsequent crash. Please give it a go.
>
> --- a/arch/arm/kernel/unwind.c
> +++ b/arch/arm/kernel/unwind.c
> @@ -27,6 +27,7 @@
> #include <linux/sched.h>
> #include <linux/slab.h>
> #include <linux/spinlock.h>
> +#include <linux/uaccess.h>
> #include <linux/list.h>
>
> #include <asm/sections.h>
> @@ -236,10 +237,11 @@ static int unwind_pop_register(struct
> unwind_ctrl_block *ctrl,
> if (*vsp >= (unsigned long *)ctrl->sp_high)
> return -URC_FAILURE;
>
> - /* Use READ_ONCE_NOCHECK here to avoid this memory access
> - * from being tracked by KASAN.
> + /* Use get_kernel_nofault() here to avoid this memory access
> + * from causing a fatal fault, and from being tracked by KASAN.
> */
> - ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
> + if (get_kernel_nofault(ctrl->vrs[reg], *vsp))
> + return -URC_FAILURE;
> if (reg == 14)
> ctrl->lr_addr = *vsp;
> (*vsp)++;

The crash disappeared (but the suspicious RCU usage is still here).