Re: [PATCH] perf core: Use KSTK_ESP() instead of pt_regs->sp while output user regs

From: Andy Lutomirski
Date: Fri Jan 02 2015 - 13:03:28 EST


On Jan 2, 2015 8:11 AM, "Jan Beulich" <jbeulich@xxxxxxxx> wrote:
>
> >>> Andy Lutomirski <luto@xxxxxxxxxxxxxx> 12/31/14 3:00 AM >>>
> >On Tue, Dec 30, 2014 at 3:29 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> >> Given how the x86_64* entry code works, using task_pt_regs from
> >> anywhere except explicitly supported contexts (including exceptions
> >> that originated in userspace and a small handful of system calls) is
> >> asking for trouble. NMI context is especially bad.
> >>
> >> How important is this feature, and which registers matter? It might
> >> be possible to use a dwarf unwinder on the kernel call stack to get
> >> most of the regs from most contexts, and it might also be possible to
> >> make small changes to the entry code to make it possible to get some
> >> of the registers reliably, but it's not currently possible to safely
> >> use task_pt_regs *at all* from NMI context unless you've at least
> >> blacklisted a handful of origin RIP values that give dangerously bogus
> >> results. (Using do_nmi's regs parameter if user_mode_vm(regs) is a
> >> different story.)
> >
> >It's actually worse than just knowing the interrupted kernel RIP. If
> >the call chain goes usermode -> IST exception -> NMI, then
> >task_pt_regs is entirely uninitialized. Assuming all the CFI
> >annotations are correct, the unwinder could still do it from the
> >kernel.
> >
> >Note that, as far as I know, Jan Beulich is the only person who uses
> >the unwinder on kernel code. Jan, how do you do this?
>
> Trying to guess what you mean by "this": A stack switch gets expressed by
> CFI annotations just like any other frame pointer adjustments. See for example
> the CFI_DEF_CFA_REGISTER use in the SAVE_ARGS_IRQ macro.
>
> If that wasn't your question, please be more precise.

Sorry, my question was vague.

Is there any way to consume these annotations at runtime in the
kernel? The goal would be for perf's NMI handler to consume the CFI
data to figure out the userspace registers. I'm guessing that the
answer might be no, because we seem to be compiling with
-fno-asynchronous-unwind-tables and we don't seem to be putting any
.eh_frame stuff into the final kernel image.
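
[Illustrative sketch, not part of the original message.] To make "consuming the CFI data" concrete, here is a minimal userspace C model of what applying a single unwind rule involves: compute the CFA from a base register plus an offset, then load a saved register from CFA plus another offset. Everything here is hypothetical (struct cfi_row, struct sim_stack, sim_read, cfi_recover_reg); it reads a simulated stack and is not the kernel's API or any existing unwinder. A real unwinder would also have to parse .eh_frame/.debug_frame and select the row matching the current RIP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for one row of a CFI table. */
struct cfi_row {
    int64_t cfa_offset; /* CFA = base register value + cfa_offset */
    int64_t reg_offset; /* saved register is stored at CFA + reg_offset */
};

/* Simulated stack memory so the sketch can run outside the kernel. */
struct sim_stack {
    const uint64_t *words; /* stack contents, one 64-bit slot per entry */
    size_t nwords;         /* number of slots */
    uint64_t base_addr;    /* pretend address of words[0] */
};

/* Bounds- and alignment-checked read; returns 0 on success, -1 on failure. */
static int sim_read(const struct sim_stack *s, uint64_t addr, uint64_t *out)
{
    if (addr < s->base_addr || (addr - s->base_addr) % 8 != 0)
        return -1;
    size_t idx = (size_t)((addr - s->base_addr) / 8);
    if (idx >= s->nwords)
        return -1;
    *out = s->words[idx];
    return 0;
}

/*
 * Apply one rule: derive the CFA from the base register, then fetch the
 * saved register value relative to it. Unsigned wraparound makes negative
 * offsets behave as subtraction.
 */
static int cfi_recover_reg(const struct sim_stack *s, uint64_t base_reg,
                           const struct cfi_row *row, uint64_t *out)
{
    uint64_t cfa = base_reg + (uint64_t)row->cfa_offset;
    return sim_read(s, cfa + (uint64_t)row->reg_offset, out);
}
```

With a four-slot stack based at 0x1000, a base register of 0x1000, and a row of {16, -8}, the CFA is 0x1010 and the saved value is read from 0x1008, the second slot. A read outside the simulated stack fails rather than returning bogus data, which is the property an NMI-context consumer would need most.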

I had thought that someone implemented runtime DWARF unwinding, though.

--Andy

>
> Jan
>