Re: [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations

From: Josh Poimboeuf
Date: Thu Aug 11 2016 - 10:28:50 EST


On Thu, Aug 11, 2016 at 12:18:54AM -0700, Andy Lutomirski wrote:
> On Aug 10, 2016 5:16 PM, "Josh Poimboeuf" <jpoimboe@xxxxxxxxxx> wrote:
> >
> > On Wed, Aug 10, 2016 at 12:25:11AM -0700, Andy Lutomirski wrote:
> > > On Aug 10, 2016 2:27 AM, "Josh Poimboeuf" <jpoimboe@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Aug 09, 2016 at 06:17:41PM -0500, Nilay Vaish wrote:
> > > > > On 4 August 2016 at 17:22, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > > > > > diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> > > > > > new file mode 100644
> > > > > > index 0000000..f28f1b5
> > > > > > --- /dev/null
> > > > > > +++ b/arch/x86/kernel/unwind_frame.c
> > > > > > @@ -0,0 +1,84 @@
> > > > > > +#include <linux/sched.h>
> > > > > > +#include <asm/ptrace.h>
> > > > > > +#include <asm/bitops.h>
> > > > > > +#include <asm/stacktrace.h>
> > > > > > +#include <asm/unwind.h>
> > > > > > +
> > > > > > +#define FRAME_HEADER_SIZE (sizeof(long) * 2)
> > > > > > +
> > > > > > +unsigned long unwind_get_return_address(struct unwind_state *state)
> > > > > > +{
> > > > > > + unsigned long *addr_p = unwind_get_return_address_ptr(state);
> > > > > > + unsigned long addr;
> > > > > > +
> > > > > > + if (state->stack_info.type == STACK_TYPE_UNKNOWN)
> > > > > > + return 0;
> > > > > > +
> > > > > > + addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
> > > > > > + addr_p);
> > > > > > +
> > > > > > + return __kernel_text_address(addr) ? addr : 0;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(unwind_get_return_address);
> > > > > > +
> > > > > > +static bool update_stack_state(struct unwind_state *state, void *addr,
> > > > > > + size_t len)
> > > > > > +{
> > > > > > + struct stack_info *info = &state->stack_info;
> > > > > > +
> > > > > > + if (on_stack(info, addr, len))
> > > > > > + return true;
> > > > > > +
> > > > > > + if (get_stack_info(info->next_sp, state->task, info,
> > > > > > + &state->stack_mask))
> > > > > > + goto unknown;
> > > > > > +
> > > > > > + if (!on_stack(info, addr, len))
> > > > > > + goto unknown;
> > > > > > +
> > > > > > + return true;
> > > > > > +
> > > > > > +unknown:
> > > > > > + info->type = STACK_TYPE_UNKNOWN;
> > > > > > + return false;
> > > > > > +}
> > > > > > +
> > > > > > +bool unwind_next_frame(struct unwind_state *state)
> > > > > > +{
> > > > > > + unsigned long *next_bp;
> > > > > > +
> > > > > > + if (unwind_done(state))
> > > > > > + return false;
> > > > > > +
> > > > > > + next_bp = (unsigned long *)*state->bp;
> > > > > > +
> > > > > > + /*
> > > > > > + * Make sure the next frame is on a valid stack and can be accessed
> > > > > > + * safely.
> > > > > > + */
> > > > > > + if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
> > > > > > + return false;
> > > > > > +
> > > > > > + /* move to the next frame */
> > > > > > + state->bp = next_bp;
> > > > > > + return true;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(unwind_next_frame);
> > > > > > +
> > > > > > +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> > > > > > + struct pt_regs *regs, unsigned long *sp)
> > > > > > +{
> > > > > > + memset(state, 0, sizeof(*state));
> > > > > > +
> > > > > > + state->task = task;
> > > > > > + state->bp = get_frame_pointer(task, regs);
> > > > > > +
> > > > > > + get_stack_info(state->bp, state->task, &state->stack_info,
> > > > > > + &state->stack_mask);
> > > > > > + update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
> > > > > > +
> > > > > > + /* unwind to the first frame after the specified stack pointer */
> > > > > > + while (state->bp < sp && !unwind_done(state))
> > > > > > + unwind_next_frame(state);
> > > > >
> > > > > Do we unwind all the frames here? It seems strange to me that in a
> > > > > function named __unwind_start(), we unwind all the frames.
> > > >
> > > > It just skips any stack frames before the specified "sp" pointer.
> > > > Several callers use this, for example, to start at regs->sp instead of
> > > > the current stack frame. I'll try to make the comment clearer.
> > > >
> > >
> > > Are you checking the right condition? Shouldn't this check that sp is
> > > in bounds for the current stack if a stack switch happened?
> >
> > You're right.
> >
> > > I admit I don't fully understand the use case. If someone wants to
> > > start a trace in the middle, shouldn't they just pass regs in?
> >
> > The regs aren't always available. Some callers just want to skip the
> > first few frames so the stack dump code itself isn't traced.
>
> I suspect that all users are okay with your algorithm simply because
> they don't switch stacks. Maybe the thing to do is to stop advancing
> when sp is passed or if the stack switches at all.

There are actually some cases which could be broken by the sloppy "while
state->bp < sp" check. When starting with sp from 'regs->sp', sp often
points to a different stack than the current one. It seemed to work in
my testing, but I guess I got lucky, with irq percpu stack addresses
being smaller than the thread stack addresses.

> Could you point me at a user that passes anything other than regs->sp
> here? On brief inspection, I haven't found any at all.

For example, see show_stack().

--
Josh