Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

From: Frederic Weisbecker
Date: Wed Apr 30 2014 - 17:56:17 EST


On Sun, Apr 27, 2014 at 02:08:20PM +0200, Ingo Molnar wrote:
>
> * Richard Yao <ryao@xxxxxxxxxx> wrote:
>
> > Stack traces are generated by scanning the stack and interpeting
> > anything that looks like it could be a pointer to something. We do
> > not need to do this when we have frame pointers, but we do it
> > anyway, with the distinction that we use the return pointers to mark
> > actual frames by the absence of a question mark.
> >
> > The additional verbosity of stack scanning tends to bombard us with
> > walls of text for no gain in practice, so lets switch to printing
> > only stack frames when frame pointers are available. That we can
> > spend less time reading stack traces and more time looking at code.
> >
> > Signed-off-by: Richard Yao <ryao@xxxxxxxxxx>
> > ---
> > arch/x86/kernel/dumpstack.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> > index d9c12d3..94ffe06 100644
> > --- a/arch/x86/kernel/dumpstack.c
> > +++ b/arch/x86/kernel/dumpstack.c
> > @@ -162,7 +162,11 @@ static void print_trace_address(void *data, unsigned long addr, int reliable)
> > static const struct stacktrace_ops print_trace_ops = {
> > .stack = print_trace_stack,
> > .address = print_trace_address,
> > +#ifdef CONFIG_FRAME_POINTER
> > + .walk_stack = print_context_stack_bp,
> > +#else
> > .walk_stack = print_context_stack,
> > +#endif
> > };

Besides the complementary informations brought by the full stack walk,
another big argument toward keeping full stack walk is that if your frame
pointer is screwed for whatever reason, you still have a useful stack trace.

I have seen and fixed several broken frame links in x86-64 by the past. Those
are very subtle and often hardly visible issues because, if they are easily spotted
on common frame scenarios like : task > irq, they are much harder to find on trickier,
rarer frame scenarios such as: task -> softirq -> irq -> nmi -> debug exception ->....

For example before a2bbe75089d5eb9a3a46d50dd5c215e213790288
("x86: Don't use frame pointer to save old stack on irq entry"), we were missing
entire stack frames on nesting irqs (hardirq on softirqs) while using pure frame
pointer based unwinding.

Who knows if we have other remaining issues like this? Especially given the high
possible number of frame combinations between task, irq, softirq, nmi and exceptions.
Multiply the contexts possibility by the number of possible archs out there and their
stack switch implementations.

Also further frame links breakages, we have many other possibilities to end up
with misleading frame pointers. Relying on that source alone definetly reduce the
reliability of our stacktraces.

So this goes way beyond just missing complementary informations. Debugging robustness
itself is actually very concerned here if we remove the full stack walk.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/