Re: [PATCH 6/7] arch: __get_wchan || STACKTRACE_SUPPORT
From: Peter Zijlstra
Date: Fri Oct 08 2021 - 09:46:53 EST

On Fri, Oct 08, 2021 at 01:40:52PM +0100, Mark Rutland wrote:
> [Adding Josh, since there might be a concern here from a livepatch pov]
>
> > +static unsigned long __get_wchan(struct task_struct *p)
> > +{
> > + unsigned long entry = 0;
> > +
> > + stack_trace_save_tsk(p, &entry, 1, 0);
>
> This assumes stack_trace_save_tsk() will skip sched functions, but I
> don't think that's ever been a requirement? It's certainly not
> documented anywhere that I could find, and arm64 doesn't do so today,
> and this patch causes wchan to just log `__switch_to` for everything.

Confused, arm64 has arch_stack_walk() and should thus use
kernel/stacktrace.c's stack_trace_consume_entry_nosched.

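To make the point concrete, here is a minimal userspace sketch of how a "nosched" consume callback filters entries before they reach the caller's buffer. Everything prefixed toy_ is hypothetical: the address range standing in for the __sched text section, the array-based walker standing in for arch_stack_walk(), and the cookie layout are assumptions for illustration, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address range standing in for the __sched text section. */
#define TOY_SCHED_START 0x1000UL
#define TOY_SCHED_END   0x2000UL

/* Hypothetical cookie: output buffer plus bookkeeping. */
struct toy_cookie {
	unsigned long *store;	/* caller's buffer */
	unsigned int size;	/* capacity of store */
	unsigned int skip;	/* entries still to drop (skipnr) */
	unsigned int len;	/* entries stored so far */
};

typedef bool (*toy_consume_fn)(void *cookie, unsigned long addr);

static bool toy_in_sched_functions(unsigned long addr)
{
	return addr >= TOY_SCHED_START && addr < TOY_SCHED_END;
}

/* Store one entry; return false once the walk should stop. */
static bool toy_consume_entry(void *cookie, unsigned long addr)
{
	struct toy_cookie *c = cookie;

	if (c->len >= c->size)
		return false;
	if (c->skip > 0) {
		c->skip--;
		return true;
	}
	c->store[c->len++] = addr;
	return c->len < c->size;
}

/* Same as above, but scheduler entries are dropped before skip is counted. */
static bool toy_consume_entry_nosched(void *cookie, unsigned long addr)
{
	if (toy_in_sched_functions(addr))
		return true;
	return toy_consume_entry(cookie, addr);
}

/* Stand-in for an arch walker: feed frames, innermost first, to the callback. */
static void toy_stack_walk(toy_consume_fn fn, void *cookie,
			   const unsigned long *frames, size_t nframes)
{
	for (size_t i = 0; i < nframes; i++)
		if (!fn(cookie, frames[i]))
			break;
}

/* Model of a save-for-task helper that always uses the nosched callback. */
static unsigned int toy_save_tsk(const unsigned long *frames, size_t nframes,
				 unsigned long *store, unsigned int size,
				 unsigned int skipnr)
{
	struct toy_cookie c = { .store = store, .size = size, .skip = skipnr };

	assert(store != NULL && size > 0);
	toy_stack_walk(toy_consume_entry_nosched, &c, frames, nframes);
	return c.len;
}
```

With a one-entry buffer and skipnr of 0, as in the __get_wchan() hunk quoted above, the single stored entry is the first frame outside the scheduler range, which is the behaviour wchan wants.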
> I realise you "fix" that for some arches in the next patch, but it's not
> clear to me that's the right thing to do -- I would expect that

I only actually change the behaviour on csky. Both mips and nds32 have
this 'savesched = (task == current)' logic, which ends up being a very
confusing way to write things; but for wchan we never call it on current,
and hence never save the __sched functions.

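A rough userspace model of that 'savesched' pattern, to show why it already filters scheduler frames for any task other than current. This is not the actual mips or nds32 code; the sv_ names, the task struct, and the address range standing in for __sched text are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct. */
struct sv_task {
	int pid;
};

/* Hypothetical range standing in for the __sched text section. */
static bool sv_in_sched(unsigned long pc)
{
	return pc >= 0x1000UL && pc < 0x2000UL;
}

/*
 * Save a trace of 'task'. Scheduler frames are kept only when tracing
 * the running task itself; for a sleeping task they are dropped.
 */
static size_t sv_save_trace(const struct sv_task *task,
			    const struct sv_task *current_task,
			    const unsigned long *frames, size_t nframes,
			    unsigned long *out, size_t max)
{
	bool savesched = (task == current_task);
	size_t n = 0;

	assert(out != NULL);
	for (size_t i = 0; i < nframes && n < max; i++) {
		if (savesched || !sv_in_sched(frames[i]))
			out[n++] = frames[i];
	}
	return n;
}
```

Since wchan is only ever queried for a task that is not current, savesched is always false on that path, so the condition reduces to plain "skip scheduler functions".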
> stack_trace_save_tsk() *shouldn't* skip anything unless we've explicitly
> told it to via skipnr, because I'd expect that

It's what most archs happen to do today, and it is what
stack_trace_save_tsk() does when implemented using arch_stack_walk(),
which I think is the closest thing to canonical we have.

> stack_trace_save_tsk_reliable() mustn't, in case we ever need to patch
> anything in the scheduler (or arch ctxsw code) with a livepatch, or if
> you ever *want* to have the sched functions in a trace.
>
> So I have two big questions:
>
> 1) Where precisely should stack_trace_save_tsk() and
> stack_trace_save_tsk_reliable() start from?
>
> 2) What should you do when you *do* want sched functions in a trace?
>
> We could side-step the issue here by using arch_stack_walk(), which'd
> make it easy to skip sched functions in the core code.

arch_stack_walk() is the modern API and should be used going forward,
and I've gone with the stack_trace_save*() implementation as per that.
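
For comparison, the alternative quoted above (calling the walker directly from the wchan code with its own callback) could look roughly like the userspace sketch below. The wc_ names, the array-based walker, and the scheduler address range are hypothetical stand-ins, not kernel API; the only assumption carried over from the kernel is that the walk stops as soon as the callback returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*wc_consume_fn)(void *cookie, unsigned long pc);

/* Hypothetical range standing in for the __sched text section. */
static bool wc_in_sched(unsigned long pc)
{
	return pc >= 0x1000UL && pc < 0x2000UL;
}

/* Stand-in for an arch walker: frames are fed innermost first. */
static void wc_stack_walk(wc_consume_fn fn, void *cookie,
			  const unsigned long *frames, size_t nframes)
{
	assert(fn != NULL);
	for (size_t i = 0; i < nframes; i++)
		if (!fn(cookie, frames[i]))
			break;
}

/* Record the first non-scheduler pc, then stop the walk. */
static bool wc_first_nonsched(void *cookie, unsigned long pc)
{
	unsigned long *wchan = cookie;

	if (wc_in_sched(pc))
		return true;	/* keep walking */
	*wchan = pc;
	return false;		/* found it, stop */
}

/* Returns 0 when every frame lies inside the scheduler range. */
static unsigned long wc_get_wchan(const unsigned long *frames, size_t nframes)
{
	unsigned long wchan = 0;

	wc_stack_walk(wc_first_nonsched, &wchan, frames, nframes);
	return wchan;
}
```

In this form the skipping policy lives in the wchan callback rather than in the generic save helper, so it does not depend on what stack_trace_save_tsk() is defined to skip.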