Re: Tracehooks in scheduler

From: Quentin Perret
Date: Fri Apr 26 2019 - 06:26:47 EST


Hi Qais,

On Monday 15 Apr 2019 at 15:49:45 (+0100), Qais Yousef wrote:
> Hi Steve, Peter
>
> > On 04/07/19 18:52, Qais Yousef wrote:
> > > Hi Steve, Peter
> > >
> > > I know the topic has come up in the past, but I couldn't find anything that
> > > points to a conclusion.
> > >
> > > As far as I understand, adding new TRACE_EVENT()s to the scheduler (and probably
> > > other subsystems) isn't desirable, as it introduces a sort of ABI that can be
> > > painful to maintain.
> > >
> > > But for us to be able to test various aspects of EAS, we rely on some events
> > > that track load_avg, util_avg and some other metrics in the scheduler.
> > > Examples of such patches, which are in Android and which we maintain out of
> > > tree, can be found here:
> > >
> > > https://android.googlesource.com/kernel/common/+/42903694913697da88a4ac627a92bbfdf44f0a2e
> > > https://android.googlesource.com/kernel/common/+/6dfaed989ea4ca223f0913dfc11cdafd9664fc1c
> > >
> > > Dietmar and Quentin pointed me to a discussion you guys had with Daniel Bristot
> > > at the last LPC, when he had a similar need. So it is something that could
> > > benefit other users as well.
> > >
> > > What is the best way forward to be able to add tracehooks to the scheduler
> > > and to any other subsystem, for that matter?
> > >
> > > We tried using DECLARE_TRACE() to create a tracepoint that doesn't export
> > > anything in /sys/kernel/debug/tracing/events, hoping that we could use eBPF or
> > > a kernel module to attach to this tracepoint and access the args to inject our
> > > own trace_printk()s, but this didn't work. The glue logic necessary to attach
> > > to this tracepoint the way RAW_TRACEPOINT() works in eBPF isn't there,
> > > AFAICT.
> > >
> > > I can post the full example if the above doesn't make sense. I am still
> > > familiarizing myself with the different aspects of this code as well. There
> > > might be support for what we want, but I failed to figure out the magic
> > > combination to make it work.
> > >
> > > If I get this glue logic done, would that be an acceptable solution? If not,
> > > do you have any suggestions on how to progress?
>
> I have written some patches in the hope that they'll further clarify what we
> are trying to achieve here and what the best approach would be.
>
> I have taken two approaches to solve the problem.
>
>
> 1.
>
> https://github.com/qais-yousef/linux/commit/e7d0aa7ff1328195f314b0730c4cc744dec4261e
>
> In this approach, everything we need is already available; we just
> need to create new tracepoints as described in
> Documentation/trace/tracepoints.rst and export them with
> EXPORT_TRACEPOINT_SYMBOL_GPL().
>
> A user can then have an out-of-tree module probe this tp and
> manipulate the data as they like.
>
> An example of such a module is here; the pelt_se tp is just there to
> demo the approach:
>
> https://github.com/qais-yousef/tracepoints-helpers/blob/master/module-pelt-se/probe_tp_pelt_se.c
>
> Googling around, I can see that using EXPORT_TRACEPOINT_SYMBOL_GPL()
> is discouraged unless the module is in-tree, which I doubt will be the
> case here.
>
> https://lore.kernel.org/lkml/20150422130052.4996e231@xxxxxxxxxxxxxxxxxx/
>
> 2.
> https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> https://github.com/qais-yousef/linux/commit/edd2498c5bbfca1a26acd151a4e3323e511f3455
>
> In this approach I try to allow attaching to a TP using eBPF. Sadly, the
> current infrastructure is lacking, so I hacked up the above to create a
> new DECLARE_TRACE_HOOK() macro, which allows using eBPF without
> exporting anything in debugfs that could constitute an ABI.
>
> The following eBPF program can then be used to attach to the TP and
> access some info there:
>
> https://github.com/qais-yousef/tracepoints-helpers/blob/master/bpf/tp_trace_printk_pelt_se
>
>
> Do any of the above approaches make sense?

For the EAS-testing use-case you mentioned earlier, it's really for
debugging, so we don't actually need the safety guarantees of eBPF. None
of this is supposed to run in production, I would say. So I tend to
prefer option 1 if that works for everybody interested in this.
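As a rough illustration of the mechanism option 1 relies on, here is a userspace model in C (not kernel code). In the kernel, DECLARE_TRACE() generates trace_<name>() for the call site plus register_trace_<name>() and unregister_trace_<name>(), and each probe gets a leading void *data argument. Everything below (trace_pelt_se(), register_pelt_se_probe(), the cpu/util arguments, struct util_record) is a hypothetical stand-in, not the code in Qais' branch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of a DECLARE_TRACE()-style tracepoint. All names are
 * hypothetical. As in the kernel, a probe receives a leading void *data
 * (the private pointer passed at registration) followed by the
 * tracepoint's own arguments.
 */
#define MAX_PROBES 4
#define NR_CPUS_MODEL 8

typedef void (*pelt_se_probe_fn)(void *data, int cpu, unsigned long util);

struct pelt_se_probe {
	pelt_se_probe_fn fn;
	void *data;
};

static struct pelt_se_probe pelt_se_probes[MAX_PROBES];
static int nr_pelt_se_probes;

/* Stand-in for register_trace_<name>(): 0 on success, -1 if full. */
static int register_pelt_se_probe(pelt_se_probe_fn fn, void *data)
{
	assert(fn != NULL);
	if (nr_pelt_se_probes == MAX_PROBES)
		return -1;
	pelt_se_probes[nr_pelt_se_probes].fn = fn;
	pelt_se_probes[nr_pelt_se_probes].data = data;
	nr_pelt_se_probes++;
	return 0;
}

/* Stand-in for trace_<name>() at the scheduler call site: it only
 * calls whatever probes are registered and does nothing otherwise. */
static void trace_pelt_se(int cpu, unsigned long util)
{
	for (int i = 0; i < nr_pelt_se_probes; i++)
		pelt_se_probes[i].fn(pelt_se_probes[i].data, cpu, util);
}

/* What a test module's probe might do: keep the latest util per CPU. */
struct util_record {
	unsigned long last_util[NR_CPUS_MODEL];
	int hits;
};

static void record_util(void *data, int cpu, unsigned long util)
{
	struct util_record *rec = data;

	if (cpu >= 0 && cpu < NR_CPUS_MODEL)
		rec->last_util[cpu] = util;
	rec->hits++;
}
```

With no probe registered, trace_pelt_se() does nothing; once a test module registers record_util() with its own state, every hit of the call site is delivered to that module, which is free to format or aggregate the data however it likes.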

And then what would the story be? We would carry an out-of-tree module
in our test suite to extract scheduler data, and then post-process it in
userspace or something? Since that would be an out-of-tree module,
upstream wouldn't be committing to anything towards userspace, so perhaps
that could work.
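For the userspace post-processing half, a minimal sketch in C, assuming the test module prints one line per sample in a made-up format "pelt_se: cpu=<n> util=<u>" (the format, NR_CPUS_MAX and struct pelt_summary are all hypothetical). It scans each line of a captured trace and keeps the peak util per CPU, ignoring anything else in the buffer:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical post-processing of lines a test module could emit with
 * trace_printk(), assuming the made-up format "pelt_se: cpu=<n> util=<u>".
 * Anything before the "pelt_se: " tag (task, CPU, timestamp, caller) is
 * skipped; lines without the tag are ignored. */
#define NR_CPUS_MAX 8

struct pelt_summary {
	unsigned long max_util[NR_CPUS_MAX];
	unsigned long samples;
};

/* Returns 1 if the line was a valid pelt_se sample and was accounted,
 * 0 otherwise. */
static int account_line(struct pelt_summary *s, const char *line)
{
	const char *p;
	int cpu;
	unsigned long util;

	assert(s != NULL && line != NULL);
	p = strstr(line, "pelt_se: ");
	if (!p || sscanf(p, "pelt_se: cpu=%d util=%lu", &cpu, &util) != 2)
		return 0;
	if (cpu < 0 || cpu >= NR_CPUS_MAX)
		return 0;
	if (util > s->max_util[cpu])
		s->max_util[cpu] = util;
	s->samples++;
	return 1;
}
```

Feeding it each line of a saved trace (e.g. read with fgets()) and checking max_util[] afterwards is enough for a simple test. Since the format is private to the out-of-tree module, it can change freely without any upstream ABI concern.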

Another thing: should these sched tracepoints be guarded by sched_debug?
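Assuming "sched_debug" here means the CONFIG_SCHED_DEBUG Kconfig option, the guard could be as simple as making the call sites compile away on non-debug builds. A userspace sketch; trace_pelt_se_dbg(), do_trace_pelt_se_dbg() and the hit counter are hypothetical names:

```c
#include <assert.h>

/* Hypothetical counter showing whether the debug-only hook ran. */
static unsigned long pelt_se_dbg_hits;

/* Hypothetical hook body; a real one would fire the tracepoint. */
static inline void do_trace_pelt_se_dbg(int cpu, unsigned long util)
{
	(void)cpu;
	(void)util;
	pelt_se_dbg_hits++;
}

#ifdef CONFIG_SCHED_DEBUG
/* Debug builds: call sites forward to the hook. */
#define trace_pelt_se_dbg(cpu, util) do_trace_pelt_se_dbg((cpu), (util))
#else
/* Non-debug builds: call sites compile to nothing and the
 * arguments are never evaluated. */
#define trace_pelt_se_dbg(cpu, util) do { } while (0)
#endif
```

Built without CONFIG_SCHED_DEBUG, a call such as trace_pelt_se_dbg(cpu, util) leaves no code behind at all, so the hooks cost nothing on production configs.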

Thanks,
Quentin