Re: [PATCH 2/2] sched/debug: add sched_update_nr_running tracepoint

From: Qais Yousef
Date: Mon Sep 09 2019 - 06:59:59 EST


On 09/04/19 19:48, Peter Zijlstra wrote:
> On Wed, Sep 04, 2019 at 03:37:11PM +0100, Qais Yousef wrote:
>
> > I managed to hook into sched_switch to get the nr_running of cfs tasks via
> > eBPF.
> >
> > ```
> > int on_switch(struct sched_switch_args *args) {
> > 	struct task_struct *prev = (struct task_struct *)bpf_get_current_task();
> > 	struct cgroup *prev_cgroup = prev->cgroups->subsys[cpuset_cgrp_id]->cgroup;
> > 	const char *prev_cgroup_name = prev_cgroup->kn->name;
> >
> > 	if (prev_cgroup->kn->parent) {
> > 		bpf_trace_printk("sched_switch_ext: nr_running=%d prev_cgroup=%s\\n",
> > 				 prev->se.cfs_rq->nr_running,
> > 				 prev_cgroup_name);
> > 	} else {
> > 		bpf_trace_printk("sched_switch_ext: nr_running=%d prev_cgroup=/\\n",
> > 				 prev->se.cfs_rq->nr_running);
> > 	}
> > 	return 0;
> > };
> > ```
> >
> > You can do something similar by attaching to the sched_switch tracepoint from
> > a module and creating a new event to get the nr_running.
> >
> > Now this is not as accurate as your proposed new tracepoint in terms of where
> > you sample nr_running, but should be good enough?
>
> The above is after deactivate() and gives an up-to-date count for
> decrements. Attach something to trace_sched_wakeup() to get the
> increment update.
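
To illustrate why both hooks are needed, here is a toy userspace model. It is
not kernel code, and every name in it (toy_rq, toy_view, toy_wakeup,
toy_switch, toy_lag_after_wakeups) is made up for this sketch. It only assumes
what was said above: the switch hook runs after the dequeue, and the wakeup
hook runs after the enqueue.

```c
#include <stddef.h>

/* Toy model only; these types and functions are hypothetical. */
struct toy_rq {
	int nr_running;		/* true number of runnable tasks */
};

struct toy_view {
	int last_seen;		/* last nr_running value a hook sampled */
};

/* Models a wakeup: enqueue first, then the wakeup hook (if any) samples. */
static void toy_wakeup(struct toy_rq *rq, struct toy_view *hook)
{
	rq->nr_running++;
	if (hook)
		hook->last_seen = rq->nr_running;
}

/*
 * Models a context switch: if prev blocks it is dequeued first, then the
 * switch hook (if any) samples, so decrements are always up to date.
 */
static void toy_switch(struct toy_rq *rq, int prev_blocks,
		       struct toy_view *hook)
{
	if (prev_blocks)
		rq->nr_running--;
	if (hook)
		hook->last_seen = rq->nr_running;
}

/*
 * Returns how far the sampled view lags behind the real count after two
 * wakeups with no context switch in between.
 */
static int toy_lag_after_wakeups(int hook_wakeup)
{
	struct toy_rq rq = { .nr_running = 1 };
	struct toy_view view = { .last_seen = 0 };

	toy_switch(&rq, 0, &view);			/* view syncs to 1 */
	toy_wakeup(&rq, hook_wakeup ? &view : NULL);	/* real count: 2 */
	toy_wakeup(&rq, hook_wakeup ? &view : NULL);	/* real count: 3 */

	return rq.nr_running - view.last_seen;
}
```

With only the switch hook attached, the sampled value stays stale until the
next context switch. With the wakeup hook attached as well, it tracks the
increments as they happen.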

I just remembered that sched_switch and sched_wakeup aren't exported with
EXPORT_TRACEPOINT*(), so an out-of-tree module can't attach to them. They are
still accessible via eBPF, though.

There have been several attempts to export these tracepoints, but they were
NACKed because no in-kernel module needed them.

https://lore.kernel.org/lkml/20150422130052.4996e231@xxxxxxxxxxxxxxxxxx/

--
Qais Yousef