Re: [RFC PATCH] tracing: add sched_prio_update

From: Steven Rostedt
Date: Wed Jul 06 2016 - 09:15:01 EST


On Tue, 5 Jul 2016 21:50:34 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:

> >
> >> +
> >> + TP_PROTO(struct task_struct *tsk),
> >> +
> >> + TP_ARGS(tsk),
> >> +
> >> + TP_STRUCT__entry(
> >> + __array( char, comm, TASK_COMM_LEN )
> >
> > I could imagine this being a high frequency tracepoint, especially with
> > a lot of boosting going on. Can we nuke the comm recording and let the
> > userspace tools just hook to the sched_switch tracepoint for that?
>
> We can surely do that.
>
> Just to clarify: currently this tracepoint is *not* hooked on PI boosting,
> as described in the changelog. This tracepoint is about the prio attributes
> set by user-space. The PI boosting temporarily changes the task struct prio
> without updating the associated policy, which seems rather
> implementation-specific and odd to expose.
>
> Thoughts ?

Ah, you're right, I was thinking it was at boosting. But still, it's a
rather hefty tracepoint (lots of fields), probably want to keep from
adding comm too.

>
> Thanks,
>
> Mathieu
>
>
> >
> > -- Steve
> >
> >
> >> + __field( pid_t, pid )
> >> + __field( unsigned int, policy )
> >> + __field( int, nice )
> >> + __field( unsigned int, rt_priority )
> >> + __field( u64, dl_runtime )
> >> + __field( u64, dl_deadline )
> >> + __field( u64, dl_period )
> >> + ),
> >> +
> >> + TP_fast_assign(
> >> + memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
> >> + __entry->pid = tsk->pid;
> >> + __entry->policy = tsk->policy;
> >> + __entry->nice = task_nice(tsk);
> >> + __entry->rt_priority = tsk->rt_priority;
> >> + __entry->dl_runtime = tsk->dl.dl_runtime;
> >> + __entry->dl_deadline = tsk->dl.dl_deadline;
> >> + __entry->dl_period = tsk->dl.dl_period;
> >> + ),
> >> +
> >> + TP_printk("comm=%s pid=%d, policy=%s, nice=%d, rt_priority=%u, "
> >> + "dl_runtime=%Lu, dl_deadline=%Lu, dl_period=%Lu",
> >> + __entry->comm, __entry->pid,
> >> + __print_symbolic(__entry->policy, SCHEDULING_POLICY),
> >> + __entry->nice, __entry->rt_priority,
> >> + __entry->dl_runtime, __entry->dl_deadline,
> >> + __entry->dl_period)
> >> +);
> >> #endif /* _TRACE_SCHED_H */
> >>
> >> /* This part must be outside protection */
> >> diff --git a/kernel/fork.c b/kernel/fork.c
> >> index 7926993..ac4294a 100644
> >> --- a/kernel/fork.c
> >> +++ b/kernel/fork.c
> >> @@ -1773,6 +1773,7 @@ long _do_fork(unsigned long clone_flags,
> >> struct pid *pid;
> >>
> >> trace_sched_process_fork(current, p);
> >> + trace_sched_prio_update(p);

>From the change log:

"It is emitted in the code path of the sched_setscheduler,
sched_setattr, sched_setparam, nice and the fork system calls. For fork, it is emitted
after the sched_process_fork tracepoint for timeline consistency and
because the PID is not yet set when sched_fork() is called."

I'm not convinced this should be needed. I hate adding back to back
tracepoints.

-- Steve


> >> pid = get_task_pid(p, PIDTYPE_PID);
> >> nr = pid_vnr(pid);
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index ce83e39..c729425 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -3708,6 +3708,7 @@ void set_user_nice(struct task_struct *p, long nice)
> >> resched_curr(rq);
> >> }
> >> out_unlock:
> >> + trace_sched_prio_update(p);
> >> task_rq_unlock(rq, p, &rf);
> >> }
> >> EXPORT_SYMBOL(set_user_nice);
> >> @@ -3912,6 +3913,8 @@ static void __setscheduler(struct rq *rq, struct
> >> task_struct *p,
> >> p->sched_class = &rt_sched_class;
> >> else
> >> p->sched_class = &fair_sched_class;
> >> +
> >> + trace_sched_prio_update(p);
> >> }
> >>
> > > static void
>