Re: RE:[PATCH] sched: Add trace for task wake up latency and leave running time

From: peterz
Date: Thu Sep 03 2020 - 03:43:01 EST


On Wed, Sep 02, 2020 at 10:35:34PM +0000, gengdongjiu wrote:

> > NAK, that tracepoint is already broken, we don't want to proliferate the broken.
>
> Sorry, what do you mean by "that tracepoint is already broken"?

Just that, the tracepoint is crap. But we can't fix it because ABI. Did
I tell you I utterly hate tracepoints?

> Maybe I need to explain why I added two tracepoints.
> When using the perf tool or the ftrace sysfs interface to capture the task wake-up latency and the task's leave-running time, the trace data is usually too large and CPU utilization is too high because of the heavy disk writes. Sometimes the disk even fills up before the issue reproduces, that is, before either of the two times exceeds a certain threshold. So I added two tracepoints; with a filter we can record only the abnormal traces, where the wakeup latency or the leave-running time is larger than a threshold.
> Or do you have a better solution?

Learn to use a MUA and wrap your lines at 78 chars like normal people.

Yes, use ftrace synthetic events, or bpf or really anything other than
this.
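
To illustrate the kind of pairing such tools do outside the scheduler, here
is a minimal userspace sketch in plain C. It is not kernel or BPF code, and
every name in it (wakeup_table, on_wakeup, on_switch_in, MAX_PIDS) is
hypothetical. It assumes the tracer sees a wakeup event and a later
switch-in event for the same pid, both stamped by one monotonic clock, and
it reports a latency only when a threshold is exceeded, so nothing extra
has to live in task_struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PIDS 1024	/* hypothetical table size, for the sketch only */

/* Wakeup timestamps kept by the tracer, indexed by pid. */
struct wakeup_table {
	uint64_t ts_ns[MAX_PIDS];	/* time of the last wakeup */
	bool pending[MAX_PIDS];		/* wakeup seen, switch-in not yet seen */
};

/* Handle a wakeup event: remember when the task became runnable. */
static void on_wakeup(struct wakeup_table *t, int pid, uint64_t now_ns)
{
	assert(pid >= 0 && pid < MAX_PIDS);
	t->ts_ns[pid] = now_ns;
	t->pending[pid] = true;
}

/*
 * Handle a switch-in event for next_pid. Returns true and stores the
 * latency in *lat_ns only if a wakeup was pending and the latency is at
 * least threshold_ns; otherwise nothing is reported. Assumes now_ns is
 * not earlier than the recorded wakeup time (monotonic clock).
 */
static bool on_switch_in(struct wakeup_table *t, int next_pid,
			 uint64_t now_ns, uint64_t threshold_ns,
			 uint64_t *lat_ns)
{
	uint64_t delta;

	assert(next_pid >= 0 && next_pid < MAX_PIDS);
	if (!t->pending[next_pid])
		return false;

	/* Consume the pending wakeup so it is matched at most once. */
	t->pending[next_pid] = false;
	delta = now_ns - t->ts_ns[next_pid];
	if (delta < threshold_ns)
		return false;

	*lat_ns = delta;
	return true;
}
```

With a zero-initialized table, calling on_wakeup() for a pid and then
on_switch_in() for the same pid yields a record only for the slow cases,
which is the filtering the patch was after.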

> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 8471a0f7eb32..b5a1928dc948 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -2464,6 +2464,8 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
> > >  {
> > >  	check_preempt_curr(rq, p, wake_flags);
> > >  	p->state = TASK_RUNNING;
> > > +	p->ts_wakeup = local_clock();
> > > +	p->wakeup_state = true;
> > >  	trace_sched_wakeup(p);
> > >
> > >  #ifdef CONFIG_SMP
> >
> > NAK, useless overhead.
>
> At sched switch time we do not know the next task's previous state or
> wakeup timestamp, so I record them when the task is woken from sleep.
> The wakeup latency can then be calculated at task switch.

I don't care. You're making things slower.