Re: [PATCH] trace: reset sleep/block start time on task switch

From: Peter Zijlstra
Date: Mon Jan 23 2012 - 06:34:30 EST


On Thu, 2012-01-19 at 18:20 -0800, Arun Sharma wrote:
> Without this patch, the first sample we get on a
> task might be bad because of a stale sleep_start
> value that wasn't reset at the last task switch
> because the tracepoint was not active.
>
> The problem can be worked around via perf record
> --filter "sleeptime < some-large-number" in practice
> and it's not clear if the added code to the context
> switch path is worth it.
>
> I'm posting this patch regardless, just in case
> more people start noticing this and start wondering
> where the bogus numbers came from.
>
> Signed-off-by: Arun Sharma <asharma@xxxxxx>
> Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxxxxx>
> Cc: Andrew Vagin <avagin@xxxxxxxxxx>
> Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> ---
> include/trace/events/sched.h | 3 ---
> kernel/sched/core.c | 3 +++
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 6ba596b..814cdf1 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -378,9 +378,6 @@ static inline u64 trace_get_sleeptime(struct task_struct *tsk)
>
> block = tsk->se.statistics.block_start;
> sleep = tsk->se.statistics.sleep_start;
> - tsk->se.statistics.block_start = 0;
> - tsk->se.statistics.sleep_start = 0;
> -
> return block ? block : sleep ? sleep : 0;
> #else
> return 0;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 457c881..6349cee 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1937,7 +1937,10 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
> local_irq_enable();
> #endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
> finish_lock_switch(rq, prev);
> +
> trace_sched_stat_sleeptime(current, rq->clock);
> + current->se.statistics.block_start = 0;
> + current->se.statistics.sleep_start = 0;
>
> fire_sched_in_preempt_notifiers(current);
> if (mm)


It's not just your tracepoint data being wrong, it'll wreck all related
stats :/
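
A rough userspace model of that effect, not kernel code. It assumes the
wakeup accounting leaves the reset of sleep_start/block_start to the
tracepoint, which is how I read the thread. Every model_* name is made up
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-task sleep statistics. */
struct model_stats {
	uint64_t sleep_start;	/* set on interruptible sleep */
	uint64_t block_start;	/* set on uninterruptible block */
	uint64_t sum_sleep;	/* accumulated sleep + block time */
};

/* The task leaves the CPU at time 'now'. */
void model_dequeue(struct model_stats *s, uint64_t now, bool uninterruptible)
{
	if (uninterruptible)
		s->block_start = now;
	else
		s->sleep_start = now;
}

/* Wakeup accounting; ASSUMPTION: it does not reset the start stamps. */
void model_wakeup(struct model_stats *s, uint64_t now)
{
	assert(now >= s->sleep_start && now >= s->block_start);
	if (s->sleep_start)
		s->sum_sleep += now - s->sleep_start;
	if (s->block_start)
		s->sum_sleep += now - s->block_start;
}

/* Tracepoint with a side effect: the reset only happens when active. */
uint64_t model_tracepoint(struct model_stats *s, bool active)
{
	uint64_t start;

	if (!active)
		return 0;
	start = s->block_start ? s->block_start : s->sleep_start;
	s->block_start = 0;
	s->sleep_start = 0;
	return start;
}

/* Sleep from 100 to 110, then block from 1000 to 1005. */
uint64_t model_run(bool active)
{
	struct model_stats s = { 0 };

	model_dequeue(&s, 100, false);
	model_wakeup(&s, 110);
	model_tracepoint(&s, active);
	model_dequeue(&s, 1000, true);
	model_wakeup(&s, 1005);
	model_tracepoint(&s, active);
	return s.sum_sleep;
}
```

With the tracepoint active, model_run() accounts 10 + 5 = 15. With it
inactive, the stale sleep_start of 100 is counted again at the second
wakeup and the total becomes 920.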

This'll fail to compile for !CONFIG_SCHEDSTATS, I guess. I should have
paid more attention to the initial patch; a tracepoint having
side effects is a big no-no.
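
One possible shape for this, again as a userspace sketch rather than a
proposed patch: the read helper takes a const pointer so it cannot modify
the task, and the reset lives in its own helper that compiles to nothing
when the stats fields do not exist. MODEL_SCHEDSTATS is a hypothetical
stand-in for CONFIG_SCHEDSTATS, and the model_* names are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for CONFIG_SCHEDSTATS; remove to model a !CONFIG_SCHEDSTATS build. */
#define MODEL_SCHEDSTATS 1

struct model_task {
#ifdef MODEL_SCHEDSTATS
	uint64_t sleep_start;
	uint64_t block_start;
#endif
	int pid;
};

/* Pure read: the const pointer rules out side effects on the task. */
uint64_t model_get_sleeptime(const struct model_task *t)
{
#ifdef MODEL_SCHEDSTATS
	return t->block_start ? t->block_start : t->sleep_start;
#else
	(void)t;
	return 0;
#endif
}

/* The reset is separate and becomes a no-op without the stats fields. */
void model_reset_sleep_stats(struct model_task *t)
{
#ifdef MODEL_SCHEDSTATS
	t->block_start = 0;
	t->sleep_start = 0;
#else
	(void)t;
#endif
}

/* Model of the switch-in path: optional trace read, unconditional reset. */
uint64_t model_finish_switch(struct model_task *t, int trace_active)
{
	uint64_t seen;

	assert(t != NULL);
	seen = trace_active ? model_get_sleeptime(t) : 0;
	model_reset_sleep_stats(t);
	return seen;
}
```

The reset then happens whether or not anyone is tracing, and the file
still builds with the stand-in macro removed, because no field access
sits outside an #ifdef.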

Having unconditional writes there is somewhat sad, but I suspect putting
a conditional around them isn't going to help much. Bah, can we
restructure things so we don't need this?

