Re: [PATCH] trace: reset sleep/block start time on task switch

From: Arun Sharma
Date: Mon Jan 23 2012 - 13:41:33 EST


On 1/23/12 3:34 AM, Peter Zijlstra wrote:

Its not just your tracepoint data being wrong, it'll wreck all related
stats :/

I see: these stats are also used in sched_debug.c.


This'll fail to compile for !CONFIG_SCHEDSTAT I guess.. I should have
paid more attention to the initial patch, that tracepoint having
side-effects is a big no-no.

Having unconditional writes there is somewhat sad, but I suspect putting
a conditional around it isn't going to help much..

For performance reasons?


bah can we
restructure things so we don't need this?


We can go back to the old code, where these values were getting reset in {en,de}queue_sleeper(). But we'll have to do it conditionally, so the values are preserved till context switch time when we need it there.

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1003,6 +1003,8 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
if (unlikely(delta > se->statistics.sleep_max))
se->statistics.sleep_max = delta;

+ if (!trace_sched_stat_sleeptime_enabled())
+ se->statistics.sleep_start = 0;
se->statistics.sum_sleep_runtime += delta;

if (tsk) {
@@ -1019,6 +1021,8 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
if (unlikely(delta > se->statistics.block_max))
se->statistics.block_max = delta;

+ if (!trace_sched_stat_sleeptime_enabled())
+ se->statistics.block_start = 0;
se->statistics.sum_sleep_runtime += delta;

if (tsk) {

This looks pretty ugly too, I don't know how to check for a tracepoint being active (Steven?). The only advantage of this approach is that it's in the sleep/wakeup path, rather than the context switch path.

Conceptually, the following seems to be the simplest:

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1939,8 +1939,10 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
finish_lock_switch(rq, prev);

trace_sched_stat_sleeptime(current, rq->clock);
+#ifdef CONFIG_SCHEDSTAT
current->se.statistics.block_start = 0;
current->se.statistics.sleep_start = 0;
+#endif /* CONFIG_SCHEDSTAT */

Perhaps we can reorder fields in sched_statistics so we touch one cacheline here instead of two?

-Arun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/