Re: [PATCH] trace: reset sleep/block start time on task switch

From: Arun Sharma
Date: Mon Jan 23 2012 - 18:03:00 EST


On 1/23/12 1:03 PM, Peter Zijlstra wrote:

This would limit the stores to the blocking case; your suggestion of moving
them to the same cacheline would then get us back to where we started in
terms of performance.

Or did I miss something?


---
kernel/sched/fair.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 84adb2d..60f9ab9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1191,6 +1191,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		if (entity_is_task(se)) {
 			struct task_struct *tsk = task_of(se);
 
+			se->statistics.sleep_start = 0;
+			se->statistics.block_start = 0;
+

We might still need some additional logic to ignore sleep_start if the last context switch was a preemption. The test case Andrew Vagin posted on 12/21:

nanosleep();
s = time(NULL);
while (time(NULL) - s < 4);

During the busy-wait loop, sleep_start is still non-zero, so the first sample from sched_stat_sleeptime() (or anything else doing the (now - sleep_start) computation) would get a bogus value until the next dequeue.
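For completeness, a fleshed-out version of that reproducer that actually compiles could look like the following; the 1ms sleep length and the surrounding boilerplate are my own choices, only the nanosleep-then-busy-wait-4-seconds structure comes from Andrew's snippet:

#define _POSIX_C_SOURCE 199309L
#include <time.h>

int main(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };	/* ~1ms voluntary sleep, sets sleep_start */
	time_t s;

	nanosleep(&ts, NULL);
	s = time(NULL);
	while (time(NULL) - s < 4)
		;	/* busy wait: sleep_start is still non-zero here */
	return 0;
}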

I can't think of an obvious way to pass an extra parameter (bool preempted) to the tracepoint. Would something like this be too
intrusive?


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index df00cb0..b69bb9a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1908,7 +1908,8 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
  * with the lock held can cause deadlocks; see schedule() for
  * details.)
  */
-static void finish_task_switch(struct rq *rq, struct task_struct *prev)
+static void finish_task_switch(struct rq *rq, struct task_struct *prev,
+			       bool preempted)
 	__releases(rq->lock)
 {
 	struct mm_struct *mm = rq->prev_mm;
@@ -1997,7 +1998,7 @@ asmlinkage void schedule_tail(struct task_struct *prev)
 {
 	struct rq *rq = this_rq();
 
-	finish_task_switch(rq, prev);
+	finish_task_switch(rq, prev, false);
 
 	/*
 	 * FIXME: do we need to worry about rq being invalidated by the
@@ -2019,7 +2020,7 @@ asmlinkage void schedule_tail(struct task_struct *prev)
  */
 static inline void
 context_switch(struct rq *rq, struct task_struct *prev,
-	       struct task_struct *next)
+	       struct task_struct *next, bool preempted)
 {
 	struct mm_struct *mm, *oldmm;
 
@@ -2064,7 +2065,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 * CPUs since it called schedule(), thus the 'rq' on its stack
 	 * frame will be invalid.
 	 */
-	finish_task_switch(this_rq(), prev);
+	finish_task_switch(this_rq(), prev, preempted);
 }
 
 /*
@@ -3155,9 +3156,9 @@ pick_next_task(struct rq *rq)
 static void __sched __schedule(void)
 {
 	struct task_struct *prev, *next;
-	unsigned long *switch_count;
 	struct rq *rq;
 	int cpu;
+	bool preempted;
 
 need_resched:
 	preempt_disable();
@@ -3173,7 +3174,7 @@ need_resched:
 
 	raw_spin_lock_irq(&rq->lock);
 
-	switch_count = &prev->nivcsw;
+	preempted = true;
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
 		if (unlikely(signal_pending_state(prev->state, prev))) {
 			prev->state = TASK_RUNNING;
@@ -3194,7 +3195,7 @@ need_resched:
 					try_to_wake_up_local(to_wakeup);
 			}
 		}
-		switch_count = &prev->nvcsw;
+		preempted = false;
 	}
 
 	pre_schedule(rq, prev);
@@ -3210,9 +3211,12 @@ need_resched:
 	if (likely(prev != next)) {
 		rq->nr_switches++;
 		rq->curr = next;
-		++*switch_count;
+		if (preempted)
+			prev->nivcsw++;
+		else
+			prev->nvcsw++;
 
-		context_switch(rq, prev, next); /* unlocks the rq */
+		context_switch(rq, prev, next, preempted); /* unlocks the rq */
 		/*
 		 * The context switch have flipped the stack from under us
 		 * and restored the local variables which were saved when

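Just to spell out the intent (this is not part of the patch, and the helper below is entirely made up): with a preempted flag available at switch time, whoever does the (now - sleep_start) computation could bail out after a preemption, e.g. roughly:

/*
 * Hypothetical sketch, NOT part of the patch above: skip the sleeptime
 * computation when the previous switch-out was a preemption, so a stale
 * sleep_start never turns into a bogus sample.
 */
static void sleeptime_sample(struct task_struct *tsk, u64 now, bool preempted)
{
	u64 start = tsk->se.statistics.sleep_start;

	if (preempted || !start)
		return;		/* stale or unset sleep_start: nothing to report */

	/* feed (now - start) into sched_stat_sleeptime() / schedstats here */
	tsk->se.statistics.sleep_start = 0;
}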
-Arun