Re: [PATCH 5/5] sched,futex: Provide delayed wakeup list

From: Peter Zijlstra
Date: Sat Nov 23 2013 - 07:01:38 EST


> I used to have a patch to schedule() that would always immediately fall
> through and only actually block on the second call; it illustrated the
> problem really well, in fact so well that the kernel fails to boot most
> times.

I found the below on my filesystem -- making it apply shouldn't be hard.
Making it work takes the same effort as that patch you sent: we need to
guarantee that all schedule() callers can deal with not actually
sleeping, i.e. with spurious wakeups.
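
To make the requirement concrete, here is a minimal userspace sketch of
the difference between a caller that assumes schedule() only returns
once its condition holds and one that re-checks in a loop. Everything in
it (struct waiter, mock_schedule(), wait_once(), wait_loop()) is a
hypothetical stand-in made up for illustration, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical model of one sleeper; none of this is kernel API. */
struct waiter {
	bool condition;		/* what the sleeper is waiting for */
	int spurious_left;	/* early returns the mock will still produce */
	int calls;		/* how often mock_schedule() was entered */
};

/*
 * Stand-in for schedule(): returns without the condition being set
 * 'spurious_left' times, then behaves as if the real waker had run.
 */
static void mock_schedule(struct waiter *w)
{
	w->calls++;
	if (w->spurious_left > 0) {
		w->spurious_left--;
		return;		/* spurious wakeup */
	}
	w->condition = true;
}

/* Fragile caller: assumes a single schedule() is enough. */
static bool wait_once(struct waiter *w)
{
	if (!w->condition)
		mock_schedule(w);
	return w->condition;
}

/* Robust caller: re-checks the condition after every return. */
static bool wait_loop(struct waiter *w)
{
	while (!w->condition)
		mock_schedule(w);
	return w->condition;
}
```

With one spurious return queued up, wait_once() comes back with the
condition still false, while wait_loop() simply goes around again and
succeeds on the second call.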

I don't think anybody ever got that thing to run reliably enough to see
if the idea proposed in the patch made any difference to actual
workloads though.

---
Subject:
From: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Date: Thu Dec 09 17:51:09 CET 2010


Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Link: http://lkml.kernel.org/n/tip-v17vshx6uasjguuwd67fe7tg@xxxxxxxxxxxxxx
---
include/linux/sched.h | 5 +++--
kernel/sched/core.c | 18 ++++++++++++++++++
2 files changed, 21 insertions(+), 2 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -153,9 +153,10 @@ print_cfs_rq(struct seq_file *m, int cpu
 #define TASK_DEAD		64
 #define TASK_WAKEKILL		128
 #define TASK_WAKING		256
-#define TASK_STATE_MAX		512
+#define TASK_YIELD		512
+#define TASK_STATE_MAX		1024
 
-#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKW"
+#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWY"
 
 extern char ___assert_task_state[1 - 2*!!(
 		sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)];
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -931,6 +931,7 @@ void set_task_cpu(struct task_struct *p,
 	 * ttwu() will sort out the placement.
 	 */
 	WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
+			!(p->state & TASK_YIELD) &&
 			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
 
 #ifdef CONFIG_LOCKDEP
@@ -2864,6 +2865,22 @@ static void __sched __schedule(void)
 		if (unlikely(signal_pending_state(prev->state, prev))) {
 			prev->state = TASK_RUNNING;
 		} else {
+			/*
+			 * Provide an auto-yield feature on schedule().
+			 *
+			 * The thought is to avoid a sleep+wakeup cycle
+			 * if simply yielding the cpu will suffice to
+			 * satisfy the required condition.
+			 *
+			 * Assumes the calling schedule() site can deal
+			 * with spurious wakeups.
+			 */
+			if (prev->state & TASK_YIELD) {
+				prev->state &= ~TASK_YIELD;
+				if (rq->nr_running > 1)
+					goto no_deactivate;
+			}
+
 			deactivate_task(rq, prev, DEQUEUE_SLEEP);
 			prev->on_rq = 0;
 
@@ -2880,6 +2897,7 @@ static void __sched __schedule(void)
 					try_to_wake_up_local(to_wakeup);
 			}
 		}
+no_deactivate:
 		switch_count = &prev->nvcsw;
 	}
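
For readers who want to poke at the decision the __schedule() hunk
makes, the following is a rough userspace model of it. Only the
TASK_YIELD value and the shape of the test are taken from the patch;
struct mock_task, struct mock_rq and mock_block_or_yield() are
hypothetical names, and the real deactivate_task() does far more than
the two assignments shown here.

```c
#include <stdbool.h>

#define TASK_INTERRUPTIBLE	1
#define TASK_YIELD		512	/* value used in the patch */

/* Hypothetical, heavily reduced stand-ins for task_struct and rq. */
struct mock_task {
	long state;
	bool on_rq;
};

struct mock_rq {
	unsigned int nr_running;
};

/*
 * Models the else-branch of the patched __schedule().
 * Returns true if the task was dequeued (it really goes to sleep),
 * false if it only yields and stays on the runqueue.
 */
static bool mock_block_or_yield(struct mock_rq *rq, struct mock_task *prev)
{
	if (prev->state & TASK_YIELD) {
		/* The flag is one-shot: clear it before deciding. */
		prev->state &= ~TASK_YIELD;
		if (rq->nr_running > 1)
			return false;	/* the "goto no_deactivate" path */
	}

	/* Crude stand-in for deactivate_task() + prev->on_rq = 0. */
	prev->on_rq = false;
	rq->nr_running--;
	return true;
}
```

In this model a task that sets TASK_YIELD on a runqueue with other
runnable tasks stays queued on the first call and is dequeued on the
next one, because the flag has been cleared by then. With nothing else
to run, it is dequeued straight away.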
