[tip:numa/core] numa: Fix task_work() double add

From: tip-bot for Ingo Molnar
Date: Thu Oct 18 2012 - 13:09:39 EST


Commit-ID: e99e955b90d7aaa4418cace9734e8069921ba440
Gitweb: http://git.kernel.org/tip/e99e955b90d7aaa4418cace9734e8069921ba440
Author: Ingo Molnar <mingo@xxxxxxx>
AuthorDate: Mon, 15 Oct 2012 07:46:28 +0200
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Thu, 18 Oct 2012 08:08:24 +0200

numa: Fix task_work() double add

task_work() must not be double added - it can result in an infinite
loop in run_task_work().

So separate out p->numa_work and signal an 'empty' work callback with
numa_work->next == &numa_work. (NULL cannot be used as the task work
code uses it.)
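
The self-pointing marker described above can be sketched in user-space C as
follows. This is only an illustration under stated assumptions, not kernel
code: struct work_item, struct work_list, work_add_once(), work_run() and
demo_cb() are hypothetical names, and the sketch is single-threaded, so it
omits whatever locking or atomics a real implementation needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct callback_head. */
struct work_item {
	struct work_item *next;
	void (*func)(struct work_item *);
};

/* Hypothetical pending list; a NULL next pointer terminates it. */
struct work_list {
	struct work_item *head;
};

/* An idle item points at itself, mirroring numa_work->next == &numa_work. */
static void work_mark_idle(struct work_item *w)
{
	w->next = w;
}

static bool work_is_idle(const struct work_item *w)
{
	return w->next == w;
}

/* Queue w unless it is already pending. Returns true if it was queued. */
static bool work_add_once(struct work_list *l, struct work_item *w,
			  void (*func)(struct work_item *))
{
	if (!work_is_idle(w))
		return false;	/* already queued: refuse the double add */
	w->func = func;
	w->next = l->head;	/* may be NULL, so NULL cannot mean "idle" */
	l->head = w;
	return true;
}

/* Detach the list and run each callback; returns how many ran. */
static int work_run(struct work_list *l)
{
	struct work_item *w = l->head;
	int ran = 0;

	l->head = NULL;
	while (w) {
		struct work_item *next = w->next;

		assert(w->func != NULL);
		work_mark_idle(w);	/* re-arm before running the callback */
		w->func(w);
		w = next;
		ran++;
	}
	return ran;
}

/* Hypothetical callback used only to demonstrate the sketch. */
static int demo_calls;
static void demo_cb(struct work_item *w)
{
	(void)w;
	demo_calls++;
}
```

With this scheme a second work_add_once() on a pending item is rejected, and
the item becomes addable again once work_run() has re-armed it.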

It would be handy if kernel/task_work.c had something like this in
run_task_work():

	if (WARN_ON_ONCE(work->next == work))
		break;

Such a check would have helped, as it took some time to figure out why
it was sporadically hanging.
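
To illustrate why a double add hangs and how a self-link check would catch
it, here is a small user-space C sketch. It assumes a plain singly linked
list with head insertion; struct cb_node, push_unguarded() and run_checked()
are hypothetical names, and whether the check belongs before or after the
callback runs is a choice made for this sketch, not something the suggestion
above specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical work node; 'runs' counts how often it was processed. */
struct cb_node {
	struct cb_node *next;
	int runs;
};

/* Head insertion with no guard against adding the same node twice. */
static void push_unguarded(struct cb_node **head, struct cb_node *n)
{
	assert(head != NULL && n != NULL);
	n->next = *head;	/* second add of n: *head == n, so n->next == n */
	*head = n;
}

/*
 * Detach and walk the list. Without the self-link test the loop would
 * never terminate on a double-added node. Returns true if a self-link
 * was found.
 */
static bool run_checked(struct cb_node **head)
{
	struct cb_node *n = *head;

	*head = NULL;
	while (n) {
		struct cb_node *next = n->next;

		n->runs++;	/* stands in for calling the work function */
		if (next == n) {
			fprintf(stderr, "work item links to itself\n");
			return true;	/* break out instead of spinning */
		}
		n = next;
	}
	return false;
}
```

Pushing the same node twice leaves it pointing at itself, which is exactly
the condition the proposed WARN_ON_ONCE() tests for.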

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
Acked-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Link: http://lkml.kernel.org/n/tip-gT2keodfehhzwmx0appafeUy@xxxxxxxxxxxxxx
---
 include/linux/sched.h |    1 +
 kernel/sched/core.c   |    1 +
 kernel/sched/fair.c   |   16 ++++++----------
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c86db44..7882686 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1529,6 +1529,7 @@ struct task_struct {
 	u64 node_stamp;			/* migration stamp */
 	unsigned long numa_contrib;
 	unsigned long *numa_faults;
+	struct callback_head numa_work;
 #endif /* CONFIG_SCHED_NUMA */

 	struct rcu_head rcu;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 23ad8b9..08661fe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1546,6 +1546,7 @@ static void __sched_fork(struct task_struct *p)
 	p->numa_migrate_seq = p->mm ? p->mm->numa_scan_seq - 1 : 0;
 	p->numa_faults = NULL;
 	p->numa_task_period = sysctl_sched_numa_task_period_min;
+	p->numa_work.next = &p->numa_work;
 #endif /* CONFIG_SCHED_NUMA */
 }

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 530448c..1e24aa1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -900,8 +900,9 @@ void task_numa_work(struct callback_head *work)
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;

-	WARN_ON_ONCE(p != container_of(work, struct task_struct, rcu));
+	WARN_ON_ONCE(p != container_of(work, struct task_struct, numa_work));

+	work->next = work; /* protect against double add */
 	/*
 	 * Who cares about NUMA placement when they're dying.
 	 *
@@ -933,12 +934,13 @@ void task_numa_work(struct callback_head *work)
  */
 void task_tick_numa(struct rq *rq, struct task_struct *curr)
 {
+	struct callback_head *work = &curr->numa_work;
 	u64 period, now;

 	/*
 	 * We don't care about NUMA placement if we don't have memory.
 	 */
-	if (!curr->mm)
+	if (!curr->mm || (curr->flags & PF_EXITING) || work->next != work)
 		return;

/*
@@ -954,14 +956,8 @@ void task_tick_numa(struct rq *rq, struct task_struct *curr)
 		curr->node_stamp = now;

 		if (!time_before(jiffies, curr->mm->numa_next_scan)) {
-			/*
-			 * We can re-use curr->rcu because we checked curr->mm
-			 * != NULL so release_task()->call_rcu() was not called
-			 * yet and exit_task_work() is called before
-			 * exit_notify().
-			 */
-			init_task_work(&curr->rcu, task_numa_work);
-			task_work_add(curr, &curr->rcu, true);
+			init_task_work(work, task_numa_work); /* TODO: move this into sched_fork() */
+			task_work_add(curr, work, true);
 		}
 	}
 }