[PATCH 2/2] sched: Prevent removal of leaf CFS runqueues with on_list children

From: Jan H. Schönherr
Date: Mon Jul 18 2011 - 06:52:28 EST

From: Jan H. Schönherr <schnhrr@xxxxxxxxxxxxxxx>

Currently there are (at least) two situations where a parent gets removed
from the list of leaf CFS runqueues although some of its children are
still on the list.

This patch adds a counter for children depending on a parent, preventing
the parent from being removed too early.
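
As a rough illustration of the counter, here is a minimal userspace
sketch. It is not the kernel code: struct model_rq, model_add_leaf() and
model_del_leaf() are hypothetical names, and only the on_list and
children_on_list fields are taken from this patch. The RCU list handling
and the per-CPU lookup of the parent runqueue are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cfs_rq; only the two flags matter here. */
struct model_rq {
	int on_list;			/* queue is on the leaf list */
	int children_on_list;		/* listed children that pin this queue */
	struct model_rq *parent;	/* NULL for the root group */
};

/* Add path: mark the queue as listed and pin its parent. */
static void model_add_leaf(struct model_rq *rq)
{
	if (rq->on_list)
		return;
	rq->on_list = 1;
	if (rq->parent)
		rq->parent->children_on_list++;
}

/*
 * Delete path: refuse while any child is still listed.
 * Returns 1 if the queue left the list, 0 otherwise.
 */
static int model_del_leaf(struct model_rq *rq)
{
	if (!rq->on_list || rq->children_on_list)
		return 0;
	rq->on_list = 0;
	if (rq->parent) {
		/* The counter must never underflow. */
		assert(rq->parent->children_on_list > 0);
		rq->parent->children_on_list--;
	}
	return 1;
}
```

With a child and its parent both added, model_del_leaf() on the parent
returns 0 until the child has been deleted; only then does the parent's
own removal succeed.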

Consider three task groups with these parent pointers:
tg1 --> tg2 --> tg3

Situation 1:

1. Enqueue task A in tg1
2. Dequeue task A
3. Enqueue task B in tg2

In step 1 all three runqueues are added as leaves.

In step 2 the primary reason for being on the list vanishes, but all
three runqueues are kept on the list.

In step 3 we call enqueue_entity() for tg2. There, update_cfs_load() is
called before nr_running is increased, so tg2 still looks idle and can be
removed from the leaf list. enqueue_entity() re-adds tg2 at the end, but
this mixes up the order because tg1 is not provided as prev_cfs_rq.

Situation 2:

There is no guarantee that cfs_rq->load_avg of tg1 drops to zero before
that of tg2. Hence, tg2 can be removed before tg1. If something is then
enqueued in tg2, tg2 is re-added at the wrong position. Enqueuing
something in tg1 before tg1 is removed would further prevent the situation
from sorting itself out.
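
Situation 2 can be replayed with a small userspace model of the
tg1 --> tg2 --> tg3 chain. This is only a sketch under assumptions:
struct model_rq, model_add() and model_decay_to_zero() are made-up names,
and "load average reached zero" is used as the only removal trigger,
which simplifies whatever conditions the scheduler really checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one runqueue per task group; not kernel code. */
struct model_rq {
	int on_list;
	int children_on_list;
	unsigned long load_avg;
	struct model_rq *parent;	/* tg1 -> tg2 -> tg3 -> NULL */
};

/* Put a queue on the (implicit) leaf list and pin its parent. */
static void model_add(struct model_rq *rq)
{
	if (rq->on_list)
		return;
	rq->on_list = 1;
	if (rq->parent)
		rq->parent->children_on_list++;
}

/*
 * Assumed trigger: the load average of an idle queue decays to zero and
 * the queue asks to leave the list. With the counter in place the request
 * is refused while a child is still listed.
 * Returns 1 if the queue left the list, 0 otherwise.
 */
static int model_decay_to_zero(struct model_rq *rq)
{
	rq->load_avg = 0;
	if (!rq->on_list || rq->children_on_list)
		return 0;
	rq->on_list = 0;
	if (rq->parent) {
		assert(rq->parent->children_on_list > 0);
		rq->parent->children_on_list--;
	}
	return 1;
}
```

If tg2 decays first, the call returns 0 and tg2 stays listed. Once tg1
has decayed and left the list, a later call for tg2 succeeds, so in this
model the chain can only be taken off the list child first.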

Signed-off-by: Jan H. Schönherr <schnhrr@xxxxxxxxxxxxxxx>
---
 kernel/sched.c      |    1 +
 kernel/sched_fair.c |   10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 9769c75..88c83b8 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -344,6 +344,7 @@ struct cfs_rq {
 	 * list is used during load balance.
 	 */
 	int on_list;
+	int children_on_list;
 	struct list_head leaf_cfs_rq_list;
 	struct task_group *tg;	/* group that "owns" this runqueue */

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index d021c75..1c7dd1d 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -166,14 +166,22 @@ static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq,
 		list_add_rcu(&cfs_rq->leaf_cfs_rq_list, prev_leaf_cfs_rq);

 		cfs_rq->on_list = 1;
+		if (cfs_rq->tg->parent) {
+			int cpu = cpu_of(rq_of(cfs_rq));
+			cfs_rq->tg->parent->cfs_rq[cpu]->children_on_list++;
+		}
 	}
 }

 static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 {
-	if (cfs_rq->on_list) {
+	if (cfs_rq->on_list && !cfs_rq->children_on_list) {
 		list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
 		cfs_rq->on_list = 0;
+		if (cfs_rq->tg->parent) {
+			int cpu = cpu_of(rq_of(cfs_rq));
+			cfs_rq->tg->parent->cfs_rq[cpu]->children_on_list--;
+		}
 	}
 }
