Re: Bug in scheduler when using rt_mutex

From: Mike Galbraith
Date: Tue Jan 18 2011 - 22:44:07 EST


On Wed, 2011-01-19 at 10:38 +0800, Yong Zhang wrote:

> > Index: linux-2.6/kernel/sched_fair.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched_fair.c
> > +++ linux-2.6/kernel/sched_fair.c
> > @@ -4075,6 +4075,22 @@ static void prio_changed_fair(struct rq
> >  static void switched_to_fair(struct rq *rq, struct task_struct *p,
> >  			     int running)
> >  {
> > +	struct sched_entity *se = &p->se;
> > +	struct cfs_rq *cfs_rq = cfs_rq_of(se);
> > +
> > +	if (se->on_rq && cfs_rq->curr != se)
>
> (cfs_rq->curr != se) is equivalent to (!running), no?

No, running is task_of(se) == rq->curr. A task of another class, or a
fair group task, may be rq_of(cfs_rq)->curr.
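
To make the distinction concrete, here is a minimal userspace sketch. The
structs are toy stand-ins reduced to the fields under discussion, and the
helper names task_is_running() and entity_is_queued() are hypothetical,
not kernel functions. It only shows that the two tests read different
pointers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures, reduced to the fields discussed. */
struct sched_entity {
	bool on_rq;
};

struct cfs_rq {
	struct sched_entity *curr;	/* entity this cfs_rq is running, or NULL */
};

struct task_struct {
	struct sched_entity se;
};

struct rq {
	struct task_struct *curr;	/* task on the CPU, of any scheduling class */
	struct cfs_rq cfs;
};

/* Hypothetical helper: "running" in the sense used above, p is rq->curr. */
bool task_is_running(const struct rq *rq, const struct task_struct *p)
{
	return rq->curr == p;
}

/* Hypothetical helper: the condition the patch tests before dequeueing. */
bool entity_is_queued(const struct cfs_rq *cfs_rq,
		      const struct sched_entity *se)
{
	return se->on_rq && cfs_rq->curr != se;
}
```

In this model, setting rq.curr = &p while rq.cfs.curr stays NULL makes
task_is_running() true and entity_is_queued() true at the same time, so
the second test is not simply !running. Whether that exact state is
reachable at the switched_to_fair() call site is not something this
sketch establishes.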

> > +		__dequeue_entity(cfs_rq, se);
> > +
> > +	/*
> > +	 * se->vruntime can be completely out there, there is no telling
> > +	 * how long this task was !fair and on what CPU if any it became
> > +	 * !fair. Therefore, reset it to a known, reasonable value.
> > +	 */
> > +	se->vruntime = cfs_rq->min_vruntime;
>
> But this is not fair for a !SLEEP task.
> You know se->vruntime -= cfs_rq->min_vruntime is done for a !SLEEP task;
> then, after it goes through sched_fair-->sched_rt-->sched_fair by some
> means, the current cfs_rq->min_vruntime is added back.

It drops lag for all, positive or negative.

> But here se is put before where it should be. Is this what we want?

It may move forward or backward. If transitions can happen at high
frequency, it could be a problem; otherwise, it's a corner-case blip.
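
The arithmetic behind "forward or backward" can be sketched in userspace
as follows. This is a toy model, not kernel code: the helper names
(vruntime_offset, make_relative, make_absolute, reset_to_min) are
hypothetical, and the structs carry only the one field each that matters
here. It compares the subtract/add-back normalization described in the
quoted text with the plain reset the patch does.

```c
#include <stdint.h>

typedef uint64_t u64;
typedef int64_t s64;

/* Toy stand-ins: only the fields this discussion touches. */
struct cfs_rq {
	u64 min_vruntime;
};

struct sched_entity {
	u64 vruntime;
};

/* Signed distance of se from the queue's min_vruntime (the "lag" above). */
s64 vruntime_offset(const struct cfs_rq *cfs_rq, const struct sched_entity *se)
{
	return (s64)(se->vruntime - cfs_rq->min_vruntime);
}

/* Normalization on leaving a queue: keep only the relative part. */
void make_relative(const struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	se->vruntime -= cfs_rq->min_vruntime;	/* unsigned wrap is intended */
}

/* Normalization on joining a queue: add the current min_vruntime back. */
void make_absolute(const struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	se->vruntime += cfs_rq->min_vruntime;
}

/* What the patch does instead: discard the offset, whatever its sign. */
void reset_to_min(const struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	se->vruntime = cfs_rq->min_vruntime;
}
```

With min_vruntime going from 1000 to 5000, an entity at 900 comes back at
4900 under make_relative()/make_absolute() and an entity at 1300 comes
back at 5300, so both keep their offsets. reset_to_min() puts both at
5000: the first is pushed backward in the tree, the second pulled forward.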

An alternative is to leave lag alone and normalize sleepers, but that's
considerably more intrusive (I did that).

-Mike
