Re: [patch] fix SMT scheduler latency bug

From: Con Kolivas
Date: Wed Jun 22 2005 - 19:06:41 EST

On Thu, 23 Jun 2005 09:32, Ingo Molnar wrote:
> * Con Kolivas <kernel@xxxxxxxxxxx> wrote:
> > > task_timeslice(p) is indeed constant over time, but
> > > smt_curr->time_slice is not. So this condition opens up the possibility
> > > of a lower prio thread accumulating a larger ->time_slice and thus
> > > reversing the priority equation.
> >
> > I'm not clear on how the value of ->time_slice can ever grow to larger
> > than task_timeslice(p). It starts at task_timeslice(p) and decrements
> > till it gets to 0 when it refills again.
>
> I was thinking about sched_exit(): there we let unused child timeslices
> 'flow back' into the parent thread, if the child thread was short-lived.
> The check there does:
>
>     if (p->first_time_slice) {
>             p->parent->time_slice += p->time_slice;
>             if (unlikely(p->parent->time_slice > task_timeslice(p)))
>                     p->parent->time_slice = task_timeslice(p);
>     }
>
> Notice that we check parent->time_slice against the child's
> task_timeslice(p), not against task_timeslice(p->parent). So if the
> child thread got reniced, the parent could end up with a
> larger-than-normal timeslice. But this should be a rare scenario, and the
> above code is more of a bug than a feature (will send a patch for it
> tomorrow), and it should not affect the workloads I was testing.
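
To make the clamp issue quoted above concrete, here is a minimal userspace sketch. It is an illustration only, not the patch mentioned above: struct task, the full_slice field and both return_slice_*() helpers are hypothetical names, and task_timeslice() is reduced to returning a stored value, whereas the kernel derives it from the task's static priority.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel task structure (assumption). */
struct task {
    struct task *parent;
    unsigned int time_slice;   /* ticks left in the current slice */
    unsigned int full_slice;   /* hypothetical: value task_timeslice() returns */
    int first_time_slice;      /* nonzero while still on the initial slice */
};

/* Stand-in for task_timeslice(); the real one depends on static priority. */
static unsigned int task_timeslice(const struct task *p)
{
    return p->full_slice;
}

/* Clamp as quoted: the parent is bounded by the CHILD's full slice. */
static void return_slice_child_bound(struct task *p)
{
    assert(p->parent != NULL);
    if (p->first_time_slice) {
        p->parent->time_slice += p->time_slice;
        if (p->parent->time_slice > task_timeslice(p))
            p->parent->time_slice = task_timeslice(p);
    }
}

/* Variant for comparison: the parent is bounded by its OWN full slice. */
static void return_slice_parent_bound(struct task *p)
{
    assert(p->parent != NULL);
    if (p->first_time_slice) {
        p->parent->time_slice += p->time_slice;
        if (p->parent->time_slice > task_timeslice(p->parent))
            p->parent->time_slice = task_timeslice(p->parent);
    }
}
```

With a parent whose full slice is 100 ticks (90 left) and a reniced child whose full slice is 200 ticks (150 left), the quoted clamp leaves the parent with 200 ticks, while the parent-bound variant caps it at 100.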


> Let's take a look at the second condition again:
>
>     if ((p->time_slice * (100 - sd->per_cpu_gain) / 100) >
>                     task_timeslice(smt_curr))
>             resched_task(smt_curr);
>
> If this condition is true then we trigger a preemption at smt_curr. Now
> in the bug scenario, 'p' is a highprio task and smt_curr is a lowprio
> task. If p->time_slice (which fluctuates between task_timeslice(p) and
> 0) happens to be low enough, preemption won't be triggered and we lose a
> wakeup in essence: 'p', despite being the highest-prio task around,
> won't be run until some CPU runs schedule() voluntarily. Ok?

In dependent_sleeper() we return 1 only to prevent p from scheduling. This
second condition does not return 1 from dependent_sleeper() so p will still
go ahead and schedule. This second condition only affects the scheduling on
the smt sibling.
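
For concreteness, the timeslice comparison in the quoted second condition can be modelled in isolation. This is a sketch under stated assumptions: sibling_resched_wanted() is a hypothetical helper, it covers only the comparison quoted above and none of the other checks in dependent_sleeper(), and the per_cpu_gain of 25 used in the note below is an assumed example value. Its result only says whether resched_task(smt_curr) would be called; it has no bearing on the return value of dependent_sleeper().

```c
#include <assert.h>

/*
 * Timeslice comparison from the quoted second condition, in isolation.
 * Returns nonzero when the sibling's current task would be rescheduled.
 *
 * p_time_slice        - ticks p has left (varies between its full slice and 0)
 * per_cpu_gain        - sd->per_cpu_gain, as a percentage (0..100)
 * smt_curr_full_slice - task_timeslice(smt_curr)
 */
static int sibling_resched_wanted(unsigned int p_time_slice,
                                  unsigned int per_cpu_gain,
                                  unsigned int smt_curr_full_slice)
{
    assert(per_cpu_gain <= 100);
    /* Integer arithmetic, as in the quoted expression. */
    return (p_time_slice * (100 - per_cpu_gain) / 100) > smt_curr_full_slice;
}
```

With per_cpu_gain at 25 and task_timeslice(smt_curr) at 50, a p with 200 ticks left gives 150 > 50, so smt_curr is rescheduled; a p with only 40 ticks left gives 30 > 50, which is false, so smt_curr is left running.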

About the only scenario I can envision in which a high priority task is
delayed with the code as it currently is in 2.6.12-mm1 is a high priority
task being on the expired array and a low priority task being delayed on the
active array. This still should not create large latencies unless array
swapping is significantly delayed. I considered adding a check for this
originally, but it seemed to be unnecessary extra complexity, since an expired
task is by design expected to be delayed more anyway.


