Re: SD_SHARE_CPUPOWER breaks scheduler fairness

From: Con Kolivas
Date: Wed Jun 01 2005 - 09:49:45 EST


On Thu, 2 Jun 2005 00:29, Steve Rotolo wrote:
> On Tue, 2005-05-31 at 22:49, Con Kolivas wrote:
> > Sort of yes and yes. The idea that the sibling gets put to sleep if a
> > real time task is running is a workaround for the fact that you do share
> > cpu power (as you've correctly understood) and a real time task will slow
> > down if a SCHED_NORMAL task is running on its sibling which it should
> > not. The limitation is that, yes, for all intents you only have N
> > hyperthreaded cpus for spinning N rt tasks before nothing else runs, but
> > you can actually run N*2 rt tasks in this setting which you would not be
> > able to if hyperthreading was disabled.
> >
> > For some time I've been thinking about changing the balance between the
> > siblings slightly to allow SCHED_NORMAL tasks to run a small proportion
> > of time when rt tasks are running on the sibling. The tricky part is that
> > SCHED_FIFO tasks have no timeslice so we can't proportion cpu out
> > according to the difference in size of the timeslices, which is currently
> > how we proportion out cpu across siblings with SCHED_NORMAL, and this
> > maintains cpu distribution very similarly to how 'nice' does on the same
> > cpu.
>
> Thanks for responding, Con. But I want to make sure that an important
> point doesn't escape your attention. It appears that tasks get trapped
> on the stalled sibling, even when they could run on some other cpu. The
> load-balancer does not understand that the sibling is temporarily out of
> service so it actually balances tasks to it. And since it's idle, it
> may attract tasks to it more than other cpus (thanks to SD_WAKE_IDLE).
> I think this is a serious bug.

I didn't miss the point, but I guess I should have made that clear too.

The number of tasks seen running on that sibling stays the same even while the
queue is forced to be idle (witness top reporting a load of 1 on that sibling
even though it also shows quite a lot of idle time). It should therefore not
attract any more tasks to itself.

The task that is already there will be trapped, because it is the only task
on that queue, _only_ if the other sibling is running real time tasks
indefinitely. _If_ there are other physical cpus we can use, we should try to
schedule the trapped task away. If we have N physical cpus (and N*2 logical),
and we are running N real time tasks, I don't think we should expect to run
SCHED_NORMAL tasks as well. If we have <N real time tasks (where N > 1) then
we should still be able to run SCHED_NORMAL tasks, I agree. I'm a little
reluctant to tackle this at this stage with the number of SMP balancing
things already queued for -mm, but making a sibling appear more heavily loaded
when "pegged" (nr_running + 1) should suffice.
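
To illustrate the "nr_running + 1" idea, here is a minimal user-space sketch.
It is not kernel code: struct toy_rq, the pegged flag, toy_effective_load()
and toy_pick_least_loaded() are all hypothetical names made up for this
example, and the real balancer weighs far more than a raw task count. It only
shows how counting one extra task on a pegged sibling would steer a
least-loaded choice away from it.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a per-cpu runqueue. */
struct toy_rq {
	unsigned long nr_running;	/* tasks queued on this logical cpu */
	bool pegged;			/* forced idle: sibling runs an rt task */
};

/* Load as a balancer might see it: one extra task while pegged. */
static unsigned long toy_effective_load(const struct toy_rq *rq)
{
	return rq->nr_running + (rq->pegged ? 1 : 0);
}

/* Return the index of the cpu with the lowest effective load.
 * Ties go to the lowest index. */
static int toy_pick_least_loaded(const struct toy_rq *rqs, int n)
{
	int best = 0;

	for (int i = 1; i < n; i++)
		if (toy_effective_load(&rqs[i]) < toy_effective_load(&rqs[best]))
			best = i;
	return best;
}
```

With three queues holding 1, 1 and 2 tasks, where the first is pegged, the
first now counts as 2 and the second queue is picked instead of the pegged
one.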

Cheers,
Con
