Re: Linux 2.4.17-pre5

From: Davide Libenzi (davidel@xmailserver.org)
Date: Sun Dec 09 2001 - 14:48:53 EST

Next message: Zlatko Calusic: "Re: ext3 writeback mode slower than ordered mode?"
Previous message: Samium Gromoff: "Re: 2.4.12-ac4 10Mbit NE2k interrupt load kills p166"
In reply to: Alan Cox: "Re: Linux 2.4.17-pre5"
Next in thread: Mike Kravetz: "Re: Linux 2.4.17-pre5"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sun, 9 Dec 2001, Alan Cox wrote:

> > Using the scheduler i'm working on and setting a trigger load level of 2,
> > as soon as the idle is scheduled it'll go to grab the task waiting on the
> > other cpu and it'll make it running.
>
> That rapidly gets you thrashing around as I suspect you've found.

Not really, because i can make the same choices inside the idle code, out
of the fast path, without slowing the currently running cpu ( the waker ).

> I'm currently using the following rule in wake up
>
> if (current->mm->runnable > 0) /* One already running ? */
>         cpu = current->mm->last_cpu;
> else if (there is an idle cpu)
>         cpu = idle_cpu();
> else
>         cpu = cpu_num[fast_fl1(runnable_set)];
>
> that is
> If we are running threads with this mm on a cpu throw them at the
> same core
> If there is an idle CPU use it
> Take the mask of currently executing priority levels, find the last
> set bit (lowest pri) being executed, and look up a cpu running at
> that priority
>
> Then the idle stealing code will do the rest of the balancing, but at least
> it converges towards each mm living on one cpu core.
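
As a reading aid, the three-step rule quoted above can be sketched as
stand-alone C. This is only an illustration under stated assumptions, not
the actual kernel code: struct mm_info, last_set_bit() and
pick_wakeup_cpu() are hypothetical names, the idle cpu is passed in as an
index (-1 when none is idle), and a higher bit in runnable_set is assumed
to mean a lower priority level.

```c
#include <assert.h>

/* Hypothetical per-mm bookkeeping, mirroring the fields used in the quote. */
struct mm_info {
    int runnable;   /* threads of this mm currently running */
    int last_cpu;   /* cpu this mm last ran on */
};

/* Index of the highest set bit in mask, or -1 if mask is 0. */
int last_set_bit(unsigned int mask)
{
    int bit = -1;

    while (mask) {
        bit++;
        mask >>= 1;
    }
    return bit;
}

/*
 * Pick the cpu for a task being woken up.
 *   idle_cpu     : index of an idle cpu, or -1 if none (assumption)
 *   runnable_set : bitmask of priority levels currently executing
 *   cpu_num      : cpu running at each priority level (assumption)
 */
int pick_wakeup_cpu(const struct mm_info *mm, int idle_cpu,
                    unsigned int runnable_set, const int cpu_num[])
{
    if (mm->runnable > 0)       /* same mm already running: stay with it */
        return mm->last_cpu;
    if (idle_cpu >= 0)          /* otherwise prefer an idle cpu */
        return idle_cpu;
    assert(runnable_set != 0);  /* no idle cpu, so something is executing */
    return cpu_num[last_set_bit(runnable_set)];  /* lowest priority level */
}
```

The order of the tests is the whole policy: mm affinity first, then idle
cpus, and only then the cpu running the lowest priority level.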

I've done a lot of experiments weighing the cost of moving tasks, with the
related tlb flushes and cache image trashing, against the cost of actually
leaving a cpu idle for a given period of time.
For example, on a dual cpu system the cost of leaving a cpu idle for more
than 40-50 ms is higher than the cost of immediately filling the idle cpu
with a stolen task ( trigger rq length == 2 ).
This picture should vary a lot on big SMP systems, that's why i'm
looking for a biased solution where it's easy to adjust the scheduler
behavior based on the underlying architecture.
For example, by leaving balancing decisions inside the idle code we'll
have a bit more time to consider the different moving costs/metrics that
will be present, for example, in NUMA machines.
By measuring the cost of moving in terms of cpu idle time we'll have a
pretty good granularity and we could say, for example, that the tolerable
cost of moving a task on a given architecture is 40 ms of idle time.
This means that if during 4 consecutive timer ticks ( on 100 HZ archs )
the idle cpu has found an "unbalanced" system, it's allowed to steal a
task to run on it.
Or better, it's allowed to steal a task from a cpu set that has a
"distance" <= 40 ms from its own set.
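
A minimal sketch of that idle-time rule, in stand-alone C rather than
kernel code. All names here ( struct idle_state, idle_tick_may_steal(),
MS_PER_TICK ) are hypothetical and only illustrate the arithmetic: with
HZ == 100 each tick is 10 ms, so a "distance" of 40 ms is reached on the
4th consecutive unbalanced tick.

```c
#include <assert.h>
#include <stdbool.h>

#define HZ          100            /* timer ticks per second (100 HZ archs) */
#define MS_PER_TICK (1000 / HZ)    /* 10 ms per tick */

/* Hypothetical per-cpu state kept by the idle loop. */
struct idle_state {
    int unbalanced_ticks;   /* consecutive idle ticks that saw imbalance */
};

/*
 * Called once per timer tick while the cpu is idle.
 *   unbalanced  : did this tick find an "unbalanced" system?
 *   distance_ms : moving cost, in ms of idle time, between this cpu's
 *                 set and the set the task would be stolen from
 * Returns true once the accumulated idle time covers the distance.
 */
bool idle_tick_may_steal(struct idle_state *st, bool unbalanced,
                         int distance_ms)
{
    assert(distance_ms >= 0);

    if (!unbalanced) {
        st->unbalanced_ticks = 0;   /* the streak must be consecutive */
        return false;
    }
    st->unbalanced_ticks++;
    return st->unbalanced_ticks * MS_PER_TICK >= distance_ms;
}
```

Because the decision is taken on the idle cpu, the per-tick check costs
nothing to the cpus that are doing real work, and distance_ms is the one
knob that would change between architectures.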

- Davide



This archive was generated by hypermail 2b29 : Sat Dec 15 2001 - 21:00:15 EST