Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]

From: Ingo Molnar
Date: Sun Apr 03 2005 - 10:02:59 EST



* Paul Jackson <pj@xxxxxxxxxxxx> wrote:

> Three more observations.
>
> 1) The slowest measure_one() calls are, not surprisingly, those for the
> largest sizes. At least on my test system of the moment, the plot
> of cost versus size has one major maximum (a one-hump camel, not two).
>
> Seems like if we computed from the smallest size upward, instead of the
> largest downward, and stopped whenever two consecutive measurements were
> less than, say, 70% of the max seen so far, then we could save a nice
> chunk of the time.
>
> Of course, if two-hump systems exist, this is not reliable on them.

Yes, this is the approach I'm currently working on, but it's not
reliable yet. (One of the systems I have drifts its cost into infinity
after the hump, which shouldn't happen.)
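
For reference, a minimal sketch of that bottom-up early-exit loop, with
measure_cost() standing in for the kernel's measure_one() timing helper
(the names, the doubling step and the 70% cutoff are illustrative, not
the actual calibration code):

#include <stddef.h>

extern unsigned long long measure_cost(size_t size); /* timing stub */

/*
 * Walk the buffer sizes from small to large and stop once we are
 * clearly past the (single) hump: two consecutive samples below
 * 70% of the maximum seen so far, per the suggestion above.
 */
static unsigned long long find_max_cost(size_t min_size, size_t max_size)
{
	unsigned long long max_cost = 0;
	int below_cutoff = 0;
	size_t size;

	for (size = min_size; size <= max_size; size *= 2) {
		unsigned long long cost = measure_cost(size);

		if (cost > max_cost) {
			max_cost = cost;
			below_cutoff = 0;
		} else if (cost < max_cost * 7 / 10) {
			/* below 70% of the hump seen so far */
			if (++below_cutoff == 2)
				break;	/* past the hump, stop early */
		} else {
			below_cutoff = 0;
		}
	}

	return max_cost;
}

On a two-hump system this stops at the first hump, which is exactly the
caveat raised in point 1.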

> 2) Trivial warning fix for printf format mismatch:

Thanks.

> 3) I noticed that my test system was only showing a couple of
> distinct values for cpu_distance, even though it has four distinct
> node_distance values. So I coded up a variant of cpu_distance that
> converts the problem to a node_distance problem, and got the
> following cost matrix:
>
> =================================== begin ===================================
> Total of 8 processors activated (15515.64 BogoMIPS).
> ---------------------
> migration cost matrix (max_cache_size: 0, cpu: -1 MHz):
> ---------------------
> [00] [01] [02] [03] [04] [05] [06] [07]
> [00]: - 4.0(0) 21.7(1) 21.7(1) 25.2(2) 25.2(2) 25.3(3) 25.3(3)
> [01]: 4.0(0) - 21.7(1) 21.7(1) 25.2(2) 25.2(2) 25.3(3) 25.3(3)
> [02]: 21.7(1) 21.7(1) - 4.0(0) 25.3(3) 25.3(3) 25.2(2) 25.2(2)
> [03]: 21.7(1) 21.7(1) 4.0(0) - 25.3(3) 25.3(3) 25.2(2) 25.2(2)
> [04]: 25.2(2) 25.2(2) 25.3(3) 25.3(3) - 4.0(0) 21.7(1) 21.7(1)
> [05]: 25.2(2) 25.2(2) 25.3(3) 25.3(3) 4.0(0) - 21.7(1) 21.7(1)
> [06]: 25.3(3) 25.3(3) 25.2(2) 25.2(2) 21.7(1) 21.7(1) - 4.0(0)
> [07]: 25.3(3) 25.3(3) 25.2(2) 25.2(2) 21.7(1) 21.7(1) 4.0(0) -
> ---------------------
> cacheflush times [4]: 4.0 (4080540) 21.7 (21781380) 25.2 (25259428) 25.3 (25372682)

I'll first try the bottom-up approach to speed up detection (getting to
the hump is very fast most of the time). The hard part was creating a
workload that generates the hump reliably on a number of boxes; I'm
happy it works on ia64 too.

Then we can let the arch override the cpu_distance() method, although I
do think that _if_ there is a significant hierarchy between CPUs, it
should be represented via a matching sched-domains hierarchy, and the
full hierarchy should be tuned accordingly.
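
As a sketch of what such an override could look like (along the lines
of Paul's variant in point 3, using the existing cpu_to_node() and
node_distance() topology accessors; illustrative, not the actual
patch):

/*
 * Map each CPU to its NUMA node and reuse the SLIT-style node
 * distances, so cpu_distance() yields as many distinct values
 * as node_distance() does.
 */
static int cpu_distance(int cpu1, int cpu2)
{
	return node_distance(cpu_to_node(cpu1), cpu_to_node(cpu2));
}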

Btw., we can later use the migration cost matrix to tune all the other
sched-domains balancing-related tunables as well; cache_hot_time is
just the first obvious step (and it also happens to make the most
difference).
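
To illustrate the direction (hypothetical helper; the real tuning would
live in the domain setup code), the measured cost for a given domain
level could simply be fed into that level's cache_hot_time, which
task_hot() compares against how recently a task last ran:

/*
 * Illustrative only: propagate a measured migration cost (in ns)
 * into one sched-domain level. Other balancing knobs could later
 * be scaled off the same measurement.
 */
static void tune_domain(struct sched_domain *sd, unsigned long cost_ns)
{
	sd->cache_hot_time = cost_ns;
}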

Ingo