Re: [PATCH/RFC 0/5] sched: add new 'book' scheduling domain

From: Andreas Herrmann
Date: Thu Aug 19 2010 - 09:07:10 EST


On Thu, Aug 12, 2010 at 01:25:44PM -0400, Heiko Carstens wrote:
> This patch set adds (yet) another scheduling domain to the scheduler.

All that stuff reminds me of quite similar patches to introduce a
multi-node scheduling domain for Magny-Cours CPUs.

I am afraid that this stuff won't make it upstream and that we both
have to review Peter's suggestions from last year to come up with a more
generalized/flexible way to handle different scheduling domains.


> The reason for this is that the recent (s390) z196 architecture has
> four cache levels and uniform memory access (sort of -- see below).
> The cpu/cache/memory hierarchy is as follows:

> Each cpu has its private L1 (64KB I-cache + 128KB D-cache) and L2 (1.5MB)
> cache.
> A core consists of four cpus with a 24MB shared L3 cache.
> A book consists of six cores with a 192MB shared L4 cache.

> The z196 architecture has no SMT.

[...]

> A boot of a logical partition with 20 cpus, shared on two books, gives these
> initializion output to the console:

The output below shows a somewhat odd distribution of your CPUs across
the different domain levels. Is this caused by the fact that not all
CPUs of a core and book were assigned to your logical partition?

To make sure I understand it: is the following CPU-to-core/book mapping
correct for your example?

 Book |  Core  | CPU
------+--------+---------
   0  |   0    | 0,1,2,3
   0  |   1    | 4,5
   1  |   0    | 6,7,8,9
   1  |   1    | 10,11
   1  |   2    | 12,13
   1  |   3    | 14,15,16
   1  |   4    | 17,18,19

> Brought up 20 CPUs
> CPU0 attaching sched-domain:
> domain 0: span 0-5 level BOOK
> groups: 0 1-3 (cpu_power = 3072) 4-5 (cpu_power = 2048)

Why isn't there a range 0-3 instead of "0 1-3"?
And why isn't cpu_power=4096?
Ah, I think that for CPU 0 just the power information is
missing. So we have 3 groups:

0 (cpu_power=1024) 1-3 (cpu_power=3072) 4-5 (cpu_power=2048)

And the MC level is folded because it doesn't add anything in this
case.

So the mapping is in fact:

 Book |  Core  | CPU
------+--------+---------
   0  |   0    | 0
   0  |   1    | 1,2,3
   0  |   2    | 4,5
   1  |   0    | 6,7,8,9
   1  |   1    | 10,11
   1  |   2    | 12,13
   1  |   3    | 14,15,16
   1  |   4    | 17,18,19
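
As a sanity check on the table above, here is a minimal sketch (plain
Python, not kernel code) that derives the BOOK-level and CPU-level groups
from such a mapping. It assumes that each CPU contributes 1024 to
cpu_power, and that core 0 of book 1 holds CPUs 6-9, since a span of 6-19
with cpu_power = 14336 implies 14 CPUs in that book. The names MAPPING,
fmt_cpus, group_power, book_level_groups and cpu_level_groups are made up
for illustration.

```python
# Hypothetical sanity check, not kernel code.
CPU_POWER = 1024  # assumed per-CPU contribution (the z196 has no SMT)

# (book, core) -> CPUs, following the mapping guessed above.
# Assumption: book 1 / core 0 holds CPUs 6-9.
MAPPING = {
    (0, 0): [0],
    (0, 1): [1, 2, 3],
    (0, 2): [4, 5],
    (1, 0): [6, 7, 8, 9],
    (1, 1): [10, 11],
    (1, 2): [12, 13],
    (1, 3): [14, 15, 16],
    (1, 4): [17, 18, 19],
}


def fmt_cpus(cpus):
    """Format a contiguous list of CPUs as 'a' or 'a-b'."""
    cpus = sorted(cpus)
    return str(cpus[0]) if len(cpus) == 1 else f"{cpus[0]}-{cpus[-1]}"


def group_power(cpus):
    """cpu_power of a group, assuming every CPU counts as CPU_POWER."""
    return len(cpus) * CPU_POWER


def book_level_groups(book):
    """BOOK level: one group per core of the given book."""
    return [(fmt_cpus(cpus), group_power(cpus))
            for (b, _core), cpus in sorted(MAPPING.items()) if b == book]


def cpu_level_groups():
    """CPU level: one group per book."""
    books = {}
    for (b, _core), cpus in MAPPING.items():
        books.setdefault(b, []).extend(cpus)
    return [(fmt_cpus(cpus), group_power(cpus))
            for _b, cpus in sorted(books.items())]


print(book_level_groups(0))
print(cpu_level_groups())
```

With this mapping the sketch yields the groups 0, 1-3 and 4-5 for book 0
and the groups 0-5 (6144) and 6-19 (14336) at the CPU level, which is
consistent with the quoted console output.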


> domain 1: span 0-19 level CPU
> groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)
> CPU1 attaching sched-domain:
> domain 0: span 1-3 level MC
> groups: 1 2 3
> domain 1: span 0-5 level BOOK
> groups: 1-3 (cpu_power = 3072) 4-5 (cpu_power = 2048) 0
> domain 2: span 0-19 level CPU
> groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)

It's odd that the BOOK domain groups shown for CPU1 differ from those
shown for CPU0.
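
One possible explanation, which I have not verified against the code:
the groups are the same in both cases and only the starting point
differs, i.e. the group list is circular and the output starts with the
group that contains the CPU in question. A minimal sketch of that
assumption (plain Python, the function name is made up):

```python
def groups_as_seen_from(cpu, groups):
    """Rotate a circular list of groups so that the group containing
    `cpu` comes first.

    `groups` is a list of CPU lists; `cpu` is assumed to be a member of
    exactly one of them.
    """
    start = next(i for i, group in enumerate(groups) if cpu in group)
    return groups[start:] + groups[:start]


# BOOK-level groups of book 0, in CPU order.
book0 = [[0], [1, 2, 3], [4, 5]]

print(groups_as_seen_from(0, book0))  # starts with group 0
print(groups_as_seen_from(1, book0))  # starts with group 1-3, group 0 last

# MC-level groups of the core holding CPUs 1-3.
mc = [[1], [2], [3]]
print(groups_as_seen_from(2, mc))     # starts with CPU 2
```

If that assumption holds, it would explain both "1-3 4-5 0" at the BOOK
level for CPU1 and "2 3 1" at the MC level for CPU2.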

> CPU2 attaching sched-domain:
> domain 0: span 1-3 level MC
> groups: 2 3 1
> domain 1: span 0-5 level BOOK
> groups: 1-3 (cpu_power = 3072) 4-5 (cpu_power = 2048) 0

Again the cpu_power for CPU 0 is missing. I think that is confusing.
For better readability it should also be displayed when a group
consists of only one CPU.
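
To illustrate what I mean, here is a small sketch of the two output
formats (plain Python, only to show the resulting line; this is not the
kernel's printing code). It assumes that the value is currently left out
whenever it equals a default of 1024; the function name and the
always_show_power switch are made up.

```python
DEFAULT_POWER = 1024  # assumed cpu_power of a single CPU


def format_groups(groups, always_show_power=True):
    """Build a 'groups:' line from (cpu_range, cpu_power) tuples.

    With always_show_power=False, cpu_power is omitted for groups whose
    power equals DEFAULT_POWER (the assumed current behaviour).
    """
    parts = []
    for cpus, power in groups:
        if always_show_power or power != DEFAULT_POWER:
            parts.append(f"{cpus} (cpu_power = {power})")
        else:
            parts.append(cpus)
    return "groups: " + " ".join(parts)


book0 = [("1-3", 3072), ("4-5", 2048), ("0", 1024)]

print(format_groups(book0, always_show_power=False))  # as in the output above
print(format_groups(book0))                           # what I would prefer
```

The second line ends in "0 (cpu_power = 1024)", so it is obvious at a
glance that the group powers add up to the 6144 shown one level higher.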

> domain 2: span 0-19 level CPU
> groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)

[snip the rest]



Andreas

--
Operating | Advanced Micro Devices GmbH
System | Einsteinring 24, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Alberto Bozzo, Andrew Bowd
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632

