Re: [RFC] sched: CPU topology try

From: Preeti U Murthy
Date: Tue Jan 07 2014 - 04:44:01 EST


Hi Vincent, Peter,

On 12/18/2013 06:43 PM, Vincent Guittot wrote:
> This patch applies on top of the two patches [1][2] that have been proposed by
> Peter for creating a new way to initialize sched_domain. It includes some minor
> compilation fixes and a trial of using this new method on the ARM platform.
> [1] https://lkml.org/lkml/2013/11/5/239
> [2] https://lkml.org/lkml/2013/11/5/449
>
> Based on the results of these tests, my feelings about this new way to init the
> sched_domain are mixed.
>
> The good point is that I have been able to create the same sched_domain
> topologies as before, and even more complex ones (where a subset of the cores
> in a cluster share their powergating capabilities). I have described various
> topology results below.
>
> I use a system that is made of a dual cluster of quad cores with hyperthreading
> for my examples.
>
> If one cluster (0-7) can powergate its cores independently but not the other
> cluster (8-15), we have the following topology, which is equal to what I had
> previously:
>
> CPU0:
> domain 0: span 0-1 level: SMT
> flags: SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN
> groups: 0 1
> domain 1: span 0-7 level: MC
> flags: SD_SHARE_PKG_RESOURCES
> groups: 0-1 2-3 4-5 6-7
> domain 2: span 0-15 level: CPU
> flags:
> groups: 0-7 8-15
>
> CPU8:
> domain 0: span 8-9 level: SMT
> flags: SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN
> groups: 8 9
> domain 1: span 8-15 level: MC
> flags: SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN
> groups: 8-9 10-11 12-13 14-15
> domain 2: span 0-15 level: CPU
> flags:
> groups: 8-15 0-7
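>
> For reference, the table that produces this kind of topology looks roughly
> like the ARM one in the RFC. This is only a sketch (the exact struct layout
> and the cpu_corepower_mask() helper may differ from the final interface):
> each entry pairs a cpumask accessor with the SD flags of that level, and the
> per-CPU differences come from the mask functions themselves.
>
> /*
>  * Sketch of an arch-specific topology table. cpu_corepower_mask() is
>  * assumed to return only the CPUs that can powergate together, so its
>  * span differs between the two clusters (0-1 for CPU0, 8-15 for CPU8).
>  */
> static struct sched_domain_topology_level arm_topology[] = {
> #ifdef CONFIG_SCHED_SMT
> 	{ cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES
> 			| SD_SHARE_POWERDOMAIN },
> #endif
> #ifdef CONFIG_SCHED_MC
> 	/* cores that share their powergating capability */
> 	{ cpu_corepower_mask, SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN },
> 	/* all cores of the cluster (last level of cache) */
> 	{ cpu_coregroup_mask, SD_SHARE_PKG_RESOURCES },
> #endif
> 	{ cpu_cpu_mask, 0 },
> 	{ NULL, },
> };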
>
> We can even describe some more complex topologies if a subset (2-7) of the
> cluster can't powergate independently:
>
> CPU0:
> domain 0: span 0-1 level: SMT
> flags: SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN
> groups: 0 1
> domain 1: span 0-7 level: MC
> flags: SD_SHARE_PKG_RESOURCES
> groups: 0-1 2-7
> domain 2: span 0-15 level: CPU
> flags:
> groups: 0-7 8-15
>
> CPU2:
> domain 0: span 2-3 level: SMT
> flags: SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN
> groups: 2 3
> domain 1: span 2-7 level: MC
> flags: SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN
> groups: 2-3 4-5 6-7
> domain 2: span 0-7 level: MC
> flags: SD_SHARE_PKG_RESOURCES
> groups: 2-7 0-1
> domain 3: span 0-15 level: CPU
> flags:
> groups: 0-7 8-15
>
> In this case, we have an additional sched_domain MC level for this subset (2-7)
> of cores, so we can trigger some load balancing in this subset before doing it
> across the complete cluster (which is the last level of cache in my example).
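>
> The subset is described purely through the mask function. A hypothetical
> implementation (the power_sibling per-cpu map is an assumed data structure
> that would be filled in from DT/firmware at boot, not something taken from
> the RFC) could look like:
>
> /*
>  * Hypothetical helper: return the CPUs that share their powergating
>  * capability with this CPU. For CPU0 this would be 0-1 (its SMT pair);
>  * for CPU2 it would be 2-7 (the subset that can only powergate
>  * together).
>  */
> static DEFINE_PER_CPU(struct cpumask, power_sibling);
>
> const struct cpumask *cpu_corepower_mask(int cpu)
> {
> 	return &per_cpu(power_sibling, cpu);
> }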
>
> We can add more levels to describe other dependencies/independencies, like
> the frequency scaling dependency, and as a result the final sched_domain
> topology will have additional levels (if they have not been removed during
> the degenerate sequence).
>
> My concern is about the configuration of the table that is used to create the
> sched_domain. Some levels are "duplicated" with different flag configurations,
> which makes the table not easily readable, and we must also take care of the
> order, because a parent has to gather all CPUs of its children. So we must
> choose which capability will be a subset of the other one; a mechanical check
> for this is sketched below. The order is almost straightforward when we
> describe 1 or 2 kinds of capabilities (package resource sharing and power
> sharing) but it can become complex if we want to add more.
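>
> The ordering constraint could at least be verified mechanically. A
> hypothetical sanity check (not part of the RFC) over a NULL-terminated
> table would be:
>
> /*
>  * Hypothetical check: for every CPU, each level's span must contain
>  * the span of the level below it, otherwise the domains cannot nest.
>  */
> static bool __init sched_topology_sane(struct sched_domain_topology_level *tbl)
> {
> 	struct sched_domain_topology_level *tl;
> 	int cpu;
>
> 	for (tl = tbl; (tl + 1)->mask; tl++) {
> 		for_each_possible_cpu(cpu) {
> 			/* a parent level must gather all CPUs of its child */
> 			if (!cpumask_subset(tl->mask(cpu), (tl + 1)->mask(cpu)))
> 				return false;
> 		}
> 	}
> 	return true;
> }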

What if we want to add arch-specific flags to the NUMA domain? Currently,
with Peter's patch (https://lkml.org/lkml/2013/11/5/239) and this patch,
the arch can modify the sd flags of the topology levels up to just before
the NUMA domain. In sd_init_numa(), the flags for the NUMA domain get
initialized. Should we perhaps call into the arch here to probe for
additional flags?
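
One way to do that, sketched here only as a suggestion (arch_sd_numa_flags()
is a made-up hook, not an existing interface), would be a weak function that
an arch can override:

/*
 * Hypothetical weak hook: the default contributes no extra flags; an
 * arch can override it to OR additional SD flags into the NUMA domains.
 */
int __weak arch_sd_numa_flags(void)
{
	return 0;
}

Then sd_init_numa() would include these when it builds the flags for each
NUMA level, e.g. sd->flags |= arch_sd_numa_flags();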

Thanks

Regards
Preeti U Murthy
>
> Regards
> Vincent
>
