Re: [PATCH 15/15] x86: Fix cpu_coregroup_mask to return correct cpumask on multi-node processors

From: Andreas Herrmann
Date: Thu Aug 27 2009 - 09:19:38 EST


On Tue, Aug 25, 2009 at 12:36:51PM +0200, Ingo Molnar wrote:
>
> * Andreas Herrmann <andreas.herrmann3@xxxxxxx> wrote:
>
> > On Mon, Aug 24, 2009 at 08:21:54PM +0200, Ingo Molnar wrote:
> > >
> > > * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > > On Thu, 2009-08-20 at 15:46 +0200, Andreas Herrmann wrote:
> > > > > The correct mask that describes core-siblings of a processor
> > > > > is topology_core_cpumask. See the topology adaptation patches, especially
> > > > > http://marc.info/?l=linux-kernel&m=124964999608179
> > > >
> > > > argh, violence, murder kill.. this is the worst possible hack and
> > > > you're extending it :/
> > >
> > > I think most of the trouble here comes from having inconsistent
> > > names, a rather static structure for sched-domains setup and
> > > then we are confusing things back and forth.
> > >
> > > Right now we have thread/sibling, core, CPU/socket and node,
> > > with many data structures around these hardcoded. Certain
> > > scheduler features only operate on the hardcoded fields.
> > >
> > > Now Magny-Cours adds a socket internal node construct to the
> > > whole thing, names it randomly and basically breaks the
> > > semi-static representation.
> > >
> > > We cannot just flip around our static names and hope it goes
> > > well and everything drops into place. In reality everything just
> > > falls apart instead.
> > >
> > > Instead we should have an arch-defined tree and a CPU
> > > architecture dependent ASCII name associated with each level -
> > > but not hardcoded into the scheduler.
> >
> > I admit that it's strange to have the x86-specific SCHED_SMT/MC
> > snippets in common code.
> >
> > And the NUMA/SD_NODE stuff is not used by all architectures
> > either.
> >
> > Having an arch-defined tree seems the right thing to do.
>
> yep, with generic helpers to reduce per arch bloat.
> (named/structured in a neutral way)
>
> > > Plus we should have independent scheduler domains feature flags
> > > that can be turned on/off in various levels of that tree,
> > > depending on the cache and interconnect properties of the
> > > hardware - without having to worry about what the ASCII name
> > > says. Those features should be capable to work not just on the
> > > lowest level of the tree, but on higher levels too, regardless
> > > whether that level is called a 'core', a 'socket' or an
> > > 'internal node' on the ASCII level really.
> > >
> > > This is why i insisted on handling the Magny-Cours topology
> > > discovery and enumeration patches together with the scheduler
> > > patches. It can easily become a mess if extended.
> >
> > I don't buy this argument.
> >
> > The main source of information when building sched-domains will be
> > the CPU topology. That must be provided somehow, independent of how
> > scheduling domains are created. When the domains are built, you
> > just need to know which cpumask to use when the sched_groups and
> > the domain's span are determined.
> >
> > Thus I think the topology detection is rather self-contained and
> > can/should be provided independent of how the scheduler side is
> > going to be implemented.
>
> This is the sysfs bits?

So you "only" object to the sysfs topology additions, correct?

> What is this needed for exactly?

It is needed if you want to know which cores share the same
northbridge or, more generally, which cores are on the same die.
That directly leads to the question of whether a more generic
nomenclature should be used: chip_siblings instead of
cpu_node_siblings (which could cover all MCM processors).

A user who wants to pin tasks to dedicated CPUs might need this
information.
Maybe you even want to count northbridge events with PCL and thus
need to know which CPUs share the same northbridge and where to bind
the tasks/threads that you want to monitor.
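
For illustration only (not part of these patches): a minimal
user-space sketch of such pinning via sched_setaffinity(). The CPU
numbers are made up; in practice they would come from the topology
information discussed above.

/* Minimal sketch: pin the calling task (and threads it creates later)
 * to a set of CPUs, e.g. those sharing one northbridge.  The CPU list
 * below is purely hypothetical. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	cpu_set_t set;
	int cpus[] = { 0, 1, 2, 3 };	/* hypothetical: cores of one node */
	unsigned int i;

	CPU_ZERO(&set);
	for (i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++)
		CPU_SET(cpus[i], &set);

	/* pid 0 == calling task; threads created afterwards inherit the mask */
	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}
	return 0;
}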

> The scheduler is pretty much the most important thing to tune in a
> topology aware manner, besides memory allocations.

I can leave out the patches that introduce the interface. But I
really want to have a cpu_node_map for a CPU and the cpu_node_id in
cpuinfo_x86, plus the two fixes (for L3 cache and MCE).

Instead of using new sysfs topology attributes, the user can also
gather the node information from the shared_cpu_map of the L3
cache. That's not as straightforward as keeping all topology
information in one place, but I can live with that.
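
Again just a sketch (not part of the patch set) of that sysfs route:
read the L3 shared_cpu_map of one CPU. It assumes index3 is the L3
cache on the system in question; a robust tool would check the
"level" attribute of each cache index instead of hard-coding it.

#include <stdio.h>

int main(void)
{
	char path[128], map[256];
	int cpu = 0;			/* CPU whose L3 siblings we want */
	FILE *f;

	/* assumption: cache index3 corresponds to the L3 on this system */
	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_map",
		 cpu);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	if (fgets(map, sizeof(map), f))
		printf("cpu%d L3 shared_cpu_map: %s", cpu, map);
	fclose(f);
	return 0;
}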


Regards,
Andreas

--
Operating | Advanced Micro Devices GmbH
System | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632

