Re: [RFC][PATCH v5 01/14] sched: add a new arch_sd_local_flags forsched_domain init

From: Morten Rasmussen
Date: Thu Nov 14 2013 - 05:51:13 EST


On Wed, Nov 13, 2013 at 04:29:19PM +0000, Peter Zijlstra wrote:
> On Wed, Nov 13, 2013 at 03:47:16PM +0000, Dietmar Eggemann wrote:
> > On 12/11/13 18:08, Peter Zijlstra wrote:
> > > On Tue, Nov 12, 2013 at 05:43:36PM +0000, Dietmar Eggemann wrote:
> > >> This patch removes the sched_domain initializer macros
> > >> SD_[SIBLING|MC|BOOK|CPU]_INIT in core.c and in archs and replaces them
> > >> with calls to the new function sd_init(). The function sd_init
> > >> incorporates the already existing function sd_numa_init().
> > >
> > > Your patch retains far too much of the weird behavioural variations we
> > > have, nor does it create a proper separation between topology and
> > > behaviour.
> >
> > Could you please explain a little bit further on the weird behavioural
> > variations. Are you referring to the specific SD_ flags or sd_domain levels?
>
> +++ b/arch/ia64/kernel/topology.c
> @@ -99,6 +99,14 @@ out:
>
> subsys_initcall(topology_init);
>
> +void arch_sd_customize(int level, struct sched_domain *sd, int cpu)
> +{
> + if (level == SD_LVL_CPU) {
> + sd->cache_nice_tries = 2;
> +
> + sd->flags &= ~SD_PREFER_SIBLING;
> + }
> +}
>
> +++ b/arch/tile/kernel/smp.c
> @@ -254,3 +254,15 @@ void smp_send_reschedule(int cpu)
> }
>
> #endif /* CHIP_HAS_IPI() */
> +
> +void arch_sd_customize(int level, struct sched_domain *sd, int cpu)
> +{
> + if (level == SD_LVL_CPU) {
> + sd->min_interval = 4;
> + sd->max_interval = 128;
> +
> + sd->flags &= ~(SD_WAKE_AFFINE | SD_PREFER_SIBLING);
> +
> + sd->balance_interval = 32;
> + }
> +}
>
> Many of these differences are just bitrot / accidents, and the different
> min interval for tile was already taken care of by basing the intervals
> off of the domain weight.
>
> On that, you should also not rely on these SD_LVL things; if we wanted
> to inject an extra level they'd go all funny.
>
> > I agree that this patch doesn't separate behaviour and topology and I
> > will consider this going forward.
>
> Please take the patch I did and work from there.
>
> > > We might indeed have to have a single arch_() function that adds
> > > SD_flags, but please restrict the flags it can set -- never allow it to
> > > set behavioural flags.
> >
> > Understood. Simply exporting an sd_domain pointer is a no-go.
>
> I was more thinking along the lines of:
>
> unsigned long arch_sd_flags(unsigned long sd_flags)
> {
> return 0
> }
>
> Used like:
>
> extra_sd_flags = arch_sd_flags(sd->sd_flags);
> if (extra_sd_flags & FOO) {
> WARN("silly bugger: %x\n", extra_sd_flags);
> extra_sd_flags &= ~FOO;
> }
> sd->sd_flags |= extra_sd_flags;
>
> Or something.

We need a way to know which group of cpus the flag applies to. If we
don't want to pass a pointer to the sched_domain and we want to replace
the current named sched_domain levels with something more flexible, the
only solution I can think of right now is to pass a cpumask to the arch
code. Better suggestions?

If we let arch generate the topology it could set the flags as well. But
that means that an arch would have to deal with generating the topology
even if it just needs to flip a single flag in the default topology.

Another thing is if we want to put energy related information into the
sched_domain hierarchy. If we want various energy costs (P and C state)
to be represented here we would need to modify more than just flags.

One way to do that is to put the energy information into a sub-struct and
have another arch_sd_energy() call that allows the arch to populate that
struct with relevant information.

Morten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/