Re: [PATCH 1/1] arm64: smp: Skip MC sched domain on SoCs with no LLC

From: Darren Hart
Date: Thu Mar 03 2022 - 11:02:44 EST


On Thu, Mar 03, 2022 at 09:08:38AM +0100, Vincent Guittot wrote:
> On Thu, 3 Mar 2022 at 03:18, Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Mar 02, 2022 at 10:32:06AM +0100, Vincent Guittot wrote:
> > > On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > > > Control Unit, but have no shared CPU-side last level cache.
> > > >
> > > > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > > > cpu_clustergroup_mask() will return a cpumask with weight 2.
> > > >
> > > > As a result, build_sched_domain() will log the following error once per CPU:
> > > >
> > > > BUG: arch topology borken
> > > > the CLS domain not a subset of the MC domain
> > > >
> > > > The MC level cpumask is then extended to that of the CLS child, and is
> > > > later removed entirely as redundant. This sched domain topology is an
> > > > improvement over previous topologies, or those built without
> > > > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > > > With the current scheduler model and heuristics, this is a desirable
> > > > default topology for Ampere Altra and Altra Max systems.
> > > >
> > > > Introduce an alternate sched domain topology for arm64 without the MC
> > > > level and test for llc_sibling weight 1 across all CPUs to enable it.
> > > >
> > > > Do this in arch/arm64/kernel/smp.c (as opposed to
> > > > arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> > > > populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> > > > detect the cluster level being above the cpu llc level. This is
> > > > consistent with other architectures and provides a readily extensible
> > > > mechanism for other alternate topologies.
> > > >
> > > > The final sched domain topology for a 2 socket Ampere Altra system is
> > > > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> > > >
> > > > For CPU0:
> > > >
> > > > CONFIG_SCHED_CLUSTER=y
> > > > CLS [0-1]
> > > > DIE [0-79]
> > > > NUMA [0-159]
> > > >
> > > > CONFIG_SCHED_CLUSTER is not set
> > > > DIE [0-79]
> > > > NUMA [0-159]
> > > >
> > > > Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> > > > Cc: Will Deacon <will@xxxxxxxxxx>
> > > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > > Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > > > Cc: Barry Song <song.bao.hua@xxxxxxxxxxxxx>
> > > > Cc: Valentin Schneider <valentin.schneider@xxxxxxx>
> > > > Cc: D. Scott Phillips <scott@xxxxxxxxxxxxxxxxxxxxxx>
> > > > Cc: Ilkka Koskinen <ilkka@xxxxxxxxxxxxxxxxxxxxxx>
> > > > Cc: <stable@xxxxxxxxxxxxxxx> # 5.16.x
> > > > Signed-off-by: Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx>
> > > > ---
> > > > arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
> > > > 1 file changed, 28 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > > index 27df5c1e6baa..3597e75645e1 100644
> > > > --- a/arch/arm64/kernel/smp.c
> > > > +++ b/arch/arm64/kernel/smp.c
> > > > @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
> > > > }
> > > > }
> > > >
> > > > +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> > > > +#ifdef CONFIG_SCHED_SMT
> > > > + { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> > > > +#endif
> > > > +
> > > > +#ifdef CONFIG_SCHED_CLUSTER
> > > > + { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> > > > +#endif
> > > > +
> > > > + { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > > > + { NULL, },
> > > > +};
> > > > +
> > > > +static void __init update_sched_domain_topology(void)
> > > > +{
> > > > + int cpu;
> > > > +
> > > > + for_each_possible_cpu(cpu) {
> > > > + if (cpu_topology[cpu].llc_id != -1 &&
> > >
> > > Have you tested it with a non-ACPI system? AFAICT, llc_id is only set
> > > by ACPI systems, and llc_id == -1 for others, like DT-based systems
> > >
> > > > + cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > > > + return;
> > > > + }
> >
> > Hi Vincent,
> >
> > I did not have a non-ACPI system to test, no. You're right, of course:
> > llc_id is only set by ACPI systems on arm64. We could wrap this in a
> > CONFIG_ACPI ifdef (or IS_ENABLED), but I think this would be preferable:
> >
> > + for_each_possible_cpu(cpu) {
> > + if (cpu_topology[cpu].llc_id == -1 ||
> > + cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > + return;
> > + }
>
> This works.
> Also, do you really need to loop on all possible CPUs? Would it be
> enough to check only the first CPU?
> You won't be able to support a mixed topology, so all CPUs have the
> same kind of topology, i.e. either cluster before or cluster after the
> MC level

My intention here is to restrict the use of the new topology to a very
specific architecture where the problem is known to manifest, and to avoid
introducing any unexpected change to other systems.

On other systems, the loop returns on its first iteration, so it has minimal
impact as well.

As for supporting a mixed topology, my intention was again not to make any
statement about the existence or viability of such systems. If they broke
before, they will still break. If a new topology is needed for them, this
provides an easily modifiable location to add it.

If the consensus is that we don't need the loop, dropping it simplifies my
specific use case at the cost of applying to a broader (though, I believe,
only hypothetical) set of topologies. So I have no objection to dropping the loop.

Will, do you have a preference? Should we lean toward a targeted change with
minimal impact, or toward a simpler implementation with slightly broader impact?

Thanks,

>
>
> >
> > Quickly tested on Altra successfully. I would appreciate it if anyone with
> > a non-ACPI arm64 system could test and verify that this behaves as intended.
> > I will ask around tomorrow as well to see what I may have access to.
> >
> > Thanks,
> >
> > > > +
> > > > + pr_info("No LLC siblings, using No MC sched domains topology\n");
> > > > + set_sched_topology(arm64_no_mc_topology);
> > > > +}
> > > > +
> > > > void __init smp_cpus_done(unsigned int max_cpus)
> > > > {
> > > > pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> > > > @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > > > hyp_mode_check();
> > > > apply_alternatives_all();
> > > > mark_linear_text_alias_ro();
> > > > + update_sched_domain_topology();
> > > > }
> > > >
> > > > void __init smp_prepare_boot_cpu(void)
> > > > --
> > > > 2.31.1
> > > >
> >
> > --
> > Darren Hart
> > Ampere Computing / OS and Kernel

--
Darren Hart
Ampere Computing / OS and Kernel