Re: [PATCH v6 17/21] arch_topology: Limit span of cpu_clustergroup_mask()

From: Darren Hart
Date: Fri Jul 08 2022 - 12:27:46 EST


On Fri, Jul 08, 2022 at 09:04:24AM +0100, Sudeep Holla wrote:
> Hi Darren,
>
> I will let Ionela or Dietmar cover some of the scheduler aspects as
> I don't have much knowledge in that area.
>
> On Thu, Jul 07, 2022 at 05:10:19PM -0700, Darren Hart wrote:
> > On Mon, Jul 04, 2022 at 11:16:01AM +0100, Sudeep Holla wrote:
> > > From: Ionela Voinescu <ionela.voinescu@xxxxxxx>
> >
> > Hi Sudeep and Ionela,
> >
> > >
> > > Currently the cluster identifier is not set on DT based platforms.
> > > The reset or default value is -1 for all the CPUs. Once we assign the
> > > cluster identifier values correctly, the cluster_sibling mask will be
> > > populated and returned by cpu_clustergroup_mask() to contribute in the
> > > creation of the CLS scheduling domain level, if SCHED_CLUSTER is
> > > enabled.
> > >
> > > To avoid topologies that will result in questionable or incorrect
> > > scheduling domains, impose restrictions regarding the span of clusters,
> >
> > Can you provide a specific example of a valid topology that results in
> > the wrong thing currently?
> >
>
> As a simple example, Juno with 2 clusters and L2 for each cluster. IIUC
> MC is preferred instead of CLS and both MC and CLS domains are exact
> match.
>
> > >
> > > While previously the scheduling domain builder code would have removed MC
> > > as redundant and kept CLS if SCHED_CLUSTER was enabled and the
> > > cpu_coregroup_mask() and cpu_clustergroup_mask() spanned the same CPUs,
> > > now CLS will be removed and MC kept.
> > >
> >
> > This is not desireable for all systems, particular those which don't
> > have an L3 but do share other resources - such as the snoop filter in
> > the case of the Ampere Altra.

I was wrong here. This match also modifies the coregroup, the MC after
this patch is equivalent to the CLS before the patch. The Altra is not
negatively impacted here.

> >
> > While not universally supported, we agreed in the discussion on the
> > above patch to allow systems to define clusters independently from the
> > L3 as an LLC since this is also independently defined in PPTT.
> >
> > Going back to my first comment - does this fix an existing system with a
> > valid topology?
>
> Yes as mentioned above Juno.
>
> > It's not clear to me what that would look like. The Ampere Altra presents
> > a cluster level in PPTT because that is the desireable topology for the
> > system.
>
> Absolutely wrong reason. It should present because the hardware is so,
> not because some OSPM desires something in someway. Sorry that's not how
> DT/ACPI is designed for. If 2 different OSPM desires different things, then
> one ACPI will not be sufficient.

Agree. I worded that badly. I should have said the Altra presents a PPTT
topology that accurately reflects the hardwere. There is no shared
cpu-side LLC, and there is an affinity between the DSU pairs which share
a snoop filter.

I do think the general assumption that MC shares a cpu-side LLC will
continue to present challenges to the Altra topology in terms of ongoing
to changes to the code. I don't have a good solution to that at the
moment, something I'll continue to think on.

>
> > If it's not desirable for another system to have the cluster topology -
> > shouldn't it not present that layer to the kernel in the first place?
>
> Absolutely 100% yes, it must present it if the hardware is designed so.
> No if or but.
>
> --
> Regards,
> Sudeep

Thanks Sudeep,

--
Darren Hart
Ampere Computing / OS and Kernel