Re: [RFC PATCH] arch_topology: Use llc_id instead of package_id

From: Sudeep Holla
Date: Thu May 19 2022 - 05:22:10 EST


On Wed, May 18, 2022 at 05:54:23PM +0200, Dietmar Eggemann wrote:
> On 18/05/2022 12:43, Sudeep Holla wrote:
> > On Wed, May 18, 2022 at 12:23:35PM +0200, Dietmar Eggemann wrote:
> >> On 17/05/2022 12:57, Sudeep Holla wrote:
> > [...]
>
> [...]
>
> >> I see. Your requirement is valid information under
> >> /sys/devices/system/cpu/cpuX/{topology, cache} for userspace.
> >>
> >> I'm not sure that we can get core_siblings_list and package_cpus_list
> >> set up correctly.
> >>
> >> With your patch we have now:
> >>
> >> root@juno:/sys/devices/system/cpu/cpu0/topology# cat core_siblings_list
> >> 0-5
> >> root@juno:/sys/devices/system/cpu/cpu0/topology# cat package_cpus_list
> >> 0-5
> >>
> >> Before we had [0,3-5] for both.
> >>
> >
> > Correct, and that was plainly wrong for a single socket system: the mask
> > must cover all the CPUs. Tools like lscpu report such a system as a
> > 2 socket system, which is wrong. There were reports about that, but no
> > one really chased it up to get it fixed, so I assumed it was not that
> > important; it is still wrong nonetheless. ACPI platforms cared and have
> > had it fixed ever since ACPI PPTT support was added. So check the
> > difference on Juno between DT and ACPI boot without my patches; with
> > them we should get the same output for both.
> >
> >>
> >> I'm afraid we can have 2 different masks here because of:
>
> Sorry, s/can/can't
>

No worries, I assumed as much and read it that way.

> >> 72 define_siblings_read_func(core_siblings, core_cpumask);
> >> ^^^^^^^^^^^^
> >> 88 define_siblings_read_func(package_cpus, core_cpumask);
> >> ^^^^^^^^^^^^
> >>
> >
> > Yes, even I got confused; the documentation reveals that one of them is
> > just a deprecated/obsolete name
> > (Documentation/ABI/stable/sysfs-devices-system-cpu):
> >
> > 74 What: /sys/devices/system/cpu/cpuX/topology/package_cpus
> > 75 Description: internal kernel map of the CPUs sharing the same physical_package_id.
> > 76 (deprecated name: "core_siblings").
> > 77 Values: hexadecimal bitmask.
> > 78
> > 79 What: /sys/devices/system/cpu/cpuX/topology/package_cpus_list
> > 80 Description: human-readable list of CPUs sharing the same physical_package_id.
> > 81 The format is like 0-3, 8-11, 14,17.
> > 82 (deprecated name: "core_siblings_list")
> > 83 Values: decimal list.
>
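
(As a concrete example of the two formats: on Juno with my patch, where
package_cpus_list reads "0-5" as shown above, the corresponding
package_cpus hex bitmask reads "3f", i.e. bits 0-5 set.)
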
> Ah, that makes it clearer to me! Thanks. So the DIE boundary, i.e.
> cpu_cpu_mask() { return cpumask_of_node(cpu_to_node(cpu)); }, is not
> related to the package cpumask at all. This is why we have the first if
> condition in cpu_coregroup_mask():
>
> 658 const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));
> ...
> 661 if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask)) {
> 662 /* not numa in package, lets use the package siblings */
> 663 core_mask = &cpu_topology[cpu].core_sibling;
>

Correct. Either I got confused by the names or it was different from what
I had seen when I last looked at these, more than a couple of years ago.
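
For anyone following along, the whole helper is roughly the below (my
paraphrase of drivers/base/arch_topology.c around v5.18, so check the
tree for the exact code); it picks the smallest of the NUMA, package and
LLC sibling masks:

const struct cpumask *cpu_coregroup_mask(int cpu)
{
        const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));

        /* not numa in package, lets use the package siblings */
        if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask))
                core_mask = &cpu_topology[cpu].core_sibling;

        /* narrow further to the LLC siblings if an LLC was described */
        if (cpu_topology[cpu].llc_id != -1 &&
            cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
                core_mask = &cpu_topology[cpu].llc_sibling;

        return core_mask;
}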

> [...]
>
> >> OK, we don't support Phantom domains via the 1st-level cluster in the
> >> cpu-map anymore. We already explicitly informed the Android community.
> >> But I'm sure people will only discover this once something breaks on
> >> their platforms, and only if they are able to detect it.
> >>
> >
> > Again, if it was working just via the cache info together with their
> > own phantom domains that are not upstream, then nothing is broken. At
> > least I see that we have now fixed DT and aligned it with ACPI. While I
> > understand your concern, I see this as the main issue, not the sched
> > domains, which may or may not be using the topology info incorrectly.
>
> OK, let's see how you incorporated llc into the topology code in your v2
> first.
>

Sure, thanks for implicitly agreeing to review those 😄.

> >>>> If we say we only support L2 sharing (asymmetric here, since only
> >>>> CPU0-3 have it!!!) and we don't support Phantom, then your approach
> >>>> will work for such systems.
> >>>
> >>> Thanks. As I said, what is a *Phantom* domain ;)? Can you point me to
> >>> the DT or ACPI binding for it? Just kidding, I know they don't exist.
> >>
> >> They do ;-) ... 1st-level clusters, but they are used for uArch
> >> boundaries, not for LLC boundaries. That's the existing issue in DT:
> >> topology information has 2 sources, (1) the cpu-map and (2) the cache
> >> info.
> >>
> >
> > Sure, they can fix that or add more optimisations on top of what I have
> > sent [1].
>
> If you add llc parsing in your v2, they should have everything they need
> for Armv9 with uArch boundaries and L2 complexes. What I'm interested in
> is seeing how we solve the 2 sources (cache and cpu-map) issue here.

I am not sure if the LLC alone will resolve what we need on those Armv9
systems unless L2 == LLC. If L2 != LLC, then my patches won't help that much.
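
To illustrate (hypothetical numbers for an Armv9-style system, not taken
from real hardware): with CPUs grouped in L2 complexes of two and a single
L3 acting as the LLC, cacheinfo would show something like:

root@armv9:/sys/devices/system/cpu/cpu0/cache# cat index2/shared_cpu_list
0-1
root@armv9:/sys/devices/system/cpu/cpu0/cache# cat index3/shared_cpu_list
0-7

My patches would pick up the 0-7 LLC mask, not the 0-1 L2 complexes.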

> Example:
>
> cpu-map:
>
> socket
> cluster <-- (1)
> core
> thread
>
> cache:
>
> llc
>
> How do people know that (1) is L2 and not LLC?
>

We can get the cpumask of CPUs sharing each unique cache in the system;
that is not a problem, we have the cacheinfo driver for that. But it is
initialised quite late for a reason, and if we need that info much
earlier, some reworking is needed.
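
Userspace can already see those per-cache cpumasks today via the cacheinfo
sysfs entries. A minimal userspace sketch (assuming the usual sysfs
layout; untested):

#include <stdio.h>

int main(void)
{
        char path[128], buf[64];
        FILE *f;
        int idx;

        /* walk cpu0's caches until the next index node is missing */
        for (idx = 0; ; idx++) {
                snprintf(path, sizeof(path),
                         "/sys/devices/system/cpu/cpu0/cache/index%d/shared_cpu_list",
                         idx);
                f = fopen(path, "r");
                if (!f)
                        break;
                if (fgets(buf, sizeof(buf), f))
                        printf("index%d shared_cpu_list: %s", idx, buf);
                fclose(f);
        }
        return 0;
}

On most systems the highest index is the LLC, which is the mask my
patches care about.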

My patches address only the LLC and are hence much simpler.

> But let us switch the discussion to your v2 on this one. I have to see
> your implementation first.
>

Sure, I replied here just to address any open questions. Let us discuss
any further issues in the mail thread below.

> > [1] https://lore.kernel.org/lkml/20220518093325.2070336-1-sudeep.holla@xxxxxxx
>
> [...]

--
Regards,
Sudeep