Re: [RFC PATCH] arch_topology: Use llc_id instead of package_id

From: Dietmar Eggemann
Date: Wed May 18 2022 - 06:23:58 EST


On 17/05/2022 12:57, Sudeep Holla wrote:
> Hi Dietmar,
>
> Thanks for the detailed response.
>
> On Tue, May 17, 2022 at 11:14:44AM +0200, Dietmar Eggemann wrote:
>> On 16/05/2022 12:35, Sudeep Holla wrote:
>>> On Fri, May 13, 2022 at 02:04:29PM +0200, Dietmar Eggemann wrote:
>>>> On 13/05/2022 13:03, Sudeep Holla wrote:
>>>>> On Fri, May 13, 2022 at 12:42:00PM +0200, Dietmar Eggemann wrote:
>>>>>> On 13/05/2022 11:03, Sudeep Holla wrote:
>>>>>>> On Fri, May 13, 2022 at 10:34:00AM +0200, Dietmar Eggemann wrote:

[...]

>>> I see on Juno with SCHED_CLUSTER and cluster masks set, I see CLS and MC
>>> domains.
>>
>> Right but that's not really correct from the scheduler's POV.
>>
>
> OK
>
>> Juno has LLC on cpumasks [0,3-5] and [1-2], not on [0-5]. So the most
>> important information is the highest Sched Domain with the
>> SD_SHARE_PKG_RESOURCES flag. This is the MC layer (cpu_core_flags() in
>> default_topology[]). So the scheduler would think that [0-5] are sharing
>> LLC.
>>
>
> Ah OK, but if LLC sibling masks are updated, cpu_coregroup_mask() takes
> care of it IIUC, right ?

Yes. That's the:

691 if (cpu_topology[cpu].llc_id != -1) {
692 if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
693 core_mask = &cpu_topology[cpu].llc_sibling;
694 }

condition in cpu_coregroup_mask().

>
>> You have LLC at:
>>
>> cat /sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list
>> ^^^^^^
>> 0,3-5
>>
>> but the scheduler sees the highest SD_SHARE_PKG_RESOURCES on MC:
>>
>> cat /sys/kernel/debug/sched/domains/cpu0/domain1/flags
>> ^^^^^^^
>> ... SD_SHARE_PKG_RESOURCES ...
>>
>> [...]
>>
>>>> For one level (MC) yes, but not for 2 (MC and CLS). And cluster_id was
>>>> introduces for the 2. level.
>>>>
>>>
>>> That sounds wrong and not what ACPI PPTT code says. My series just aligns
>>> with what is done with ACPI PPTT IIUC. I need to check that again if my
>>> understand differs from what is being done. But the example of Kunpeng920
>>> aligns with my understanding.
>>
>> (1) IMHO, as long as we don't consider cache (llc) information in DT we
>> can't have the same coverage as ACPI.
>>
>
> Agreed. But we are not changing any topology masks as per sched domain
> requirements as they get exposed to the userspace as is.

I see. Your requirement is valid information under
/sys/devices/system/cpu/cpuX/{topology, cache} for userspace.

I'm not sure that we can get core_siblings_list and package_cpus_list
correctly setup.

With your patch we have now:

root@juno:/sys/devices/system/cpu/cpu0/topology# cat core_siblings_list
0-5
root@juno:/sys/devices/system/cpu/cpu0/topology# cat package_cpus_list
0-5

Before we had [0,3-5] for both.


I'm afraid we can have 2 different mask here because of:

72 define_siblings_read_func(core_siblings, core_cpumask);
^^^^^^^^^^^^
88 define_siblings_read_func(package_cpus, core_cpumask);
^^^^^^^^^^^^

[...]

> Understood and on Juno if we get llc_siblings right, the sched domains
> must be sorted correctly ?

Yes, then it should do exactly what ACPI is leading us to on this !NUMA
Kunpeng920 example.

>> Coming back to the original request (the support of Armv9 L2 complexes
>> in Android) from Qing on systems like QC SM8450:
>>
>> .---------------.
>> CPU |0 1 2 3 4 5 6 7|
>> +---------------+
>> uarch |l l l l m m m b| (so called tri-gear: little, medium, big)
>> +---------------+
>> L2 | | | | | | |
>> +---------------+
>> L3 |<-- -->|
>> +---------------+
>> |<-- cluster -->|
>> +---------------+
>> |<-- DSU -->|
>> '---------------'
>>
>> This still wouldn't be possible. We know that Phantom Domain (grouping
>> after uarch) is not supported in mainline but heavily used in Android
>> (legacy deps).
>>
>
> Correct, unless you convince to get a suitable notion of *whatever*
> phantom domains represent into the DT bindings, they don't exist.
> If someone wants to support this config, they must first represent that
> in the firmware so that OS can infer information from the same.

OK, we don't support Phantom domains via 1. level Cluster in cpu-map
anymore. We already explicitly informed the Android community. But I'm
sure people will only discover this if something breaks on their
platforms and they are able to detect this.

>> If we say we only support L2 sharing (asymmetric here since only CPU0-3
>> have it !!!) and we don't support Phantom then your approach will work
>> for such systems.
>
> Thanks, as I said what is *Phantom* domain ;) ? Can you point me to the
> DT or ACPI binding for the same ? Just kidding, I know they don't exist.

They do ;-) 1. level Clusters ... but they are used for uArch
boundaries, not for LLC boundaries. That's the existing issue in DT,
topology information has 2 sources: (1) cpu-map and (2) cache info.

> Anyways, I understand your concern that llc_sibling must go with my set
> of topology changes which I agree. Is that the only concern ?

Cool. Let me review your v2 first ;-)