Re: [PATCH 21/24] selftests/resctrl: Get resource id from cache id

From: Maciej Wieczór-Retman
Date: Tue Oct 31 2023 - 03:59:17 EST


On 2023-10-27 at 16:30:38 +0300, Ilpo Järvinen wrote:
>On Fri, 27 Oct 2023, Maciej Wieczór-Retman wrote:
>
>> On 2023-10-24 at 12:26:31 +0300, Ilpo Järvinen wrote:
>> >Resource id is acquired differently depending on CPU. AMD tests use id
>> >from L3 cache, whereas CPUs from other vendors base the id on topology
>> >package id. In order to support L2 CAT test, this has to be
>> >generalized.
>> >
>> >The driver side code seems to get the resource ids from cache ids so
>> >the approach used by the AMD branch seems to match the kernel-side
>> >code. It will also work with L2 resource IDs as long as the cache level
>> >is generalized.
>> >
>> >Using the topology id was always fragile due to mismatch with the
>> >kernel-side way to acquire the resource id. It got incorrect resource
>> >id, e.g., when Cluster-on-Die (CoD) is enabled for CPU (but CoD is not
>> >well suited for resctrl in the first place so it has not been a big
>> >issues if test don't work correctly with it).
>>
>> "not been a big issues" -> "not been a big issue"?
>>
>> "if test don't work correctly" -> "if test doesn't work correctly" / "if tests
>> don't work correctly"?
>>
>> >
>> >Taking all the above into account, generalize the resource id
>> >calculation by taking it from the cache id and do not hard-code the
>> >cache level.
>>
>> In x86/resctrl files group of CPUs sharing a resctrl resource is called a
>> domain. Because of that I think "resource id" terms should be substituted with
>> "domain id" to match core resctrl code.
>
>Okay, I was just using the terminology which pre-existed in the selftest
>code.
>
>I've really tried to look how I should call it but things were quite
>inconsistent. The selftest uses resource id and reads it from cache id.
>Documentation uses "instances of that resource" or "cache id" or "domain
>x".
>
>
>Documentation/arch/x86/resctrl.rst is very misleading btw and should be
>IMHO updated:
>
>Memory bandwidth Allocation (default mode)
>------------------------------------------
>
>Memory b/w domain is L3 cache.
>::
>
> MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
>
>It claims "domain" is L3 cache (and the value we're talking about is
>called "cache_id" here which is what you just called "domain id"). I
>suppose it should say "b/w resource is L3 cache"?
>

I believe so. After some looking around it looks like the "L3 domain" naming is
also present in rdtgroup.c:mkdir_mondata_all() and "mon_data" description in the
docs:

Docs:
"mon_data":
This contains a set of files organized by L3 domain and by
RDT event. E.g. on a system with two L3 domains there will

rdtgroup.c:
/*
* This creates a directory mon_data which contains the monitored data.
*
* mon_data has one directory for each domain which are named
* in the format mon_<domain_name>_<domain_id>. For ex: A mon_data
* with L3 domain looks as below:
* ./mon_data:
* mon_L3_00
* mon_L3_01
* mon_L3_02
* ...
*
* Each domain directory has one file per event:
* ./mon_L3_00/:
* llc_occupancy
*
*/
static int mkdir_mondata_all(struct kernfs_node *parent_kn,
struct rdtgroup *prgrp,
struct kernfs_node **dest_kn)

>From the reading I've done on resctrl my understanding was that L2/L3 are
resources that can be shared between domains.

--
Kind regards
Maciej Wieczór-Retman