Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
From: Reinette Chatre
Date: Tue Jul 08 2025 - 19:26:49 EST

Hi Tony,

On 7/8/25 3:43 PM, Luck, Tony wrote:
> On Tue, Jul 08, 2025 at 01:49:26PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 7/8/25 12:08 PM, Luck, Tony wrote:
>>> On Thu, Jul 03, 2025 at 10:22:06AM -0700, Luck, Tony wrote:
>>>> On Thu, Jul 03, 2025 at 09:45:15AM -0700, Reinette Chatre wrote:
>>>>> Hi Tony and Dave,
>>>>>
>>>>> On 6/26/25 9:49 AM, Tony Luck wrote:
>>>>>> --- 14 ---
>>>>>> Add mon_evt::is_floating_point set by resctrl file system code to limit
>>>>>> which events architecture code can request be displayed in floating point.
>>>>>>
>>>>>> Simplified the fixed-point to floating point algorithm. Reinette is
>>>>>> correct that the additional "lshift" and "rshift" operations are not
>>>>>> required. All that is needed is to multiply the fixed point fractional
>>>>>> part by 10**decimal_places, add a rounding amount equivalent to a "1"
>>>>>> in the binary place after those supplied. Finally divide by 2**binary_places
>>>>>> (with a right shift).
>>>>>>
>>>>>> Explained in commit comment how I chose the number of decimal places to
>>>>>> use for each binary places value.
>>>>>>
>>>>>> N.B. Dave Martin expressed an opinion that the kernel should not do
>>>>>> this conversion. Instead it should enumerate the scaling factor for
>>>>>> each event where hardware reported a fixed point value. This patch
>>>>>> could be dropped and replaced with one to enumerate scaling factors
>>>>>> per event if others agree with Dave.
>>>>>
>>>>> Could resctrl accommodate both usages? For example, it does not
>>>>> look too invasive to add a second file <mon_evt::name>.raw for the
>>>>> mon_evt::is_floating_point events that can output something like Dave
>>>>> suggested in [1]:
>>>>>
>>>>> .raw file format could be:
>>>>> #format:<output that depends on format>
>>>>> #fixed-point:<value>/<scaling factor>
>>>>>
>>>>> Example output:
>>>>> fixed-point:0x60000/0x40000
>>>>
>>>> Dave: Is that what you want in the ".raw" file? An alternative would be
>>>> to put the format information for non-integer events into an
>>>> "info" file ("info/{RESOURCE_NAME}_MON/monfeatures.raw.formats"?)
>>>> and just put the raw value into the ".raw" file under mon_data.
>>>
>>> Note that I thought it easier for users to keep the raw file to just
>>> showing a value, rather than including the formatting details in
>>> Reinette's proposal.
>>
>> Could you please elaborate on what makes this easier? It is not obvious to me
>> how it is easier for a user to open, parse, and close two files rather than one.
>> (more below)
>
> I had only considered the case where the format does not change while
> the resctrl file system is mounted. So users would read the "info" file
> to get the scaling factor once, and then read the event files with a
> parser that only has to convert a numerical string.
>
>>> Patch to implement my alternative suggestion below. To the user things
>>> look like this:
>>>
>>> $ cd /sys/fs/resctrl/mon_data/mon_PERF_PKG_01
>>> $ cat core_energy
>>> 0.02203
>>> $ cat core_energy.raw
>>> 5775
>>> $ cat /sys/fs/resctrl/info/PERF_PKG_MON/mon_features_raw_scale
>>> core_energy 262144
>>> activity 262144
>>> $ bc -ql
>>> 5775 / 262144
>>> .02202987670898437500
>>>
>>> If this seems useful I can write up a commit message and include
>>> as its own patch in v7. Suggestions for better names?
>>>
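As a sanity check on the numbers above, here is a minimal user space sketch
of the integer-only conversion described in the cover letter (multiply the
fractional part by 10**decimal_places, add half of 2**binary_places for
rounding, then shift right). The function name is made up for illustration,
and the choice of 5 decimal places for 18 binary places is only inferred from
the sample output above; it is not taken from the patch itself.

```python
def fixed_to_decimal(raw, binary_places, decimal_places):
    """Format an unsigned fixed-point value as a decimal string.

    Uses integer arithmetic only. Assumes binary_places >= 1.
    """
    integer = raw >> binary_places
    frac = raw & ((1 << binary_places) - 1)

    # Scale the fraction to the requested number of decimal digits. Adding
    # a "1" in the binary place just below those kept rounds to nearest
    # before the right shift divides by 2**binary_places.
    frac = (frac * 10**decimal_places + (1 << (binary_places - 1))) >> binary_places

    # Rounding may carry out of the fraction into the integer part.
    if frac >= 10**decimal_places:
        integer += 1
        frac -= 10**decimal_places

    return f"{integer}.{frac:0{decimal_places}d}"


print(fixed_to_decimal(5775, 18, 5))     # 0.02203
print(fixed_to_decimal(0x60000, 18, 5))  # 1.50000
```

The first call reproduces the core_energy value shown above; the second uses
the 0x60000/0x40000 pair from my earlier ".raw" example.
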
>>
>> I expect users to regularly interact with the monitoring files. For example,
>> "read the core_energy of group x every second". An API like above would require
>> a contract that the scale value will never change from resctrl mount to
>> resctrl unmount. I understand that this implementation supports exactly this by
>> allowing an architecture to only enable an event once, but do you think this is
>> something that will always be the case? If not then an interface like above will
>> require user space to open, parse, close two files instead of one on a frequent basis.
>> This is not ideal if user space wants to read monitoring data of multiple
>> groups frequently.
>
> While hardware designers do some outlandish things, changing the format
> of an event counter on the fly seems beyond the range of possibility.
> How would that even work? A driver would have to rerun enumeration of
> the feature every time it read a counter. Or hardware would have to
> supply some interrupt to tell s/w that the format changed.

There is also the new direction of resctrl dynamically enabling/disabling
hardware capabilities to consider. Since such a change would be triggered by
user space, a note that "doing this may change the format" could be
sufficient here.

Something else to consider is the possibility of hardware using different scales
in different domains if the packages are not "uniform".

> I think it reasonable that resctrl be able to guarantee that the format
> described in the info file is valid for the life of the mount.

I would also really like to think that this is reasonable.

>
>> I would also like to keep extensibility in mind. We now know that
>> unsigned decimal and fixed-point binary need to be supported. I think any
>> new interface used to communicate formatting information to user space should be done
>> in a way that can be extended for a new format. That is, for example, why
>> I used the actual term "fixed-point" in the example. Something like this avoids
>> needing assumptions that a raw value always implies fixed-point format.
>
> This is fair. But could be covered in the "info" file with some more
> descriptive way to describe the format. Perhaps:
>
> $ cat /sys/fs/resctrl/info/PERF_PKG_MON/mon_features_raw_scale
> core_energy fixed-point scale=262144
> activity fixed-point scale=262144
>
> To allow for other types in the future.

Note that the filename still has "scale" in its name, which makes it specific
to fixed-point.

It may be expected that every entry in mon_features has an entry in
mon_features_raw_scale (name TBD). This means the existing possible
mon_features entries need to be accommodated (except the _config ones). This
may also be an opportunity to introduce the unit of measurement. For example:

$ cat /sys/fs/resctrl/info/PERF_PKG_MON/mon_features_raw_scale
core_energy fixed-point scale=262144 unit=joules
activity fixed-point scale=262144 unit=farads
...
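
To make the user space side concrete, below is a minimal sketch of how a tool
might consume such a file. It assumes the one-event-per-line layout
"<name> <format> [key=value ...]" from the example above, which is only a
proposal at this point, and it assumes a matching <event>.raw file that holds
a single integer. The function names are hypothetical.

```python
def parse_formats(info_text):
    """Map event name -> (format, {key: value}) from the proposed info file."""
    formats = {}
    for line in info_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) < 2:
            raise ValueError(f"malformed line: {line!r}")
        name, fmt, *attrs = fields
        formats[name] = (fmt, dict(a.split("=", 1) for a in attrs))
    return formats


def scaled_value(raw_text, fmt, attrs):
    """Convert the text of a hypothetical <event>.raw file to a number."""
    raw = int(raw_text.strip(), 0)  # accepts decimal or 0x-prefixed hex
    if fmt == "fixed-point":
        return raw / int(attrs["scale"], 0)
    # Unknown formats are rejected rather than guessed at.
    raise ValueError(f"unsupported format: {fmt}")


# Sample content copied from the example above, not read from a real system.
INFO = """\
core_energy fixed-point scale=262144 unit=joules
activity fixed-point scale=262144 unit=farads
"""

formats = parse_formats(INFO)
fmt, attrs = formats["core_energy"]
energy = scaled_value("5775\n", fmt, attrs)
print(f"{energy:.5f} {attrs['unit']}")  # 0.02203 joules
```

Keying on the format name rather than on the presence of a scale means a
parser like this fails loudly on a format it does not know instead of
silently treating every raw value as fixed-point.
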

Reinette