Re: [PATCH v5 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking

From: Haifeng Xu
Date: Tue Mar 19 2024 - 02:41:39 EST




On 2024/3/14 06:47, Reinette Chatre wrote:
> Hi Haifeng,
>
> On 3/7/2024 11:41 PM, Haifeng Xu wrote:
>> In our production environment, after removing monitor groups, those unused
>> RMIDs get stuck in the limbo list forever because their llc_occupancy are
>> always larger than the threshold. But the unused RMIDs can be successfully
>> freed by turning up the threshold.
>>
>> In order to know how much the threshold should be, perf can be used to
>> acquire the llc_occupancy of RMIDs in each rdt domain.
>>
>> Instead of using perf tool to track llc_occupancy and filter the log
>> manually, it is more convenient for users to use tracepoint to do this
>> work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs
>> when scanning the limbo list.
>>
>> Signed-off-by: Haifeng Xu <haifeng.xu@xxxxxxxxxx>
>> Suggested-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
>> Suggested-by: James Morse <james.morse@xxxxxxx>
>> Reviewed-by: James Morse <james.morse@xxxxxxx>
>> ---
>> Documentation/arch/x86/resctrl.rst | 8 ++++++++
>> arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++++
>> arch/x86/kernel/cpu/resctrl/trace.h | 16 ++++++++++++++++
>> 3 files changed, 33 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index a6279df64a9d..dd3507dc765c 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -478,6 +478,14 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
>> each bit represents 5% of the capacity of the cache. You could partition
>> the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
>>
>> +Tracepoint - mon_llc_occupancy_limbo
>> +------------------------------------
>
> I think that the below paragraph would fit nicely as a new paragraph in the
> existing "max_threshold_occupancy - generic concepts" section. To support that
> just one change to text below ...
>
>> +This tracepoint gives you the precise occupancy values for a subset of RMID
>
> The mon_llc_occupancy_limbo tracepoint gives the precise occupancy in bytes
> for a subset of RMID ...
>

OK, I'll move the descriptions to "max_threshold_occupancy - generic concepts" section.

>> +that are not immediately available for allocation. This can't be relied on
>> +to produce output every second, it may be necessary to attempt to create an
>> +empty monitor group to force an update. Output may only be produced if creation
>> +of a control or monitor group fails.
>> +
>> Memory bandwidth Allocation and monitoring
>> ==========================================
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index c34a35ec0f03..60b6a29a9e29 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -24,6 +24,7 @@
>> #include <asm/resctrl.h>
>>
>> #include "internal.h"
>> +#include "trace.h"
>>
>> /**
>> * struct rmid_entry - dirty tracking for all RMID.
>> @@ -354,6 +355,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>> rmid_dirty = true;
>> } else {
>> rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
>> +
>> + /* x86's CLOSID and RMID are independent numbers, so the entry's
>> + * closid is a invalid CLOSID. But on arm64, the RMID value isn't
>> + * a unique number for each CLOSID. It's necessary to track both
>> + * CLOSID and RMID because there may be dependencies between each
>> + * other on some architectures.
>> + */
>
> Please watch for proper formatting of multi-line comment and consistent capitalization.
> I also think comment can be more accurate, for example:
>
> /*
> * x86's CLOSID and RMID are independent numbers, so the entry's
> * CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the
> * RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't used
> * to select the configuration. It is thus necessary to track both
> * CLOSID and RMID because there may be dependencies between them
> * on some architectures.
> */
>

Thanks for your suggestions!

>> + trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val);
>> }
>>
>> if (force_free || !rmid_dirty) {
>> diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
>> index ed5c66b8ab0b..b310b4985b94 100644
>> --- a/arch/x86/kernel/cpu/resctrl/trace.h
>> +++ b/arch/x86/kernel/cpu/resctrl/trace.h
>> @@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3,
>> TP_printk("hits=%llu miss=%llu",
>> __entry->l3_hits, __entry->l3_miss));
>>
>> +TRACE_EVENT(mon_llc_occupancy_limbo,
>> + TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes),
>> + TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes),
>> + TP_STRUCT__entry(__field(u32, ctrl_hw_id)
>> + __field(u32, mon_hw_id)
>> + __field(int, domain_id)
>> + __field(u64, llc_occupancy_bytes)),
>> + TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id;
>> + __entry->mon_hw_id = mon_hw_id;
>> + __entry->domain_id = domain_id;
>> + __entry->llc_occupancy_bytes = llc_occupancy_bytes;),
>> + TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_d=%d llc_occupancy_bytes=%llu",
>
> domain_d -> domain_id
>
>> + __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id,
>> + __entry->llc_occupancy_bytes)
>> + );
>> +
>> #endif /* _TRACE_RESCTRL_H */
>>
>> #undef TRACE_INCLUDE_PATH
>
>
> Reinette

Thanks!