Re: [PATCH] x86/resctrl: Add tracepoint for llc_occupancy tracking

From: Reinette Chatre
Date: Mon Feb 05 2024 - 16:20:42 EST


Hi Haifeng,

Thank you for proposing this feature. I think it is a useful addition.
I tried it out after removing a monitor group and it was insightful
to be able to see how the cache occupancy of a removed group
shrinks over time.

On 1/26/2024 5:02 AM, Haifeng Xu wrote:
> If llc_occupany is enabled, the rmid may not be freed immediately unless

(llc_occupany -> llc_occupancy ... one more instance below)

Please use caps (RMID) for acronym.

> its llc_occupany is less than the resctrl_rmid_realloc_threshold.

I think it will help to highlight that this is related to monitor group
removal.

>
> In our production environment, those unused rmids get stuck in the limbo
> list forever because their llc_occupancy are larger than the threshold.
> After turning it up, we can successfully free unused rmids and create
> new monitor groups. In order to acquire the llc_occupancy of rmids in
> each rdt domain, we use perf tool to track and filter the log manually.

Could you please elaborate how you "use perf tool to track and
filter the log manually"?

>
> It's not efficient enough. Therefore, we can add a new tracepoint that

"It's not efficient enough." is a vague statement. How was efficiency measured?
and what is "enough"?

Please do not impersonate code ("we can add a new ...") and stick to the
imperative tone. Please see the "Changelog" section in
Documentation/process/maintainer-tip.rst for more details.


> shows the llc_occupancy of busy rmids when scanning the limbo list. It
> can help us to adjust the resctrl_rmid_realloc_threshold to a reasonable
> value.
>
> Signed-off-by: Haifeng Xu <haifeng.xu@xxxxxxxxxx>
> ---
> arch/x86/kernel/cpu/resctrl/Makefile | 1 +
> arch/x86/kernel/cpu/resctrl/monitor.c | 5 ++++
> arch/x86/kernel/cpu/resctrl/monitor_event.h | 30 +++++++++++++++++++++
> 3 files changed, 36 insertions(+)
> create mode 100644 arch/x86/kernel/cpu/resctrl/monitor_event.h
>
> diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
> index 4a06c37b9cf1..0d3031850d26 100644
> --- a/arch/x86/kernel/cpu/resctrl/Makefile
> +++ b/arch/x86/kernel/cpu/resctrl/Makefile
> @@ -1,4 +1,5 @@
> # SPDX-License-Identifier: GPL-2.0
> obj-$(CONFIG_X86_CPU_RESCTRL) += core.o rdtgroup.o monitor.o
> obj-$(CONFIG_X86_CPU_RESCTRL) += ctrlmondata.o pseudo_lock.o
> +CFLAGS_monitor.o = -I$(src)
> CFLAGS_pseudo_lock.o = -I$(src)

>From what I understand only one of the c files should define CREATE_TRACE_POINTS
and it is that c file that should have its CFLAGS modified.

I do not think it is necessary to preemptively fragment the resctrl tracing by creating
a separate header file for the monitor tracepoints. This is something that may be
needed as part of current re-design but I think it best to have a simple start that
places all tracepoints in the same header file.

I would thus like to propose that this work starts by renaming pseudo_lock_event.h
to a more generic (trace.h?) that will contain all the tracepoints. pseudo-lock.c
is the file that define's CREATE_TRACE_POINTS so it should remain as the only
one with its CFLAGS modified. monitor.c should be able to just include "trace.h"
(after "trace.h" is updated with the new tracepoint) without needing to define
CREATE_TRACE_POINTS.


> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index f136ac046851..a6f94fcae174 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -24,6 +24,10 @@
>
> #include "internal.h"
>
> +#define CREATE_TRACE_POINTS
> +#include "monitor_event.h"
> +#undef CREATE_TRACE_POINTS
> +
> struct rmid_entry {
> u32 rmid;
> int busy;
> @@ -302,6 +306,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
> }
> }
> crmid = nrmid + 1;
> + trace_rmid_llc_occupancy(nrmid, d->id, val);
> }
> }
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor_event.h b/arch/x86/kernel/cpu/resctrl/monitor_event.h
> new file mode 100644
> index 000000000000..91265a2dd2c9
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/resctrl/monitor_event.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM resctrl
> +
> +#if !defined(_TRACE_MONITOR_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_MONITOR_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(rmid_llc_occupancy,
> + TP_PROTO(u32 rmid, int id, u64 occupancy),
> + TP_ARGS(rmid, id, occupancy),
> + TP_STRUCT__entry(__field(u32, rmid)
> + __field(int, id)
> + __field(u64, occupancy)),
> + TP_fast_assign(__entry->rmid = rmid;
> + __entry->id = id;
> + __entry->occupancy = occupancy;),
> + TP_printk("rmid=%u domain=%d occupancy=%llu",
> + __entry->rmid, __entry->id, __entry->occupancy)
> + );
> +

Naming is always complicated but I would like to make two proposals with
motivation:
a) Please prefix this event with "mon_" as the first member of a new
"monitor" category of resctrl tracepoints. Users can then interact
with monitor tracepoints with convenience of "mon*". Later we could
perhaps add a new "alloc*" category.
b) Please replace all "rmid" exposure to user space with a more generic
"mon_hw_id", or if "mon" is clear, just "hw_id". For reference you
can search for "mon_hw_id" in Documentation/arch/x86/resctrl.rst.
This is needed because resctrl is in the process of being able to
be an interface for Arm's MPAM and "rmid" is an x86 specific term.

Considering this, what do you think of something like:
tracepoint name: mon_llc_occupancy_limbo
print: "mon_hw_id=%u domain=%d llc_occupancy=%llu"

> +#endif /* _TRACE_MONITOR_H */
> +
> +#undef TRACE_INCLUDE_PATH
> +#define TRACE_INCLUDE_PATH .
> +#define TRACE_INCLUDE_FILE monitor_event
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>

Thank you.

Reinette