[PATCH 3/3 v2] perf/amd/uncore: Add support for Family 19h L3 PMU

From: Kim Phillips
Date: Fri Mar 13 2020 - 19:11:08 EST


Family 19h introduces a change in the slice, core and thread specification
in its L3 Performance Event Select (ChL3PmcCfg) h/w register. The
change is incompatible with Family 17h's version of the register.

Introduce a new path in l3_thread_slice_mask() to do things differently
for Family 19h vs. Family 17h, otherwise the new hardware doesn't
get programmed correctly.

Instead of a linear core/thread bitmask, Family 19h takes
an encoded core number and a separate thread mask. There are
also new bits that enable counting for all cores and for all
slices. Only the all-slices bit is used, since the driver counts
events for all slices on behalf of the specified cpu.

Also update amd_uncore_init() to base its L2/NB vs. L3/Data Fabric
mode decision on Family 17h or above, not just 17h and 18h:
the Family 19h Data Fabric PMC is compatible with the Family 17h
DF PMC.

Signed-off-by: Kim Phillips <kim.phillips@xxxxxxx>
Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Michael Petlan <mpetlan@xxxxxxxxxx>
Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Stephane Eranian <eranian@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: x86@xxxxxxxxxx
---
v2: rewrote commit text to not use "We" etc.,
based on Boris' comments:

https://lkml.org/lkml/2020/3/12/583
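
For reviewers, the two encodings described in the commit text can be
compared with a small userspace sketch. This is illustrative only and is
not part of the patch: the helper names l3_mask_f17h() and l3_mask_f19h()
are hypothetical, the macro values are copied from the perf_event.h hunk
below, and BIT_ULL() is redefined locally so the snippet builds outside
the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define BIT_ULL(n)              (1ULL << (n))

/* Values copied from the perf_event.h hunk in this patch. */
#define AMD64_L3_SLICE_SHIFT    48
#define AMD64_L3_SLICE_MASK     (0xFULL << AMD64_L3_SLICE_SHIFT)
#define AMD64_L3_THREAD_SHIFT   56
#define AMD64_L3_EN_ALL_SLICES  BIT_ULL(46)
#define AMD64_L3_COREID_SHIFT   42
#define AMD64_L3_COREID_MASK    (0x7ULL << AMD64_L3_COREID_SHIFT)

/*
 * Hypothetical helper: Family 17h/18h layout.
 * One bit per (core, thread) pair, starting at bit 56, with all
 * four slice bits set.
 */
static uint64_t l3_mask_f17h(unsigned int core, unsigned int thread)
{
	assert(thread <= 1);

	unsigned int shift = AMD64_L3_THREAD_SHIFT + 2 * (core % 4) + thread;

	return AMD64_L3_SLICE_MASK | BIT_ULL(shift);
}

/*
 * Hypothetical helper: Family 19h layout.
 * The core number is encoded in a 3-bit field at bit 42, the thread
 * is one bit starting at bit 56, and the "all slices" enable bit
 * replaces the slice mask.
 */
static uint64_t l3_mask_f19h(unsigned int core, unsigned int thread)
{
	assert(thread <= 1);

	uint64_t coreid = ((uint64_t)core << AMD64_L3_COREID_SHIFT) &
			  AMD64_L3_COREID_MASK;

	return AMD64_L3_EN_ALL_SLICES | coreid |
	       BIT_ULL(AMD64_L3_THREAD_SHIFT + thread);
}
```

With these definitions, core 1 / thread 1 yields 0x080F000000000000 in
the Family 17h layout, while core 3 / thread 1 yields 0x02004C0000000000
in the Family 19h layout.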

arch/x86/events/amd/uncore.c | 20 ++++++++++++++------
arch/x86/include/asm/perf_event.h | 15 +++++++++++++--
2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index b622e59ccdd0..f3d5e4e2f285 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -191,10 +191,18 @@ static u64 l3_thread_slice_mask(int cpu)
if (topology_smt_supported() && !topology_is_primary_thread(cpu))
thread = 1;

- shift = AMD64_L3_THREAD_SHIFT + 2 * (core % 4) + thread;
+ if (boot_cpu_data.x86 <= 0x18) {
+ shift = AMD64_L3_THREAD_SHIFT + 2 * (core % 4) + thread;
+ thread_mask = BIT_ULL(shift);
+
+ return AMD64_L3_SLICE_MASK | thread_mask;
+ }
+
+ core = (core << AMD64_L3_COREID_SHIFT) & AMD64_L3_COREID_MASK;
+ shift = AMD64_L3_THREAD_SHIFT + thread;
thread_mask = BIT_ULL(shift);

- return AMD64_L3_SLICE_MASK | thread_mask;
+ return AMD64_L3_EN_ALL_SLICES | core | thread_mask;
}

static int amd_uncore_event_init(struct perf_event *event)
@@ -223,8 +231,8 @@ static int amd_uncore_event_init(struct perf_event *event)
return -EINVAL;

/*
- * SliceMask and ThreadMask need to be set for certain L3 events in
- * Family 17h. For other events, the two fields do not affect the count.
+ * SliceMask and ThreadMask need to be set for certain L3 events.
+ * For other events, the two fields do not affect the count.
*/
if (l3_mask && is_llc_event(event))
hwc->config |= l3_thread_slice_mask(event->cpu);
@@ -533,9 +541,9 @@ static int __init amd_uncore_init(void)
if (!boot_cpu_has(X86_FEATURE_TOPOEXT))
return -ENODEV;

- if (boot_cpu_data.x86 == 0x17 || boot_cpu_data.x86 == 0x18) {
+ if (boot_cpu_data.x86 >= 0x17) {
/*
- * For F17h or F18h, the Northbridge counters are
+ * For F17h and above, the Northbridge counters are
* repurposed as Data Fabric counters. Also, L3
* counters are supported too. The PMUs are exported
* based on family as either L2 or L3 and NB or DF.
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 29964b0e1075..e855e9cf2c37 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -50,11 +50,22 @@

#define AMD64_L3_SLICE_SHIFT 48
#define AMD64_L3_SLICE_MASK \
- ((0xFULL) << AMD64_L3_SLICE_SHIFT)
+ (0xFULL << AMD64_L3_SLICE_SHIFT)
+#define AMD64_L3_SLICEID_MASK \
+ (0x7ULL << AMD64_L3_SLICE_SHIFT)

#define AMD64_L3_THREAD_SHIFT 56
#define AMD64_L3_THREAD_MASK \
- ((0xFFULL) << AMD64_L3_THREAD_SHIFT)
+ (0xFFULL << AMD64_L3_THREAD_SHIFT)
+#define AMD64_L3_F19H_THREAD_MASK \
+ (0x3ULL << AMD64_L3_THREAD_SHIFT)
+
+#define AMD64_L3_EN_ALL_CORES BIT_ULL(47)
+#define AMD64_L3_EN_ALL_SLICES BIT_ULL(46)
+
+#define AMD64_L3_COREID_SHIFT 42
+#define AMD64_L3_COREID_MASK \
+ (0x7ULL << AMD64_L3_COREID_SHIFT)

#define X86_RAW_EVENT_MASK \
(ARCH_PERFMON_EVENTSEL_EVENT | \
--
2.25.1