[PATCH 0/7] x86/resctrl: Add support for Sub-NUMA cluster (SNC) systems

From: Tony Luck
Date: Thu Jan 26 2023 - 13:42:42 EST


Intel server systems starting with Skylake support a mode that logically
partitions each socket. E.g. when partitioned two ways, half the cores,
L3 cache, and memory controllers are allocated to each of the partitions.
This may reduce average latency to access L3 cache and memory, with the
tradeoff that only half the L3 cache is available for subnode-local memory
access.

The existing Linux resctrl system mishandles RDT monitoring on systems
with SNC mode enabled.

But, with some simple changes, this can be fixed. When SNC mode is
enabled, the RDT RMID counters are also partitioned with the low numbered
counters going to the first partition, and the high numbered counters
to the second partition[1]. The key is to adjust the RMID value written
to the IA32_PQR_ASSOC MSR on context switch, and the value written to
the IA32_QM_EVTSEL MSR when reading out counters, and to change the
scaling factor that was read from CPUID(0xf,1).EBX.

E.g. in a 2-way Sub-NUMA cluster system with 200 RMID counters, only
100 counters are available to the resctrl code. When running on the
first SNC node, RMID values 0..99 are used as before. But when running
on the second node, a task that is assigned resctrl rmid=10 must load
10+100 into IA32_PQR_ASSOC to use RMID counter 110.
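
As a rough sketch of that arithmetic (this is not the code from the
patches; the helper name snc_hw_rmid() and its parameters are
hypothetical, and it assumes the RMID counters divide evenly between
the SNC nodes):

```c
/*
 * Hypothetical helper, for illustration only.
 *
 * rmid:        RMID assigned by resctrl (0 .. total_rmids/snc_ways - 1)
 * snc_node:    index of the SNC node within the socket (0 .. snc_ways - 1)
 * total_rmids: number of hardware RMID counters
 * snc_ways:    number of SNC partitions per socket (1 when SNC is disabled)
 *
 * Returns the RMID value to load into the hardware.
 */
static unsigned int snc_hw_rmid(unsigned int rmid, unsigned int snc_node,
				unsigned int total_rmids,
				unsigned int snc_ways)
{
	/* Each SNC node owns an equal, contiguous slice of the counters. */
	unsigned int rmids_per_node = total_rmids / snc_ways;

	/* Shift the resctrl RMID into the slice owned by this node. */
	return rmid + snc_node * rmids_per_node;
}
```

With the numbers above, snc_hw_rmid(10, 1, 200, 2) gives 110. With
snc_ways == 1 the only node index is 0, so the RMID is returned
unchanged.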

There should be no changes to functionality on other architectures,
or on Intel systems with SNC disabled, where snc_ways == 1.

-Tony

[1] Some systems also support a 4-way split. All the above still
applies; the code just needs to account for cores, cache, memory
controllers and RMID counters being divided four ways instead of two.

Tony Luck (7):
  x86/resctrl: Refactor in preparation for node-scoped resources
  x86/resctrl: Remove hard code of RDT_RESOURCE_L3 in monitor.c
  x86/resctrl: Add a new node-scoped resource to rdt_resources_all[]
  x86/resctrl: Add code to setup monitoring at L3 or NODE scope.
  x86/resctrl: Add a new "snc_ways" file to the monitoring info
    directory.
  x86/resctrl: Update documentation with Sub-NUMA cluster changes
  x86/resctrl: Determine if Sub-NUMA Cluster is enabled and initialize.

 Documentation/x86/resctrl.rst             | 15 +++-
 include/linux/resctrl.h                   |  4 +-
 arch/x86/include/asm/resctrl.h            |  4 +-
 arch/x86/kernel/cpu/resctrl/internal.h    |  9 +++
 arch/x86/kernel/cpu/resctrl/core.c        | 83 ++++++++++++++++++++---
 arch/x86/kernel/cpu/resctrl/monitor.c     | 24 ++++---
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 22 +++++-
 8 files changed, 136 insertions(+), 27 deletions(-)

--
2.39.1