Hi Barret,
On 7/21/25 11:00 AM, Barret Rhoden wrote:
> x86_cache_max_rmid's default is -1. If the hardware or VM doesn't set
> the right cpuid bits, num_rmid can be 0.
>
> Signed-off-by: Barret Rhoden <brho@xxxxxxxxxx>
> ---
> I ran into this on a VM on granite rapids. I guess the VMM told the
> kernel it was a GNR, but didn't set all the cache/rsctl bits.
The -1 default of x86_cache_max_rmid is assigned if the hardware does not
support *any* L3 monitoring. Specifically:
    resctrl_cpu_detect():
        if (!cpu_has(c, X86_FEATURE_CQM_LLC)) {
            c->x86_cache_max_rmid = -1;
            ...
        }
The function modified by this patch, rdt_get_mon_l3_config(), only runs if
the hardware supports one or more of the L3 monitoring sub-features
(X86_FEATURE_CQM_OCCUP_LLC, X86_FEATURE_CQM_MBM_TOTAL, or
X86_FEATURE_CQM_MBM_LOCAL) that depend on X86_FEATURE_CQM_LLC per cpuid_deps[].
I tried to reproduce the issue on real hardware by using clearcpuid to
disable X86_FEATURE_CQM_LLC. The CPUID dependencies did the right thing:
they automatically disabled X86_FEATURE_CQM_OCCUP_LLC,
X86_FEATURE_CQM_MBM_TOTAL, and X86_FEATURE_CQM_MBM_LOCAL, so
rdt_get_mon_l3_config() did not run at all and the kernel did not even
attempt to enumerate any of the L3 monitoring details.
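
To make the expected behavior concrete, here is a minimal user-space model
of the dependency handling described above. It is only a sketch and not the
kernel's cpuid_deps[] code: the enum values, the deps[] table, and both
function names are hypothetical stand-ins for the X86_FEATURE_* bits and the
check that gates rdt_get_mon_l3_config().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature ids standing in for the X86_FEATURE_* bits. */
enum feat { CQM_LLC, CQM_OCCUP_LLC, CQM_MBM_TOTAL, CQM_MBM_LOCAL, NFEAT };

/* One entry per "feature depends on another feature" relation. */
struct dep {
	enum feat feature;
	enum feat depends;
};

static const struct dep deps[] = {
	{ CQM_OCCUP_LLC, CQM_LLC },
	{ CQM_MBM_TOTAL, CQM_LLC },
	{ CQM_MBM_LOCAL, CQM_LLC },
};

/* Clear 'f', then keep clearing features whose dependency is cleared. */
void clear_feature(bool caps[NFEAT], enum feat f)
{
	bool changed;

	caps[f] = false;
	do {
		changed = false;
		for (size_t i = 0; i < sizeof(deps) / sizeof(deps[0]); i++) {
			if (!caps[deps[i].depends] && caps[deps[i].feature]) {
				caps[deps[i].feature] = false;
				changed = true;
			}
		}
	} while (changed);
}

/* Models the gate: L3 monitoring setup runs only if a sub-feature is set. */
bool l3_mon_config_would_run(const bool caps[NFEAT])
{
	return caps[CQM_OCCUP_LLC] || caps[CQM_MBM_TOTAL] || caps[CQM_MBM_LOCAL];
}
```

In this model, clearing CQM_LLC leaves no sub-feature set, so the L3
monitoring setup is skipped. The reported problem would need a sub-feature
to stay enabled while the RMID enumeration is missing or zero.
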
What are the symptoms when you encounter this issue?
Would it be possible to send me the CPUID flags of leaf 7, subleaf 0 as
well as all sub-leaves of leaf 0xF?
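
If a ready-made CPUID dump tool is not available in the guest, something
along these lines might be enough to collect the values. This is only a
sketch: struct l3_mon_info and the three functions are hypothetical names,
the bit positions are from my reading of the SDM and worth double checking,
and read_l3_mon() assumes a GCC or Clang toolchain that provides
__get_cpuid_count() in <cpuid.h>.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical container for the fields of interest; not a kernel struct. */
struct l3_mon_info {
	bool rdt_m;            /* leaf 7, subleaf 0, EBX bit 12 */
	bool l3_mon;           /* leaf 0xF, subleaf 0, EDX bit 1 */
	uint32_t max_rmid_all; /* leaf 0xF, subleaf 0, EBX */
	uint32_t scale;        /* leaf 0xF, subleaf 1, EBX */
	uint32_t max_rmid_l3;  /* leaf 0xF, subleaf 1, ECX */
	bool occup;            /* leaf 0xF, subleaf 1, EDX bit 0 */
	bool mbm_total;        /* leaf 0xF, subleaf 1, EDX bit 1 */
	bool mbm_local;        /* leaf 0xF, subleaf 1, EDX bit 2 */
};

/* Pure decode of raw register values, so it can be checked without hardware. */
struct l3_mon_info decode_l3_mon(uint32_t l7_ebx, uint32_t f0_ebx,
				 uint32_t f0_edx, uint32_t f1_ebx,
				 uint32_t f1_ecx, uint32_t f1_edx)
{
	struct l3_mon_info i = {
		.rdt_m        = (l7_ebx >> 12) & 1,
		.l3_mon       = (f0_edx >> 1) & 1,
		.max_rmid_all = f0_ebx,
		.scale        = f1_ebx,
		.max_rmid_l3  = f1_ecx,
		.occup        = (f1_edx >> 0) & 1,
		.mbm_total    = (f1_edx >> 1) & 1,
		.mbm_local    = (f1_edx >> 2) & 1,
	};
	return i;
}

void print_l3_mon(const struct l3_mon_info *i)
{
	printf("leaf 7.0   rdt_m=%d\n", i->rdt_m);
	printf("leaf 0xF.0 l3_mon=%d max_rmid=%u\n",
	       i->l3_mon, (unsigned int)i->max_rmid_all);
	printf("leaf 0xF.1 scale=%u max_rmid=%u occup=%d total=%d local=%d\n",
	       (unsigned int)i->scale, (unsigned int)i->max_rmid_l3,
	       i->occup, i->mbm_total, i->mbm_local);
}

#if defined(__x86_64__) || defined(__i386__)
/*
 * Read the leaves on the running CPU. Returns false if a leaf is above the
 * maximum supported leaf. Subleaf 1 is only meaningful when l3_mon is set.
 */
bool read_l3_mon(struct l3_mon_info *out)
{
	unsigned int a, b, c, d;
	uint32_t l7_ebx, f0_ebx, f0_edx;

	if (!__get_cpuid_count(7, 0, &a, &b, &c, &d))
		return false;
	l7_ebx = b;
	if (!__get_cpuid_count(0xF, 0, &a, &b, &c, &d))
		return false;
	f0_ebx = b;
	f0_edx = d;
	if (!__get_cpuid_count(0xF, 1, &a, &b, &c, &d))
		return false;
	*out = decode_l3_mon(l7_ebx, f0_ebx, f0_edx, b, c, d);
	return true;
}
#endif
```

Calling read_l3_mon() followed by print_l3_mon() from a small main() in the
guest should be enough. The raw register values are still preferable if they
can be captured.
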
Could you please also elaborate on what the impact of this issue is? Is
this a VM that has been released, with many users impacted, or something
encountered during development of this VM?