Re: [PATCH] fs/resctrl: Fix MBM events being unconditionally enabled in mbm_event mode

From: Babu Moger

Date: Tue Oct 14 2025 - 13:43:09 EST


Hi Reinette,

On 10/14/25 12:38, Babu Moger wrote:
Hi Reinette,

On 10/14/25 11:24, Reinette Chatre wrote:
Hi Babu,

On 10/7/25 7:38 PM, Reinette Chatre wrote:
On 10/7/25 10:36 AM, Babu Moger wrote:
On 10/6/25 20:23, Reinette Chatre wrote:
On 10/6/25 1:38 PM, Moger, Babu wrote:
On 10/6/25 12:56, Reinette Chatre wrote:
On 9/30/25 1:26 PM, Babu Moger wrote:
resctrl features can be enabled or disabled using boot-time kernel
parameters. To turn off the memory bandwidth events (mbmtotal and
mbmlocal), users need to pass the following parameter to the kernel:
"rdt=!mbmtotal,!mbmlocal".
ah, indeed ... although, the intention behind the mbmtotal and mbmlocal kernel
parameters was to connect them to the actual hardware features identified
by X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL respectively.


Found that memory bandwidth events (mbmtotal and mbmlocal) cannot be
disabled when mbm_event mode is enabled. resctrl_mon_resource_init()
unconditionally enables these events without checking if the underlying
hardware supports them.
Technically this is correct since if hardware supports ABMC then the
hardware is no longer required to support X86_FEATURE_CQM_MBM_TOTAL and
X86_FEATURE_CQM_MBM_LOCAL in order to provide mbm_total_bytes
and mbm_local_bytes.

I can see how this may be confusing to user space though ...

Remove the unconditional enablement of MBM features in
resctrl_mon_resource_init() to fix the problem. The hardware support
verification is already done in get_rdt_mon_resources().
I believe by "hardware support" you mean hardware support for
X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL. Wouldn't a fix like
this then require any system that supports ABMC to also support
X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL to be able to
support mbm_total_bytes and mbm_local_bytes?
Yes. That is correct. Right now, ABMC and X86_FEATURE_CQM_MBM_TOTAL/
X86_FEATURE_CQM_MBM_LOCAL are kind of tightly coupled. We have not clearly
separated the that.
Are you speaking from resctrl side since from what I understand these are
independent features from the hardware side?
It is independent from hardware side. I meant we still use legacy events from "default" mode.
Thank you for confirming. I was wondering if we need to fix it via cpuid_deps[]
and resctrl_cpu_detect() to address a hardware dependency. If hardware self
does not have the dependency then we need to fix it another way.

This problem seems to be similar to the one solved by [1] since
by supporting ABMC there is no "hardware does not support mbmtotal/mbmlocal"
but instead there only needs to be a check if the feature has been disabled
by command line. That is, add a rdt_is_feature_enabled() check to the
existing "!resctrl_is_mon_event_enabled()" check?
Enable or disable needs to be done at get_rdt_mon_resources(). It needs to
be done early in  the initialization before calling domain_add_cpu() where
event data structures (mbm_states aarch_mbm_states) are allocated.
Good point. My mistake to suggest the event should be enabled by
resctrl fs.

How about adding another check in get_rdt_mon_resources()?

if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)
     || rdt_is_feature_enabled(mbmtotal)) {
resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
                 ret = true;
         }
Something like this yes. I think it should be in rdt_get_mon_l3_config() though, within
the ABMC feature settings. If not then there may be an issue if the user boots with
rdt=!abmc? I cannot see why the rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) check is needed,
which flow are you addressing?

Before we exchange code I would like to step back a bit just to be clear that we agree
on the current issues and what user space may expect. After this it should be easier to
exchange code. (more below)

I need to take Tony's patch for this.

But wait ... I think there may be a bigger problem when considering systems
that support ABMC but not X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL.
Shouldn't resctrl prevent such a system from switching to "default"
mbm_assign_mode? Otherwise resctrl will happily let such a system switch
to default mode and when user attempts to read an event file resctrl will
attempt to read it via MSRs that are not supported.
Looks like ABMC may need something similar to CONFIG_RESCTRL_ASSIGN_FIXED
to handle this case in show() while preventing user space from switching to
"default" mode on write()?
This may not be an issue right now. When X86_FEATURE_CQM_MBM_TOTAL and
X86_FEATURE_CQM_MBM_LOCAL are not supported then mon_data files of these
events are not created.
By "right now" I assume you mean the current implementation? I think your statement
assumes that no CPUs come or go after resctrl_mon_resource_init() enables the MBM events?
Current implementation will enable MBM events if ABMC is supported. When the
first CPU of a domain comes online after that then resctrl will create the mon_data
files. These files will remain if a user then switches to default mode and if
the user then attempts to read one of these counters then I expect problems.
Yes. It will be a problem in the that case.
Thinking about this more the issue is not about the mon_data files being created since
they are only created if resctrl is mounted and resctrl_mon_resource_init() is run
before creating the mountpoint. From what I can tell current MBM events supported by
ABMC will be enabled at the time resctrl can be mounted so if X86_FEATURE_CQM_MBM_TOTAL
and X86_FEATURE_CQM_MBM_LOCAL are not supported but ABMC is then I believe the
mon_data files will be created.

There is a problem with the actual domain creation during resctrl initialization
where the MBM state data structures are created and depend on the events being
enabled then.
resctrl assumes that if an event is enabled then that event's associated
rdt_mon_domain::mbm_states and rdt_hw_mon_domain::arch_mbm_states exist and if
those data structures are created (or not created) during CPU online and MBM
event comes online later then there will be invalid memory accesses.

The conclusion is the same though ... the events need to be initialized during
resctrl initialization as you note above.

I am not clear on using config option you mentioned above.
This is more about what is accomplished by the config option than whether it is
a config option that controls the flow. More below but I believe there may be
scenarios where only mbm_event is supported and in that case I expect, even on AMD,
it may be possible that there is no supported "default" mode and thus:
  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
   [mbm_event]

What about using the check resctrl_is_mon_event_enabled() in

resctrl_mbm_assign_mode_show() and resctrl_mbm_assign_mode_write() ?

Trying to think through how to support a system that can switch between default
and mbm_event mode I see a couple of things to consider. This is as I am thinking
through the flows without able to experiment. I think it may help if you could sanity
check this with perhaps a few experiments to considering the flows yourself to see where
I am missing things.

When we are clear on the flows to support and how to interact with user space it will
be easier to start exchanging code.

a) MBM state data structures
    As mentioned above, rdt_mon_domain::mbm_states and rdt_hw_mon_domain::arch_mbm_states
    are created during CPU online based on MBM event enabled state. During runtime
    an enabled MBM event is assumed to have state.
    To me this implies that any possible MBM event should be enabled during early
    initialization.
    A consequence is that any possible MBM event will have its associated event file
    created even if the active mode of the time cannot support it. (I do not think
    we want to have event files come and go).
b) Switching between modes.
    From what I can tell switching mode is always allowed as long as system supports
    assignable counters and that may not be correct. Consider a system that supports
    ABMC but does not support X86_FEATURE_CQM_MBM_TOTAL and/or X86_FEATURE_CQM_MBM_LOCAL ...
    should it be allowed to switch to "default" mode? At this time I believe this is allowed
    yet this is an unusable state (as far as MBM goes) and I expect any attempt at reading
    an event file will result in invalid MSR access?
    Complexity increases if there is a mismatch in supported events, for example if mbm_event
    mode supports total and local but default mode only supports one. Should it be allowed
    to switch modes? If so, user can then still read from both files, the check whether assignable
    counters is enabled will fail and resctrl will attempt to read both via the counter MSRs,
    even an unsupported event (continued below).
c) Read of event file
    A user can read from event file any time even if active mode (default or mbm_event) does
    not support it. If mbm_event mode is enabled then resctrl will attempt to use counters,
    if default mode is enabled then resctrl will attempt to use MSRs.
    This currently entirely depends on whether mbm_event mode is enabled or not.
    Perhaps we should add checks here to prevent user from reading an event if the
    active mode does not support it? Alternatively prevent user from switching to a mode
    that cannot be supported.

Look forward to how you view things and thoughts on how user may expect to interact with these
features.


Yea.  Taken note of all your points. Sorry for the Iate response. I was investigating on how to fix in a proper way.


I am concerned about this issue. The original changelog only mentions that events are enabled when
they should not be but it looks to me that there is a more serious issue if the user then attempts
to read from such an event. Have you tried the scenario when a user boots with the parameters
mentioned in changelog (rdt=!mbmtotal,!mbmlocal) and then attempts to read one of these events?
Reading from the event will attempt to access its architectural state but from what I can tell
that will not be allocated since the events are not enabled at the time of the allocation.


Yes. I saw the issues. It fails to mount in my case with panic trace.



This needs to be fixed during this cycle. A week has passed since my previous message so I do not


Yes. I understand your concern.


think that it will be possible to create a full featured solution that keeps X86_FEATURE_ABMC
and X86_FEATURE_CQM_MBM_TOTAL/X86_FEATURE_CQM_MBM_LOCAL independent.


Agree.



What do you think of something like below that builds on your original change and additionally
enforces dependency between these features to support the resctrl assumptions? From what I understand
this is ok for current AMD hardware? A not-as-urgent follow-up can make these features independent
again?


Yes. I tested it. Works fine.  It defaults to "default" mode if both the events(local and total) are disabled in kernel parameter. That is expected.




diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c8945610d455..fd42fe7b2fdc 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -452,7 +452,16 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
          r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
      }
  -    if (rdt_cpu_has(X86_FEATURE_ABMC)) {
+    /*
+     * resctrl assumes a system that supports assignable counters can
+     * switch to "default" mode. Ensure that there is a "default" mode
+     * to switch to. This enforces a dependency between the independent
+     * X86_FEATURE_ABMC and X86_FEATURE_CQM_MBM_TOTAL/X86_FEATURE_CQM_MBM_LOCAL
+     * hardware features.
+     */
+    if (rdt_cpu_has(X86_FEATURE_ABMC) &&
+        (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) ||
+         rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))) {
          r->mon.mbm_cntr_assignable = true;
          cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
          r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 4076336fbba6..572a9925bd6c 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1782,15 +1782,13 @@ int resctrl_mon_resource_init(void)
          mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
        if (r->mon.mbm_cntr_assignable) {
-        if (!resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
- resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
-        if (!resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
- resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
-        mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask;
-        mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask &
-                                   (READS_TO_LOCAL_MEM |
-                                    READS_TO_LOCAL_S_MEM |
- NON_TEMP_WRITE_TO_LOCAL_MEM);
+        if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
+            mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask;
+        if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
+            mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask &
+                                       (READS_TO_LOCAL_MEM |
+                                        READS_TO_LOCAL_S_MEM |
+ NON_TEMP_WRITE_TO_LOCAL_MEM);
          r->mon.mbm_assign_on_mkdir = true;
          resctrl_file_fflags_init("num_mbm_cntrs",
                       RFTYPE_MON_INFO | RFTYPE_RES_CACHE);





I can send the official patch if you are ok to go ahead with the patch.

Let me know if I can add Signoff from you or you can respond after it is reviewed.





Thanks for the quick patch.

- Babu