Re: [PATCH v4 07/21] x86/resctrl: Create mba_sc configuration in the rdt_domain

From: Reinette Chatre
Date: Tue May 17 2022 - 12:19:05 EST


Hi James,

On 4/12/2022 5:44 AM, James Morse wrote:

...

> @@ -3263,6 +3295,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
> cancel_delayed_work(&d->cqm_limbo);
> }
>
> + mba_sc_domain_destroy(r, d);
> domain_destroy_mon_state(d);
> }

It is not clear to me how rdt_domain->mbps_val will be released via the above call.

After patch 3/21 and the hunk below, resctrl_online_domain() would look like:

resctrl_online_domain() {
	int err;

	lockdep_assert_held(&rdtgroup_mutex);

	if (is_mbm_enabled() && r->rid == RDT_RESOURCE_MBA) {
		err = mba_sc_domain_allocate(r, d);
		if (err)
			return err;
	}

	if (!r->mon_capable)
		return 0;

	...
}

If I understand the above correctly, if MBM is enabled then all domains
of resource RDT_RESOURCE_MBA will have rdt_domain->mbps_val allocated via
resctrl_online_domain().

RDT_RESOURCE_MBA is not mon_capable, so when its domains go offline the
freeing of rdt_domain->mbps_val will be skipped. After patch 5/21
resctrl_offline_domain() would look like below, and I do not see how the
hunk added above will ever end up cleaning up the allocated memory:

resctrl_offline_domain() {
	lockdep_assert_held(&rdtgroup_mutex);

	if (!r->mon_capable) /* RDT_RESOURCE_MBA is not mon_capable */
		return;

	...

	mba_sc_domain_destroy(r, d); /* Not reached for rdt_domains of RDT_RESOURCE_MBA */
	domain_destroy_mon_state(d);
}
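
To make the concern concrete, here is a minimal userspace model of the
ordering above. It is only a sketch, not kernel code: the structs are
simplified stand-ins, NUM_CLOSID is a hypothetical value, and
offline_domain_reordered() is just one possible way to release mbps_val
before the mon_capable check, not a proposal for the final shape.

```c
/* Userspace model only: names mirror the patch, types are simplified stand-ins. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum { RDT_RESOURCE_L3, RDT_RESOURCE_MBA };

struct rdt_resource {
	int rid;
	bool mon_capable;
};

struct rdt_domain {
	uint32_t *mbps_val;	/* per-closid MBps values, NULL when unused */
};

/* Hypothetical closid count, chosen only for this model. */
#define NUM_CLOSID 8

static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
{
	(void)r;
	assert(d != NULL);
	d->mbps_val = calloc(NUM_CLOSID, sizeof(*d->mbps_val));
	return d->mbps_val ? 0 : -1;
}

static void mba_sc_domain_destroy(struct rdt_resource *r, struct rdt_domain *d)
{
	(void)r;
	assert(d != NULL);
	free(d->mbps_val);
	d->mbps_val = NULL;
}

/* Ordering as described above: destroy sits behind the mon_capable check. */
static void offline_domain_as_posted(struct rdt_resource *r, struct rdt_domain *d)
{
	if (!r->mon_capable)
		return;		/* MBA domains leave here with mbps_val still set */

	mba_sc_domain_destroy(r, d);
}

/* One possible reordering: release mbps_val before the mon_capable check. */
static void offline_domain_reordered(struct rdt_resource *r, struct rdt_domain *d)
{
	if (r->rid == RDT_RESOURCE_MBA)
		mba_sc_domain_destroy(r, d);

	if (!r->mon_capable)
		return;

	/* monitor state teardown would follow here */
}
```

With an MBA resource that is not mon_capable, the first variant returns
with mbps_val still allocated, while the second one frees it.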

>
> @@ -3302,12 +3335,20 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>
> lockdep_assert_held(&rdtgroup_mutex);
>
> + if (is_mbm_enabled() && r->rid == RDT_RESOURCE_MBA) {

This introduces only half of the checks that are later replaced in
patch 10 "x86/resctrl: Abstract and use supports_mba_mbps()". Could the
full check be used here so that patch 10 is cleaner, or could patch 10
be moved to before this patch?

> + err = mba_sc_domain_allocate(r, d);
> + if (err)
> + return err;
> + }
> +
> if (!r->mon_capable)
> return 0;
>
> err = domain_setup_mon_state(r, d);
> - if (err)
> + if (err) {
> + mba_sc_domain_destroy(r, d);
> return err;
> + }

Cleaning up after the error is reasonable, but this allocation would only
ever happen if the resource is RDT_RESOURCE_MBA, which is not mon_capable.
Something would thus have gone really wrong if this cleanup were necessary.
Considering that only mon_capable resources are initialized at this point,
why not just exit right after calling mba_sc_domain_allocate()?
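
A rough userspace sketch of what that early exit could look like follows.
It is an illustration under assumptions, not kernel code: the structs are
simplified stand-ins, mbm_enabled replaces is_mbm_enabled(), NUM_CLOSID and
the mon_state_ready flag are hypothetical, and it relies on
RDT_RESOURCE_MBA never being mon_capable.

```c
/* Userspace model only: names mirror the patch, types are simplified stand-ins. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum { RDT_RESOURCE_L3, RDT_RESOURCE_MBA };

struct rdt_resource {
	int rid;
	bool mon_capable;
};

struct rdt_domain {
	uint32_t *mbps_val;	/* per-closid MBps values, NULL when unused */
	bool mon_state_ready;	/* hypothetical flag standing in for monitor state */
};

/* Hypothetical closid count, chosen only for this model. */
#define NUM_CLOSID 8

/* Stand-in for is_mbm_enabled(). */
static bool mbm_enabled = true;

static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
{
	(void)r;
	d->mbps_val = calloc(NUM_CLOSID, sizeof(*d->mbps_val));
	return d->mbps_val ? 0 : -1;
}

static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
{
	(void)r;
	d->mon_state_ready = true;
	return 0;
}

static int online_domain_sketch(struct rdt_resource *r, struct rdt_domain *d)
{
	assert(r != NULL && d != NULL);

	/* MBA needs nothing beyond mbps_val, so return the result directly. */
	if (mbm_enabled && r->rid == RDT_RESOURCE_MBA)
		return mba_sc_domain_allocate(r, d);

	if (!r->mon_capable)
		return 0;

	/* No mbps_val can exist here, so no extra cleanup on failure. */
	return domain_setup_mon_state(r, d);
}
```

Because the MBA path returns before domain_setup_mon_state(), the error
path after it no longer needs to call mba_sc_domain_destroy().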


>
> if (is_mbm_enabled()) {
> INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 5d283bdd6162..46ab9fb5562e 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -15,6 +15,9 @@ int proc_resctrl_show(struct seq_file *m,
>
> #endif
>
> +/* max value for struct rdt_domain's mbps_val */
> +#define MBA_MAX_MBPS U32_MAX
> +
> /**
> * enum resctrl_conf_type - The type of configuration.
> * @CDP_NONE: No prioritisation, both code and data are controlled or monitored.
> @@ -53,6 +56,9 @@ struct resctrl_staged_config {
> * @cqm_work_cpu: worker CPU for CQM h/w counters
> * @plr: pseudo-locked region (if any) associated with domain
> * @staged_config: parsed configuration to be applied
> + * @mbps_val: When mba_sc is enabled, this holds the array of user
> + * specified control values for mba_sc in MBps, indexed
> + * by closid
> */
> struct rdt_domain {
> struct list_head list;
> @@ -67,6 +73,7 @@ struct rdt_domain {
> int cqm_work_cpu;
> struct pseudo_lock_region *plr;
> struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
> + u32 *mbps_val;
> };
>
> /**

Reinette