Re: [PATCH v3 17/19] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu

From: Ilpo Järvinen
Date: Tue Mar 21 2023 - 11:25:52 EST


On Tue, 21 Mar 2023, Ilpo Järvinen wrote:

> On Mon, 20 Mar 2023, James Morse wrote:
>
> > When a CPU is taken offline, resctrl may need to move the overflow or
> > limbo handlers to run on a different CPU.
> >
> > Once the offline callbacks have been split, cqm_setup_limbo_handler()
> > will be called while the CPU that is going offline is still present
> > in the cpu_mask.
> >
> > Pass the CPU to exclude to cqm_setup_limbo_handler() and
> > mbm_setup_overflow_handler(). These functions can use a variant of
> > cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs
> > need excluding.
> >
> > Tested-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxx>
> > Signed-off-by: James Morse <james.morse@xxxxxxx>
> > ---
> > Changes since v2:
> > * Rephrased a comment to avoid a two letter bad-word. (we)
> > * Avoid assigning mbm_work_cpu if the domain is going to be free()d
> > * Added cpumask_any_housekeeping_but(), I dislike the name
> > ---
> > arch/x86/kernel/cpu/resctrl/core.c | 8 +++--
> > arch/x86/kernel/cpu/resctrl/internal.h | 37 ++++++++++++++++++++--
> > arch/x86/kernel/cpu/resctrl/monitor.c | 43 +++++++++++++++++++++-----
> > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++--
> > include/linux/resctrl.h | 3 ++
> > 5 files changed, 83 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> > index 8e25ea49372e..aafe4b74587c 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -582,12 +582,16 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> > if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
> > if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
> > cancel_delayed_work(&d->mbm_over);
> > - mbm_setup_overflow_handler(d, 0);
> > + /*
> > + * exclude_cpu=-1 as this CPU has already been removed
> > + * by cpumask_clear_cpu()
> > + */
> > + mbm_setup_overflow_handler(d, 0, RESCTRL_PICK_ANY_CPU);
> > }
> > if (is_llc_occupancy_enabled() && cpu == d->cqm_work_cpu &&
> > has_busy_rmid(r, d)) {
> > cancel_delayed_work(&d->cqm_limbo);
> > - cqm_setup_limbo_handler(d, 0);
> > + cqm_setup_limbo_handler(d, 0, RESCTRL_PICK_ANY_CPU);
> > }
> > }
> > }
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> > index 3eb5b307b809..47838ba6876e 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -78,6 +78,37 @@ static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
> > return cpu;
> > }
> >
> > +/**
> > + * cpumask_any_housekeeping_but() - Choose any CPU in @mask, preferring those
> > + * that aren't marked nohz_full, excluding
> > + * the provided CPU
> > + * @mask: The mask to pick a CPU from.
> > + * @exclude_cpu: The CPU to avoid picking.
> > + *
> > + * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping CPUs that
> > + * don't use nohz_full, these are preferred.
> > + * Returns >= nr_cpu_ids if no CPUs are available.
> > + */
> > +static inline unsigned int
> > +cpumask_any_housekeeping_but(const struct cpumask *mask, int exclude_cpu)
> > +{
> > + int cpu, hk_cpu;
> > +
> > + cpu = cpumask_any_but(mask, exclude_cpu);
> > + if (tick_nohz_full_cpu(cpu)) {
> > + hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
> > + if (hk_cpu == exclude_cpu) {
> > + hk_cpu = cpumask_nth_andnot(1, mask,
> > + tick_nohz_full_mask);
>
> I'm left to wonder if it's okay to alter tick_nohz_full_mask in resctrl
> code??

I suppose it should do this instead:

	hk_cpu = cpumask_nth_and(0, mask, tick_nohz_full_mask);
	if (hk_cpu == exclude_cpu)
		hk_cpu = cpumask_next_and(hk_cpu, mask, tick_nohz_full_mask);
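For reference, here is a userspace model of the selection logic in the quoted cpumask_any_housekeeping_but(). It is a sketch under stated assumptions, not the kernel code: masks are 64-bit words, and NR_CPUS_MODEL, PICK_ANY_CPU, cpumask_model and every helper name are hypothetical stand-ins. The fallback after the second lookup is also an assumption, because the quoted hunk ends there. Like the kernel's cpumask_nth_andnot(), which takes both masks as const pointers, the model's lookups only read the masks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model only: bit n of a 64-bit word stands for CPU n, and
 * NR_CPUS_MODEL stands in for nr_cpu_ids. Any result >= NR_CPUS_MODEL
 * means "no CPU available", as with the kernel cpumask helpers.
 */
#define NR_CPUS_MODEL 64u
/* Hypothetical stand-in for RESCTRL_PICK_ANY_CPU: -1 excludes nothing. */
#define PICK_ANY_CPU (-1)

typedef uint64_t cpumask_model;

static int cpu_in(cpumask_model mask, unsigned int cpu)
{
	return cpu < NR_CPUS_MODEL && (mask & (UINT64_C(1) << cpu)) != 0;
}

/*
 * Model of cpumask_nth_andnot(): the n-th (0-based) CPU set in @a but
 * not in @b. Masks are passed by value, so the query cannot modify them.
 */
static unsigned int nth_andnot(unsigned int n, cpumask_model a, cpumask_model b)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (!cpu_in(a & ~b, cpu))
			continue;
		if (n == 0)
			return cpu;
		n--;
	}
	return NR_CPUS_MODEL;
}

/* Model of cpumask_any_but(), plus the -1 "exclude nothing" convention. */
static unsigned int any_but(cpumask_model mask, int exclude_cpu)
{
	if (exclude_cpu >= 0 && exclude_cpu < (int)NR_CPUS_MODEL)
		mask &= ~(UINT64_C(1) << exclude_cpu);
	return nth_andnot(0, mask, 0);
}

/*
 * Mirrors the quoted cpumask_any_housekeeping_but(): prefer a CPU that is
 * not nohz_full and never return @exclude_cpu. Keeping the first pick when
 * no housekeeping CPU remains is an assumption (the quoted hunk stops
 * before that step).
 */
static unsigned int any_housekeeping_but(cpumask_model mask,
					 cpumask_model nohz_full,
					 int exclude_cpu)
{
	unsigned int cpu = any_but(mask, exclude_cpu);
	unsigned int hk_cpu;

	if (cpu_in(nohz_full, cpu)) {
		hk_cpu = nth_andnot(0, mask, nohz_full);
		if ((int)hk_cpu == exclude_cpu)
			hk_cpu = nth_andnot(1, mask, nohz_full);
		if (hk_cpu < NR_CPUS_MODEL)
			cpu = hk_cpu;
	}
	return cpu;
}

/* CPUs 0-3 online, CPUs 0-1 are nohz_full. */
static void model_selftest(void)
{
	const cpumask_model online = 0xF, nohz = 0x3;

	assert(any_housekeeping_but(online, nohz, PICK_ANY_CPU) == 2);
	assert(any_housekeeping_but(online, nohz, 2) == 3);
	/* Only CPU 2 is housekeeping and it is excluded: keep CPU 0. */
	assert(any_housekeeping_but(0x7, nohz, 2) == 0);
	/* Nothing is left once the only CPU is excluded. */
	assert(any_housekeeping_but(0x4, nohz, 2) >= NR_CPUS_MODEL);
}
```

With CPUs 0-3 online and CPUs 0-1 nohz_full, excluding CPU 2 moves the pick to CPU 3, the second housekeeping CPU; if no housekeeping CPU survives the exclusion, the first non-excluded CPU is kept.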

--
i.