Re: [PATCH v2 09/18] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep

From: Peter Newman
Date: Wed Mar 22 2023 - 09:22:01 EST


Hi James,

On Mon, Mar 20, 2023 at 6:12 PM James Morse <james.morse@xxxxxxx> wrote:
> On 10/03/2023 09:28, Peter Newman wrote:
> > In the interest of enabling MPAM functionality, I think the low-effort
> > approach is to only allocate an MBWU monitor to a newly-created MON or
> > CTRL_MON group if one is available. On Intel and AMD, the resources are
> > simply always available.
>
> I agree its low-effort, but I think the result is not worth having.
>
> What does user-space get when it reads 'mbm_total_bytes'? Returning an error here sucks.
> How is user-space supposed to identify the groups it wants to monitor, and those it
> doesn't care about?
>
> Taking "the only way to win is not to play" means the MPAM driver will only offer those
> 'mbm_total_bytes' files if they are going to work in the same way they do today. (as you
> said, on Intel and AMD the resources are simply always available).

I told you that only Intel so far has resources for all RMIDs. AMD
implementations allocate MBW monitors on demand, even reallocating ones
that are actively in use.

> I agree those files have always been able to return errors - but I've never managed to
> make the Intel system I have do it... so I bet user-space doesn't expect errors here.
> (let alone persistent errors)

Find some AMD hardware. It's very easy to get persistent errors due to
no counters being allocated for an RMID:

(this is an 'AMD Ryzen Threadripper PRO 3995WX 64-Cores')

# cd /sys/fs/resctrl/mon_groups
# mkdir test
# cat test/mon_data/*/mbm_total_bytes
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
# cat test/mon_data/*/mbm_total_bytes
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable


> This patch to allow resctrl_arch_rmid_read() to sleep is about MPAM's CSU NRDY and the
> high likelyhood that folk build systems where MSCs are sliced up and private to something
> smaller than the resctrl:domain. Without the perf support, this would still be necessary.

I was worried about the blocking more when I thought you were doing it
for MBWU monitoring. Serializing access to limited CSU monitors makes
more sense.

> The changes needed for perf support are to make resctrl_arch_rmid_read() re-entrant, and
> for the domain list to be protected by RCU. Neither of these are as onerous as changes to
> the user-space interface, and the associated risk of breaking programs that work on other
> platforms.

I went ahead and tried to rebase my reliable-MBM-on-AMD changes onto
your series and they seemed to work with less difficulty than I was
expecting, so I'll try to stop worrying about the churn of this series
now.

-Peter