Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

From: Shivappa Vikas
Date: Wed Jan 18 2017 - 14:42:25 EST

On Wed, 18 Jan 2017, Thomas Gleixner wrote:

On Tue, 17 Jan 2017, Shivappa Vikas wrote:
On Tue, 17 Jan 2017, Thomas Gleixner wrote:
On Fri, 6 Jan 2017, Vikas Shivappa wrote:
- Issue(1): Inaccurate data for per package data, systemwide. Just prints
zeros or arbitrary numbers.

Fix: The patches fix this by just throwing an error if the mode is not
supported. The modes supported are task monitoring and cgroup monitoring.
The per package data for, say, socket x is returned with the
-C <cpu on socket x> -G cgrpy option. The systemwide data can be looked up
by monitoring the root cgroup.

Fine. That just lacks any comment in the implementation. Otherwise I would
not have asked the question about cpu monitoring. Though I fundamentally
hate the idea of requiring cgroups for this to work.

If I just want to look at CPU X, why on earth do I have to set up all that
cgroup muck? Just because your main focus is cgroups?

The upstream per cpu data is broken because it's not overriding the other
task event RMIDs on that cpu with the cpu event RMID.

This can be fixed by adding a percpu struct to hold the RMID that's
affinitized to the cpu, but then we lose all the task llc_occupancy counted
into that RMID - still evaluating it.
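
Roughly the idea, as a sketch (the struct and function names are made up
for illustration, not the actual perf/cqm code):

#include <linux/types.h>
#include <linux/percpu.h>

/*
 * Hold the RMID of a cpu event in a percpu struct. On sched in, the cpu
 * event RMID, when present, overrides the task event RMID, so the per cpu
 * data is no longer polluted by task RMIDs.
 */
struct cqm_cpu_state {
	u32	cpu_rmid;	/* RMID affinitized to this cpu, 0 if none */
};

static DEFINE_PER_CPU(struct cqm_cpu_state, cqm_cpu_state);

static u32 cqm_sched_in_rmid(u32 task_rmid)
{
	u32 cpu_rmid = this_cpu_read(cqm_cpu_state.cpu_rmid);

	/*
	 * The cpu event RMID wins - which is exactly where the task
	 * llc_occupancy goes missing while the task runs on this cpu.
	 */
	if (cpu_rmid)
		return cpu_rmid;

	return task_rmid;
}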

The point here is that CQM is closely connected to the cache allocation
technology. After a lengthy discussion we ended up having

- per cpu CLOSID
- per task CLOSID

where all tasks which do not have a CLOSID assigned use the CLOSID which is
assigned to the CPU they are running on.
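
In pseudo C the resolution is simply (a sketch of the model, not the
actual kernel code; it assumes the per task closid which CAT added to
task_struct):

#include <linux/types.h>
#include <linux/sched.h>
#include <linux/percpu.h>

/* Default CLOSID of a cpu, set when the cpu is put into a partition */
static DEFINE_PER_CPU(u32, cpu_closid);

static u32 resolve_closid(struct task_struct *tsk)
{
	/* A task with its own CLOSID keeps it wherever it runs ... */
	if (tsk->closid)
		return tsk->closid;

	/* ... all other tasks use the CLOSID of the cpu they run on */
	return this_cpu_read(cpu_closid);
}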

So if I configure a system by simply partitioning the cache per cpu, which
is the proper way to do it for HPC and RT usecases where workloads are
partitioned on CPUs as well, then I really want to have an equally simple
way to monitor the occupancy for that reservation.

And looking at that from the CAT point of view, which is the proper way to
do it, makes it obvious that CQM should be modeled to match CAT.

Ok, makes sense. Tony and Fenghua had suggested some ideas to model the two
more closely together. Let me do some more brainstorming and try to come up
with a draft that can be discussed.


So lets assume the following:

CPU 0-3   default CLOSID 0
CPU 4     CLOSID 1
CPU 5     CLOSID 2
CPU 6     CLOSID 3
CPU 7     CLOSID 3

T1        CLOSID 4
T2        CLOSID 5
T3        CLOSID 6
T4        CLOSID 6

All other tasks use the per cpu defaults, i.e. the CLOSID of the CPU
they run on.

then the obvious basic monitoring requirement is to have an RMID for each
CLOSID.

So when I monitor CPU4, i.e. CLOSID 1, and T1 runs on CPU4, then I do not
care at all about the occupancy of T1, simply because that is running on a
separate reservation.
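
With an RMID per CLOSID the example works out naturally: monitoring CPU4
reads the RMID behind CLOSID 1, while T1's occupancy is charged to the
RMID behind CLOSID 4 even when T1 runs on CPU4. As a sketch (made up
names again, reusing resolve_closid() from above):

#define NR_CLOSIDS	8	/* hypothetical, matches the example */

/* One RMID per CLOSID, i.e. one RMID per reservation */
static u32 closid_to_rmid[NR_CLOSIDS];

static u32 monitor_rmid(struct task_struct *tsk)
{
	return closid_to_rmid[resolve_closid(tsk)];
}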

Ok, then we can give the cpu monitoring priority just like CAT does.

Trying to make that an aggregated value in the first
place is completely wrong. If you want an aggregate, which is pretty much
useless, then user space tools can generate it easily.

The whole approach you and David have taken is to whack some desired cgroup
functionality and whatever into CQM without rethinking the overall
design. And that's fundamentally broken because it does not take cache (and
memory bandwidth) allocation into account.

I seriously doubt that the existing CQM/MBM code can be refactored in any
useful way. As Peter Zijlstra said before: Remove the existing cruft
completely and start with a completely new design from scratch.

I missed that PeterZ had indicated a new design from scratch. I was only
bothered with the implementation given that CAT was still going on. Since
CAT is upstream now we may be able to do better.

Thanks,
Vikas


And this new design should start from the allocation angle and then add the
whole other muck on top so far as it's possible. Allocation related
monitoring must be the primary focus, everything else is just tinkering.

Thanks,

tglx