Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

From: Shivappa Vikas
Date: Fri Jan 20 2017 - 15:43:28 EST




On Thu, 19 Jan 2017, David Carrillo-Cisneros wrote:

On Thu, Jan 19, 2017 at 6:32 PM, Vikas Shivappa
<vikas.shivappa@xxxxxxxxxxxxxxx> wrote:
Resending to include Thomas, also with some changes. Sorry for the spam.

Based on Thomas's and PeterZ's feedback, I can think of two design
variants, which target:

- Support monitoring and allocation using the same resctrl group: the
user can use a resctrl group to allocate resources and also monitor
them (with respect to tasks or CPUs).

- Also allow monitoring outside of resctrl, so that the user can
monitor subgroups that use the same CLOSID. This mode can be used
when the user wants to monitor more than just the resctrl groups.

The first design version uses and modifies perf_cgroup; the second
version builds a new interface, resmon.

The second version would require building a whole new set of tools,
deploying them and maintaining them. Users would have to run perf for
certain events and resmon (or whatever the new tool is named) for RDT.
I see it as too complex and much prefer to keep using perf.

The idea was to have the flexibility to align the tools with the requirements of the feature rather than twisting perf's behaviour, and to keep that flexibility for the future when new RDT features are added (similar to what we did by introducing resctrl groups instead of using cgroups for CAT).

Sometimes that's a lot simpler, as we don't need much code given the limited, specific syscalls we need to support, just like the resctrl fs, which is specific to RDT.

It looks like your requirement is to be able to monitor a group of tasks independently of the resctrl groups?

The task option should provide the flexibility to monitor a set of tasks independently of whether they are part of a resctrl group or not. The assignment of RMIDs is controlled underneath by the kernel, so we can optimize RMID usage, and the RMIDs are tied to this group of tasks whether or not it is a subset of a resctrl group.
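As a rough illustration of the kernel owning RMID assignment, here is a minimal userspace sketch of a bitmap allocator that hands out RMIDs to monitored task sets and recycles them on release. The names (`rmid_alloc`, `rmid_free`, `NUM_RMIDS`) and the RMID count are hypothetical; real hardware reports its RMID count through CPUID, and a real allocator would also have to cope with cache occupancy that lingers on an RMID after it is freed, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RMID count; real hardware reports its maximum via CPUID. */
#define NUM_RMIDS 64

static uint64_t rmid_in_use; /* one bit per RMID */

/* Hand out the lowest free RMID. RMID 0 is treated as the
 * default/reserved RMID and is never returned. */
int rmid_alloc(void)
{
    for (int rmid = 1; rmid < NUM_RMIDS; rmid++) {
        uint64_t bit = UINT64_C(1) << rmid;
        if (!(rmid_in_use & bit)) {
            rmid_in_use |= bit;
            return rmid;
        }
    }
    return -1; /* out of RMIDs */
}

/* Return an RMID to the pool so another task set can reuse it. */
void rmid_free(int rmid)
{
    assert(rmid > 0 && rmid < NUM_RMIDS);
    rmid_in_use &= ~(UINT64_C(1) << rmid);
}
```

Because allocation always picks the lowest free bit, an RMID released by one monitored task set is the first one reused by the next.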


The first version is close to the patches sent, with some
additions/changes. This includes details of the design as per
Thomas's/PeterZ's feedback.

1> First design option: using perf without modifying resctrl
--------------------------------------------------------------------

In this design everything in the resctrl interface works as before
(the info directory and the resource group files such as tasks and
schemata all remain the same).


Monitor cqm using perf
----------------------

perf can monitor individual tasks using the -t
option just like before.

# perf stat -e llc_occupancy -t PID1,PID2

The user can monitor per-CPU occupancy using perf's -C option:

# perf stat -e llc_occupancy -C 5

The following shows how the user can monitor cgroup occupancy:

# mount -t cgroup -o perf_event perf_event /sys/fs/cgroup/perf_event/
# mkdir /sys/fs/cgroup/perf_event/g1
# mkdir /sys/fs/cgroup/perf_event/g2
# echo PID1 > /sys/fs/cgroup/perf_event/g2/tasks

# perf stat -e intel_cqm/llc_occupancy/ -a -G g2
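For reference, the three perf invocations above map onto different argument combinations of the perf_event_open(2) syscall. The sketch below only computes those (pid, cpu, flags) triples and does not open any event; the helper names are hypothetical. The values follow the perf_event_open man page: task mode uses pid=PID and cpu=-1, CPU mode uses pid=-1 and cpu=N, and cgroup mode passes an fd opened on the cgroup directory as pid together with PERF_FLAG_PID_CGROUP, which requires cpu >= 0 (so `-a -G` ends up with one event per CPU).

```c
#include <assert.h>
#include <sys/types.h>

/* Same value as PERF_FLAG_PID_CGROUP in <linux/perf_event.h>. */
#ifndef PERF_FLAG_PID_CGROUP
#define PERF_FLAG_PID_CGROUP (1UL << 2)
#endif

struct open_args {
    pid_t pid;
    int cpu;
    unsigned long flags;
};

/* perf stat -t PID: follow one task on whatever CPU it runs. */
struct open_args task_mode(pid_t pid)
{
    struct open_args a = { .pid = pid, .cpu = -1, .flags = 0 };
    return a;
}

/* perf stat -C N: count everything that runs on CPU N. */
struct open_args cpu_mode(int cpu)
{
    struct open_args a = { .pid = -1, .cpu = cpu, .flags = 0 };
    return a;
}

/* perf stat -a -G g2: pid carries an fd opened on the cgroup directory;
 * cgroup events must be bound to a CPU, so one is opened per CPU. */
struct open_args cgroup_mode(int cgroup_fd, int cpu)
{
    assert(cpu >= 0);
    struct open_args a = { .pid = cgroup_fd, .cpu = cpu,
                           .flags = PERF_FLAG_PID_CGROUP };
    return a;
}
```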

To monitor a resctrl group, the user can put the same tasks that are
in the resctrl group into a cgroup.

To monitor the tasks in p1 in example 2 below, add the tasks in resctrl
group p1 to cgroup g1:

# echo 5678 > /sys/fs/cgroup/perf_event/g1/tasks

Introducing a new option for resctrl may complicate monitoring, because
supporting both cgroup 'task groups' and resctrl 'task groups' leads to
situations where, if the groups intersect, there is no way to know which
l3_allocations contribute to which group.

For example:
p1 has tasks t1, t2, t3
g1 has tasks t2, t3, t4

The only way to get occupancy for both g1 and p1 would be to allocate an
RMID for each task, which can just as well be done with the -t option.
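To make the overlap concrete, the sketch below assumes one RMID per task (as with -t) and made-up per-task occupancy values. With per-task counts, both p1 and g1 can be summed even though t2 and t3 belong to both; with a single RMID per group, a task can only carry one RMID at a time, so t2's and t3's lines could be attributed to only one of the two groups. The table, values and helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task llc_occupancy readings in bytes, one RMID per
 * task. Index 0..3 corresponds to tasks t1..t4. */
static const uint64_t task_occupancy[4] = { 100, 200, 300, 400 };

/* Sum the occupancy of the tasks in a group (indices into the table). */
uint64_t group_occupancy(const int *tasks, size_t ntasks)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < ntasks; i++)
        sum += task_occupancy[tasks[i]];
    return sum;
}
```

For p1 = {t1, t2, t3} this yields 600, and for g1 = {t2, t3, t4} it yields 900, with t2 and t3 counted in both.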

That's simply recreating the resctrl group as a cgroup.

I think the main advantage of doing allocation first is that we
could use the context switch in RDT allocation and greatly simplify
the PMU side of it.
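One reason a single context switch can serve both: on Intel RDT the CLOSID (allocation) and the RMID (monitoring) are programmed through the same per-CPU MSR, IA32_PQR_ASSOC, with the RMID in the low 32 bits and the CLOSID in bits 63:32. The sketch below only builds the 64-bit value a context-switch hook would write; the helper name is hypothetical and the actual MSR write is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Build the IA32_PQR_ASSOC value for the incoming task:
 * RMID in the low 32 bits, CLOSID in bits 63:32. */
uint64_t pqr_assoc_value(uint32_t closid, uint32_t rmid)
{
    return ((uint64_t)closid << 32) | rmid;
}
```

Since one write updates both fields, monitoring can piggyback on the allocation path instead of needing its own perf context-switch hook.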

If resctrl groups could lift the restriction of one resctrl group per
CLOSID, then the user could create many resctrl groups in the way perf
cgroups are created now. The advantage is that there wouldn't be a
cgroup hierarchy, making things much simpler. There would also be no
need to optimize the perf event context switch to make llc_occupancy
work.
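A minimal model of what lifting the one-group-per-CLOSID restriction would mean: several groups may carry the same CLOSID (sharing one cache allocation) while each has its own RMID (so each is monitored separately). The struct and helpers are hypothetical and not taken from the resctrl code.

```c
#include <assert.h>
#include <stddef.h>

struct rdt_group {
    const char *name;
    unsigned int closid; /* selects the allocation (schemata) */
    unsigned int rmid;   /* selects the monitoring counter */
};

/* Count how many groups share the allocation identified by closid. */
size_t groups_sharing_closid(const struct rdt_group *g, size_t n,
                             unsigned int closid)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (g[i].closid == closid)
            count++;
    return count;
}

/* Return 1 if every group has a distinct RMID, i.e. each group can be
 * monitored on its own; 0 otherwise. */
int rmids_unique(const struct rdt_group *g, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (g[i].rmid == g[j].rmid)
                return 0;
    return 1;
}
```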

Then we only need a way to tell the perf_event_open syscall that
monitoring must happen in a resctrl group.

My first thought is to have an "rdt_monitor" file per resctrl group. A
user passes it to perf_event_open in the way cgroups are passed now.
We could extend the meaning of the flag PERF_FLAG_PID_CGROUP to also
cover rdt_monitor files. The syscall can figure out whether it's a
cgroup or an rdt_group. The rdt_monitoring PMU would only work with
rdt_monitor groups.

Then the rdt_monitoring PMU would be pretty dumb, having neither task
nor CPU contexts, and just providing the pmu->read and pmu->event_init
functions.
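To illustrate how little such a PMU would need, here is a userspace mock with only event_init and read callbacks, loosely mirroring pmu->event_init and pmu->read. Everything in it (the target kinds, the event struct, the per-RMID occupancy table) is hypothetical; the real callbacks take a struct perf_event and read occupancy from hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_FAKE_RMIDS 4

/* Hypothetical kinds of target an event could be opened on. */
enum target_kind { TARGET_TASK, TARGET_CPU, TARGET_CGROUP, TARGET_RDT_MONITOR };

struct mock_event {
    enum target_kind target;
    unsigned int rmid;   /* RMID of the rdt_monitor group */
    uint64_t count;
};

struct mock_pmu {
    int  (*event_init)(struct mock_event *ev);
    void (*read)(struct mock_event *ev);
};

/* Hypothetical per-RMID occupancy, standing in for a hardware read. */
static const uint64_t fake_occupancy[NUM_FAKE_RMIDS] = { 0, 4096, 8192, 0 };

/* Accept only events opened on an rdt_monitor file; no task/CPU context. */
static int rdt_event_init(struct mock_event *ev)
{
    if (ev->target != TARGET_RDT_MONITOR || ev->rmid >= NUM_FAKE_RMIDS)
        return -EINVAL;
    return 0;
}

/* Reading just samples the occupancy of the group's RMID. */
static void rdt_read(struct mock_event *ev)
{
    ev->count = fake_occupancy[ev->rmid];
}

static const struct mock_pmu rdt_pmu = {
    .event_init = rdt_event_init,
    .read = rdt_read,
};
```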

Task monitoring can be done with resctrl as well, by adding the PID to
a new resctrl group and opening the event on it. And, since we'd allow
CLOSIDs to be shared between resctrl groups, allocation wouldn't break.
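Why allocation would stay intact can be shown with a small model: moving a task into a new group that shares its current group's CLOSID changes only the task's RMID, so the task keeps running with the same cache allocation. The types and the helper are hypothetical.

```c
#include <assert.h>

struct group { unsigned int closid, rmid; };
struct task  { int pid; unsigned int closid, rmid; };

/* Hypothetical equivalent of writing a PID into a group's tasks file:
 * the task takes on both identifiers of the destination group. */
void move_task(struct task *t, const struct group *dst)
{
    t->closid = dst->closid;
    t->rmid = dst->rmid;
}
```

Moving task 5678 from p1 (CLOSID 1, RMID 1) to a monitoring-only group with CLOSID 1 and RMID 7 leaves its CLOSID at 1 and only switches the RMID.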

It looks like we are trying to create a MONGRP and a CTRLGRP, as Thomas mentions.

Although resctrl groups currently have no hierarchy, a task can be part of only one group. Isn't breaking this just equivalent to having a separate resmon group, which may group tasks independently of how they are grouped in the resctrl groups?

Couldn't that also be achieved with the option to monitor at task granularity? That is, if we support the task option and the option to monitor resctrl groups, we obtain the same functionality.