Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

From: David Carrillo-Cisneros
Date: Fri Feb 03 2017 - 16:08:18 EST


On Fri, Feb 3, 2017 at 9:52 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> On Thu, Feb 02, 2017 at 06:14:05PM -0800, David Carrillo-Cisneros wrote:
>> If we tie allocation groups and monitoring groups, we are tying the
>> meaning of CPUs and we'll have to choose between the CAT meaning or
>> the perf meaning.
>>
>> Let's allow semantics that will allow perf like monitoring to
>> eventually work, even if it's not immediately supported.
>
> Would it work to make monitor groups be "task list only" or "cpu mask only"
> (unlike control groups that allow mixing).

That works, but please don't use chmod. Make it explicit by the group
position (i.e. mon/cpus/grpCPU1, mon/tasks/grpTasks1).

>
> Then the intel_rdt_sched_in() code could pick the RMID in ways that
> give you the perf(1) meaning. I.e. if you create a monitor group and assign
> some CPUs to it, then we will always load the RMID for that monitor group
> when running on those cpus, regardless of what group(s) the current process
> belongs to. But if you didn't create any cpu-only monitor groups, then we'd
> assign RMID using same rules as CLOSID (so measurements from a control group
> would track allocation policies).

I think that's very confusing for the user. A group's observed
behavior should be determined by its attributes and not change
depending on how other groups are configured. Think of multiple users
monitoring simultaneously.

>
> We are already planning that creating monitor only groups will change
> what is reported in the control group (e.g. you pull some tasks out of
> the control group to monitor them separately, so the control group only
> reports the tasks that you didn't move out for monitoring).

That's also confusing, and the work-around that Vikas proposed of two
separate files to enumerate tasks (one for control and one for
monitoring) breaks the concept of a task group.

From our discussions, we can support the use cases we care about
without weird corner cases by having:
- A set of allocation groups as they stand now. Either use the current
resctrl, or rename it to something like resdir/ctrl (before v4.10
sails).
- A set of monitoring task groups. Either in a "tasks" folder in a
resmon fs or in resdir/mon/tasks.
- A set of monitoring CPU groups. Either in a "cpus" folder in a
resmon fs or in resdir/mon/cpus.
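
To make "explicit by the group position" concrete, here is a rough
sketch of how a tool could derive a group's kind purely from where it
sits in the tree. It assumes the resdir/... naming from the list above,
which does not exist today; group_kind() and the kind strings are names
I made up for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical layout taken from the list above; nothing here exists yet.
KIND_BY_PREFIX = {
    ("resdir", "ctrl"): "allocation",
    ("resdir", "mon", "tasks"): "monitor-tasks",
    ("resdir", "mon", "cpus"): "monitor-cpus",
}


def group_kind(path):
    """Return the kind of group implied by the position of `path`."""
    parts = PurePosixPath(path).parts
    for prefix, kind in KIND_BY_PREFIX.items():
        # A group is exactly one directory below its prefix.
        if parts[:len(prefix)] == prefix and len(parts) == len(prefix) + 1:
            return kind
    raise ValueError(f"{path!r} is not a group in the sketched layout")
```

With this, resdir/mon/tasks/g1 is a task monitoring group and
resdir/mon/cpus/g1 is a CPU monitoring group, with no mode bits or
extra files needed to tell them apart.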

So when a user measures a group (shown using the -G option; it could
as well be the -R Vikas wants):

1) perf stat -e llc_occupancy -G resdir/ctrl/g1
measures the CAT allocation group as if RMIDs were managed like CLOSIDs.

2) perf stat -e llc_occupancy -G resdir/mon/tasks/g1
measures the combined occupancy of all tasks in g1 (like a cgroup in
present perf).

3) perf stat -e llc_occupancy -C <some id of resdir/mon/cpus/g1>
*XOR* perf stat -e llc_occupancy -G resdir/mon/cpus/g1
measures the combined occupancy of all tasks while they run on any CPU in
g1 (perf-like filtering, not the CAT way).
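
The three cases above can be written down as a small model. This is
only a sketch of the intended semantics, not kernel code: Group,
is_counted() and the in_other_ctrl_group flag are hypothetical names,
and the "ctrl" rule is my reading of how CLOSIDs are picked (a task's
own group wins; a task left in the default group follows the CPU it
runs on).

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    kind: str                                 # "ctrl", "mon_tasks" or "mon_cpus"
    tasks: set = field(default_factory=set)
    cpus: set = field(default_factory=set)


def is_counted(group, task, cpu, in_other_ctrl_group=False):
    """Is occupancy caused by `task` while running on `cpu` charged to `group`?"""
    if group.kind == "mon_tasks":
        # Case 2: only task membership matters, whatever CPU the task runs on.
        return task in group.tasks
    if group.kind == "mon_cpus":
        # Case 3: perf-like CPU filtering, whatever task is running.
        return cpu in group.cpus
    if group.kind == "ctrl":
        # Case 1: CLOSID-like rule (assumption, see the text above).
        if task in group.tasks:
            return True
        return cpu in group.cpus and not in_other_ctrl_group
    raise ValueError(f"unknown group kind {group.kind!r}")
```

Note that the answer for a monitoring group depends only on that
group's own tasks or cpus, never on which other monitoring groups
happen to exist.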

I know the present implementation scope is limited, so you could:
- support 1) and/or 2) only
- do simple RMID management (e.g. same RMID on all packages, allocate
RMID on creation or fail)
- do the custom fs based tool that Vikas mentioned instead of using
perf_event_open (if it's somehow easier to build and maintain a new
tool rather than reuse perf(1) ).

Any or all of the above are fine. But please don't choose group
semantics that will prevent us from eventually supporting full
perf-like behavior or that we already know will explode in users' faces.
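
For reference, the "simple RMID management" option mentioned above
could be as small as the following sketch. It assumes one global pool
where a group's RMID is valid on every package; RmidPool and its
methods are hypothetical names, and a real implementation would live
in the kernel, not in Python.

```python
import errno


class RmidPool:
    """One global pool: a group gets a single RMID used on all packages."""

    def __init__(self, num_rmids):
        self._free = list(range(num_rmids))
        self._by_group = {}

    def create_group(self, name):
        """Allocate an RMID when the group is created, or fail right away."""
        if name in self._by_group:
            raise FileExistsError(name)
        if not self._free:
            raise OSError(errno.ENOSPC, "no free RMID")
        rmid = self._free.pop(0)
        self._by_group[name] = rmid
        return rmid

    def remove_group(self, name):
        """Return the group's RMID to the pool."""
        self._free.append(self._by_group.pop(name))
```

Failing at creation time keeps things predictable: a group that exists
always has an RMID, so there is no state where a group silently stops
being monitored.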

Thanks,
David