Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

From: Stephane Eranian
Date: Wed Feb 08 2017 - 16:37:58 EST


Tony,

On Tue, Feb 7, 2017 at 10:52 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> On Tue, Feb 07, 2017 at 12:08:09AM -0800, Stephane Eranian wrote:
>> Hi,
>>
>> I wanted to take a few steps back and look at the overall goals for
>> cache monitoring.
>> From the various threads and discussion, my understanding is as follows.
>>
>> I think the design must ensure that the following usage models can be monitored:
>> - the allocations in your CAT partitions
>> - the allocations from a task (inclusive of children tasks)
>> - the allocations from a group of tasks (inclusive of children tasks)
>> - the allocations from a CPU
>> - the allocations from a group of CPUs
>>
>> All cases but the first one (CAT) are natural usage, so I want to describe
>> the CAT case in more detail.
>> The goal, as I understand it, is to monitor what is going on inside
>> the CAT partition to detect whether it saturates or has room to
>> "breathe". Let's take a simple example.
>
> By "natural usage" you mean "like perf(1) provides for other events"?
>
Yes, people are used to monitoring events per task or per CPU. In that
sense, it is the common usage model. Cgroup monitoring is a derivative
of per-cpu mode.

> But we are trying to figure out requirements here ... what data do people
> need to manage caches and memory bandwidth. So from this perspective
> monitoring a CAT group is a natural first choice ... did we provision
> this group with too much, or too little cache.
>
I am not saying CAT is not natural. I am saying it is a justified requirement,
but a new one, so we need to make sure it is understood and that the kernel
tracks the CAT partition and the CAT partition's cache occupancy monitoring
similarly.

> From that starting point I can see that a possible next step when
> finding that a CAT group has too small a cache is to drill down to
> find out how the tasks in the group are using cache. Armed with that
> information you could move tasks that hog too much cache (and are believed
> to be streaming through memory) into a different CAT group.
>
This is a valid usage model. But there are people who care about monitoring
occupancy and do not necessarily use CAT partitions. Even in that case, the
occupancy data is still very useful to gauge the cache footprint of a workload.
Therefore this usage model should not be discounted.

> What I'm not seeing is how drilling to CPUs helps you.
>
Looking for imbalance, for instance.
Are all the allocations done from only a subset of the CPUs?
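
To make the imbalance question concrete, here is a minimal user-space sketch.
It assumes per-CPU occupancy numbers are already available as a plain dict;
the function names, the 0.9 threshold and the sample values are made up for
illustration and do not correspond to any existing interface.

```python
def occupancy_shares(occupancy_bytes):
    """Return each CPU's fraction of the total cache occupancy.

    occupancy_bytes: dict mapping CPU id -> occupancy (hypothetical input).
    """
    total = sum(occupancy_bytes.values())
    if total == 0:
        return {cpu: 0.0 for cpu in occupancy_bytes}
    return {cpu: n / total for cpu, n in occupancy_bytes.items()}


def dominant_cpus(occupancy_bytes, threshold=0.9):
    """Smallest set of CPUs that together account for `threshold` of occupancy.

    The threshold is an arbitrary illustrative choice.
    """
    shares = occupancy_shares(occupancy_bytes)
    picked, covered = [], 0.0
    # Walk CPUs from the largest share down until the threshold is covered.
    for cpu, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= threshold or share == 0:
            break
        picked.append(cpu)
        covered += share
    return sorted(picked)


# Illustrative sample: 4 CPUs, only two of them allocate into the cache.
sample = {0: 6 << 20, 1: 2 << 20, 2: 0, 3: 0}
print(dominant_cpus(sample))
```

If the returned set is much smaller than the set of CPUs in the group, the
allocations are coming from only a subset of the CPUs.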

> Say you have CPUs=CPU0,CPU1 in the CAT group and you collect data that
> shows that 75% of the cache occupancy is attributed to CPU0, and only
> 25% to CPU1. What can you do with this information to improve things?
> If it is deemed too complex (from a kernel code perspective) to
> implement per-CPU reporting how bad a loss would that be?
>
It is okay to first focus on per-task and per-CAT partition. What I'd like
to see is an API that could be extended later on to do per-CPU only mode.
I am okay with having only per-CAT and per-task groups initially to keep
things simpler. But the rsrcfs interface should allow extension to per-CPU
only mode. Then the kernel implementation would take care of allocating the
RMID accordingly. The key is always to ensure allocations can be tracked from
the inception of the group, be it CAT, tasks, or CPUs.
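
As a rough model of that last point, the sketch below binds an RMID to a
group when the group is created, before any member joins, so nothing the
group allocates goes untracked. This is a toy illustration, not kernel code:
GroupKind, RmidPool and MonGroup are hypothetical names, and reserving RMID 0
as a default is an assumption of the model.

```python
from enum import Enum


class GroupKind(Enum):
    CAT = "cat"
    TASK = "task"
    CPU = "cpu"   # the later extension: per-CPU only mode


class RmidPool:
    """Toy RMID allocator (hypothetical)."""

    def __init__(self, num_rmids):
        # Assumption of this model: RMID 0 is kept as the default/unmonitored ID.
        self._free = list(range(1, num_rmids))

    def alloc(self):
        if not self._free:
            raise RuntimeError("no free RMID")
        return self._free.pop(0)

    def free(self, rmid):
        self._free.append(rmid)


class MonGroup:
    """A monitored group whose RMID exists from inception."""

    def __init__(self, kind, pool):
        self.kind = kind
        self.rmid = pool.alloc()   # bound at creation, before any member joins
        self.members = set()

    def add(self, member):
        # From this point on the member's allocations carry the group's RMID.
        self.members.add(member)
        return self.rmid


pool = RmidPool(num_rmids=4)
cat_group = MonGroup(GroupKind.CAT, pool)
cpu_group = MonGroup(GroupKind.CPU, pool)   # same API, different kind
print(cat_group.rmid, cpu_group.rmid)
```

The point of the model is that the group kind is only a parameter, so adding
a per-CPU kind later does not change how the RMID is obtained.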

> -Tony