Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

From: Tejun Heo
Date: Sun Aug 02 2015 - 12:23:38 EST


Hello,

On Fri, Jul 31, 2015 at 12:12:18PM -0300, Marcelo Tosatti wrote:
> > I don't really think it makes sense to implement a fully hierarchical
> > cgroup solution when there isn't the basic affinity-adjusting
> > interface
>
> What is an "affinity adjusting interface" ? Can you give an example
> please?

Something similar to sched_setaffinity(): a syscall, prctl, or some
other programmable interface which sets a per-task attribute.
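
For example (purely illustrative: PR_SET_CACHE_MASK is a made-up
prctl command, invented here only to show the shape such a per-task
interface could take; nothing like it exists in the kernel today):

#include <stdio.h>
#include <stdint.h>
#include <sys/prctl.h>

#ifndef PR_SET_CACHE_MASK
#define PR_SET_CACHE_MASK 1000	/* hypothetical command number */
#endif

int main(void)
{
	/* e.g. restrict the calling task to two L3 ways */
	uint64_t l3_mask = 0x3;

	/* today this fails with EINVAL since no such command exists */
	if (prctl(PR_SET_CACHE_MASK, l3_mask, 0, 0, 0))
		perror("prctl(PR_SET_CACHE_MASK)");
	return 0;
}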

> > and it isn't clear whether fully hierarchical resource
> > distribution would be necessary especially given that the granularity
> > of the target resource is very coarse.
>
> As i see it, the benefit of the hierarchical structure to the CAT
> configuration is simply to organize sharing of cache ways in subtrees
> - two cgroups can share a given cache way only if they have a common
> parent.
>
> That is the only benefit. Vikas, please correct me if i'm wrong.

cgroups is not a superset of a programmable interface. It has
distinct disadvantages and, even with hierarchy support, is not a
substitute for a regular syscall-like interface. I don't think it
makes sense to go full-on hierarchical cgroups when we don't have a
basic interface which is likely to cover many use cases better. A
syscall-like interface combined with a tool similar to taskset would
cover a lot in a more accessible way.

> > I can see that how cpuset would seem to invite this sort of usage but
> > cpuset itself is more of an arbitrary outgrowth (regardless of
> > history) in terms of resource control and most things controlled by
> > cpuset already have counterpart interfaces which are readily
> > accessible to normal applications.
>
> I can't parse that phrase (due to ignorance). Please educate.

Hmmm... consider CPU affinity. cpuset definitely is useful for some
use cases as a management tool, especially if the workloads are not
cooperative or delegated; however, it's no substitute for a proper
syscall interface, and it'd be silly to try to replace that with
cpuset.
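
Just as a reminder of the style I mean, here is the existing
programmable counterpart for CPU affinity (a real interface; the CPU
numbers are arbitrary):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(2, &set);	/* arbitrary example CPUs */
	CPU_SET(3, &set);

	/* pid 0 means the calling thread; no cgroup involvement needed */
	if (sched_setaffinity(0, sizeof(set), &set))
		perror("sched_setaffinity");
	return 0;
}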

> > Given that what the feature allows is restricting usage rather than
> > granting anything exclusively, a programmable interface wouldn't need
> > to worry about complications around privileges
>
> What complications about privileges do you refer to?

It's not granting exclusive access, so individual user applications
can be allowed to do whatever they want to do as long as the issuer
has enough privilege over the target task.

> > while being able to reap most of the benefits in a lot easier way.
> > Am I missing something?
>
> The interface does allow for exclusive cache usage by an application.
> Please read the Intel manual, section 17, it is very instructive.

For that, it'd have to require some capability, but I think just
having a restrictive interface in the style of CPU or NUMA affinity
would go a long way.

> The use cases we have now are the following:
>
> Scenario 1: Consider a system with 4 high performance applications
> running, one of which is a streaming application that manages a very
> large address space from which it reads and writes as it does its processing.
> As such the application will use all the cache it can get but does
> not need much if any cache. So, it spoils the cache for everyone for no
> gain on its own. In this case we'd like to constrain it to the
> smallest possible amount of cache while at the same time constraining
> the other 3 applications to stay out of this thrashed area of the
> cache.

A tool in the style of taskset should be enough for the above
scenario.
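
Purely as illustration, "cacheset" below is a name made up here,
patterned after taskset; only the usage style is the point:

  # confine the streamer to a single L3 way
  cacheset --l3-mask 0x01 ./streaming_app
  # keep the other three applications out of that way
  cacheset --l3-mask 0xfe -p <pid of other app>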

> Scenario 2: We have a numeric application that has been highly optimized
> to fit in the L2 cache (2M for example). We want to ensure that its
> cached data does not get flushed from the cache hierarchy while it is
> scheduled out. In this case we exclusively allocate enough L3 cache to
> hold all of the L2 cache.
>
> Scenario 3: Latency sensitive application executing in a shared
> environment, where memory to handle an event must be in L3 cache
> for latency requirements to be met.

Either isolate CPUs or run other stuff with affinity restricted.
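
That already works with existing mechanisms; the CPU numbers below
are of course just an example:

  # boot with e.g. isolcpus=4, then pin the sensitive task there
  taskset -c 4 ./latency_sensitive_app
  # or keep everything else off that CPU
  taskset -cp 0-3 <pid of other task>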

cpuset-style allocation can be easier for things like this, but that
should be an addition on top, not the one and only interface. How is
it going to handle the case where multiple threads of a process want
to restrict their cache usage to avoid stepping on each other's toes?
Delegate the subdirectory and let the process itself open it and
write to the files to configure itself, when there isn't even a way
to atomically access the process's own directory or to synchronize
against migration? cgroups may be an okay management interface but a
horrible programmable one.

Sure, if this turns out to be as important as CPU or NUMA affinity
and gets widely used, creating a management burden in many use cases,
we can certainly add a cgroup controller for it; but that's a remote
possibility at this point, and the current attempt is over-engineering
a solution for problems which haven't been shown to exist. Let's
please first implement something simple and easy to use.

Thanks.

--
tejun