Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

From: Vikas Shivappa
Date: Thu Aug 06 2015 - 16:58:41 EST




On Wed, 5 Aug 2015, Tejun Heo wrote:

Hello,

On Tue, Aug 04, 2015 at 07:21:52PM -0700, Vikas Shivappa wrote:
I get that this would be an easier "bolt-on" solution but isn't a good
solution by itself in the long term. As I wrote multiple times
before, this is a really bad programmable interface. Unless you're
sure that this doesn't have to be programmable for threads of an
individual application,

Yes, this doesn't have to be a programmable interface for threads. It
may not be a good idea to let the threads decide the cache allocation
by themselves using this direct interface. We are transferring the
decision-maker responsibility to the system administrator.

I'm having hard time believing that. There definitely are use cases
where cachelines are thrashed among service threads. Are you
proclaiming that those cases aren't gonna be supported?

Please refer to the noisy neighbour example I gave here, which shows
how cache allocation helps resolve thrashing by a noisy neighbour -
http://marc.info/?l=linux-kernel&m=143889397419199

and the reference
http://www.intel.com/content/www/us/en/communications/cache-allocation-technology-white-paper.html
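
To make the isolation concrete, here is a minimal worked split (a
sketch in Python; the 16-bit mask width and the values are
illustrative - real CBM widths are reported by CPUID):

  # Illustrative partitioning of a 16-bit L3 capacity bitmask (CBM).
  # CAT requires each mask to be a contiguous run of set bits.
  NOISY_MASK  = 0x000F  # noisy neighbour: 4 of 16 ways of the L3
  OTHERS_MASK = 0xFFF0  # everyone else: the remaining 12 ways

  # The masks share no bits, so the noisy neighbour can no longer
  # evict cache lines belonging to the other workloads.
  assert NOISY_MASK & OTHERS_MASK == 0
  print(hex(NOISY_MASK), hex(OTHERS_MASK))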



- This interface, like you said, can easily be bolted on. It is
basically an easy-to-use interface that does not require worrying
about the architectural details.

But it's ripe with architectural details.

If specifying the bitmask is an issue, it can easily be addressed by
writing a script which calculates the bitmask from a desired size, as
mentioned here -
http://marc.info/?l=linux-kernel&m=143889397419199
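
For instance, something along these lines (a sketch; the 20 MB /
20-bit L3 geometry is illustrative and would really be read from
CPUID or sysfs):

  def size_to_cbm(want_bytes, cache_bytes=20 * 1024 * 1024, cbm_len=20):
      """Convert a desired cache size into a contiguous capacity bitmask.

      Each CBM bit represents cache_bytes / cbm_len bytes of L3, and
      CAT wants at least one bit set with all set bits contiguous.
      """
      per_bit = cache_bytes // cbm_len
      nbits = -(-want_bytes // per_bit)           # ceiling division
      nbits = max(1, min(cbm_len, nbits))         # clamp to valid range
      return (1 << nbits) - 1                     # contiguous, from bit 0

  print(hex(size_to_cbm(4 * 1024 * 1024)))        # 4MB of 20MB -> 0xf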

What I meant by bolt-on was
that this is a shortcut way of introducing this feature without
actually worrying about how this will be used by applications and
that's not a good thing. We need to be worrying about that.

- But it still does the job: the root user can allocate exclusive or
overlapping cache lines to threads or groups of threads (the noisy
neighbour sketch above shows an exclusive split).
- No major roadblocks for usage, as we can make the allocations like
mentioned above, still keep the hierarchy etc., and use it when needed.
- An important factor is that it can easily co-exist with other
interfaces like #2 and #3 below, so I do not see a reason why we
should not use this.
This is not meant to be a programmable interface; however, it does not
prevent co-existence.

I'm not saying they are mutually exclusive but that we're going
overboard in this direction when programmable interface should be the
priority. While this mostly happened naturally for other resources
because cgroups was introduced later but I think there's a general
rule to follow there.

Right, cache allocation cannot be treated like memory, as explained in
1.3 and 1.4 here -
http://marc.info/?l=linux-kernel&m=143889397419199



- If the root user has to set the affinity of threads for which he is
allocating cache, he can do so using other cgroups like cpuset, or set
the masks separately using taskset. This would let him confine the
cache allocation to a socket.
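
For example (a sketch using the existing affinity interface; the CPU
range of socket 0 is machine dependent and only illustrative here):

  import os

  # Pin a task to the CPUs of one socket so its cache allocation is
  # exercised only on that socket's L3. Read the real CPU list from
  # /sys/devices/system/node/node0/cpulist (or lscpu) on the machine.
  SOCKET0_CPUS = set(range(0, 8))  # illustrative

  os.sched_setaffinity(0, SOCKET0_CPUS)  # 0 == the calling process
  print(os.sched_getaffinity(0))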

Well, root can do whatever it wants with programmable interface too.
The way things are designed, even containment isn't an issue, assign
an ID to all processes by default and change the allocation on that.

this is a pretty bad interface by itself.

There is already a lot of such usage among different enterprise users
at Intel/Google/Cisco etc. who have been testing the patches posted to
lkml, and academically there is plenty of usage as well.

I mean, that's the tool you gave them. Of course they'd be using it
but I suspect most of them would do fine with a programmable interface
too. Again, please think of cpu affinity.

Each of the methodologies below to support the feature may need an
arbitrator/agent to decide the allocation:

1. Let the root user or system administrator be the one who decides
the allocation based on current usage. We assume this to be someone
with administrative privileges. He could use the cgroup interface to
perform the task. One way to handle the cpu affinity is by mounting
the cpuset and rdt cgroups together.
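
Concretely, the administrator flow could look something like this
sketch. The controller and file names (intel_rdt, intel_rdt.l3_cbm)
are assumptions based on this patch series and may well differ:

  import os

  ROOT = "/sys/fs/cgroup/rdt_cpuset"
  os.makedirs(ROOT, exist_ok=True)
  # Co-mount the cpuset controller with the rdt controller so one
  # hierarchy carries both the cpu placement and the cache mask.
  os.system("mount -t cgroup -o cpuset,intel_rdt none " + ROOT)

  grp = os.path.join(ROOT, "webserver")
  os.makedirs(grp, exist_ok=True)

  # File names below are assumed from the patch series (hypothetical).
  with open(os.path.join(grp, "intel_rdt.l3_cbm"), "w") as f:
      f.write("0xf0")  # an exclusive 4-way slice of the L3
  with open(os.path.join(grp, "cpuset.cpus"), "w") as f:
      f.write("0-7")   # socket 0, machine dependent
  with open(os.path.join(grp, "cpuset.mems"), "w") as f:
      f.write("0")     # cpuset requires mems before tasks can join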

If you factor in threads of a process, the above model is
fundamentally flawed. How would root or any external entity find out
what threads are to be allocated what?

The process ID can be added to the cgroup together with all its
threads, as shown in the example of cgroup usage in (2) here. In most
cases in the cloud you will be able to decide based on what workloads
are running - see example 1.5 here -

http://marc.info/?l=linux-kernel&m=143889397419199
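
As a sketch of that step (the cgroup path is illustrative; the
cgroup.procs vs. tasks distinction is standard cgroup v1 behaviour):

  # Writing a PID to cgroup.procs moves the whole thread group;
  # writing a single TID to the 'tasks' file moves just that thread.
  GRP = "/sys/fs/cgroup/rdt_cpuset/webserver"

  def add_process(pid):
      with open(GRP + "/cgroup.procs", "w") as f:
          f.write(str(pid))  # all threads of pid follow along

  add_process(1234)  # illustrative PID of the workload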


Each application would
constantly have to tell an external agent about what its intentions
are. This might seem to work in a limited feature testing setup where
you know everything about who's doing what but is no way a widely
deployable solution. This pretty much degenerates into #3 you listed
below.

The app may not be the best one to decide, as explained in 1.1 and 1.2
here -
http://marc.info/?l=linux-kernel&m=143889397419199


2. The kernel automatically assigning the cache based on the priority
of the apps, etc. This is something which could be designed to co-exist
with #1 above, much like how the cpuset cgroup co-exists with the
kernel assigning cpus to tasks (the task could have a cache capacity
mask just like the cpu affinity mask).
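
To make the analogy concrete (the cpu half below is the real,
existing interface; the cache half is purely hypothetical and only
illustrates what a per-task capacity mask might look like):

  import os

  # Real interface: per-task *cpu* affinity mask.
  os.sched_setaffinity(0, {0, 1, 2, 3})

  # Hypothetical analogue for item #2: a per-task *cache* capacity
  # mask the kernel would honour by default. No such syscall exists;
  # this stub exists only to illustrate the parallel.
  def sched_setcacheaffinity(pid, cbm):
      raise NotImplementedError("hypothetical interface, see item #2")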

I don't think CAT would be applicable in this manner. Best-effort (BE)
allocation is what the CPU is doing by default already. I'm highly doubtful
something like CAT would be used automatically in generic systems. It
requires fairly specific coordination after all.

The 3 items were generalized at a high level to contrast system
management doing it vs. the user doing it. I am not saying it should
be done this way -

Thanks,
Vikas


3. A user-programmable interface, where, say, a resource management
program x (and hence apps) could link a library which supports cache
allocation/monitoring etc. and then try to control and monitor the
resources. The arbitrator could just be the resource management
interface itself, or the kernel could decide.

If users use this programmable interface, we need to make sure apps
cannot just allocate resources without some interfacing agent (in
which case they could interface with #2?).
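
As a toy sketch of what such an interfacing agent might look like
(entirely hypothetical, just to illustrate why apps should not carve
up the masks themselves):

  class CacheArbitrator:
      """Toy arbitrator handing out contiguous, non-overlapping
      slices of a cbm_len-bit capacity bitmask. Hypothetical; a real
      agent would also enforce policy, priorities and revocation."""

      def __init__(self, cbm_len=20):
          self.cbm_len = cbm_len
          self.next_bit = 0

      def request(self, nbits):
          if self.next_bit + nbits > self.cbm_len:
              raise RuntimeError("L3 ways exhausted, arbitration needed")
          mask = ((1 << nbits) - 1) << self.next_bit
          self.next_bit += nbits
          return mask

  arb = CacheArbitrator()
  print(hex(arb.request(4)))  # first app gets 0xf
  print(hex(arb.request(4)))  # second app gets 0xf0, no overlap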

Do you think there are any issues with the user-programmable interface
co-existing with the cgroup interface?

Isn't that a weird question to ask when there's no reason to rush to a
full-on cgroup controller?

We can start with something simpler and
more specific and easier for applications to program against. If the
hardware details make it difficult to design a properly abstracted
interface around, make it a char device node, for example, and let
userland worry about how to control access to it. If you stick to
something like that, exposing most of hardware details verbatim is
fine. People know they're dealing with something very specific with
those types of interfaces.
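
For what it's worth, the userland access-control half of that could
be as plain as ownership and mode bits on the node (a sketch; the
device name /dev/intel_cat is invented for illustration):

  import grp, os

  # Restrict a hypothetical CAT char device to root plus one group,
  # so only blessed resource managers can program allocations.
  NODE = "/dev/intel_cat"  # invented name, no such device exists

  os.chown(NODE, 0, grp.getgrnam("cat").gr_gid)
  os.chmod(NODE, 0o660)  # owner/group read-write, others nothing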

Thanks.

--
tejun
