OLS BoF on Resource Management - some notes

From: Srivatsa Vaddagiri
Date: Wed Jul 26 2006 - 14:28:55 EST

Hello everyone,
The recent (ongoing) work on containers has brought a lot of focus to
resource management capabilities of Linux kernel. Workload management tools
also need the capability to manage workloads better. Both users have felt the
need for better resource management capabilities of Linux kernel than what
exists today. Nick (at OLS) suggested that some minimal requirements be worked
out that can satisfy some needs of both parties.

An impromptu BoF was hence arranged by Dipankar Sarma to discuss this at OLS.
The BoF was attended by several folks (most of folks on CC list), including
those from ckrm and openvz teams. Shailabh Nagar (of ckrm team) and Kirill
Korotaev (of openvz team) presented their views/requirements of resource
management from workload manager and container perspectives respectively.

I did some note-taking during the BoF, which is shared below. I sincerely
apologize if there are some errors in the notes or if I have missed to note
other important points that were discussed.

Broadly these were the points discussed. I dont think these reflect any final
decisions made on the interface/requirements but more of a common ground to
begin with.

A - Task grouping
Ability to apply resource control over a group tasks was felt necessary.

It was suggested that the kernel maintain some generic "task group"
structure, to represent a group of tasks, which can be used for
various purposes - resource management, security, containers (?) etc.

- Heirarchial grouping?
Should we support a heirarchy of groups? This can be supported only
if various controllers recognize/support the heirarchy too, which will
complicate them.

It was suggested that we dont support this initially.

- Dynamic group membership
Should we allow movement of tasks willingly from one group to other?

Some implications of this requirement in memory controller (transfer of
page-ownership) were discussed. Also questions like "how fast should
this change be visible" were raised.

It was felt that this be not allowed in the 1st pass i.e once a
task is bound to a group, it is struck to that group till it exits.

- Controller-specific grouping?
Should the task-group definition be different for different controllers?

For example, should we allow container A to belong to two different
groups, G1 and G2, simultaneously, where G1 governs its CPU usage and
G2 governs its memory usage?

It was felt that we dont attempt to do this initially.

- Interface to manage (create/delete) groups:
Should it be system-call or file-system based?

This was not discussed and was felt to be a non-issue. Maybe we could
start with system call to begin with (like what the openvz people have).

B - Resource limit and/or guarantee
Should the resource control be specified in terms of guarantee or
(soft/hard) limit?

It was felt that we attempt to provide limit functionality only in
the 1st pass. Soft or hard limit can be left to the discreetion of

- Absolute or relative limit?
Should the limit be specified in absolute terms (limit group G1's
memory usage to 20%) or relative terms (group G1's limit is half
that of G2)?

It was felt again that this be left to the discreetion of the
controller. Controller should export this information and also
the unit of measuring this limit.

- Resource control epoch:
Over what interval should the resource control be applied?

This was again felt should be left to the discreetion of the controller.

C - Resources to manage
Openvz cares for almost everything (cpu, memory, disk i/o, network,
open files, etc)!

Some suggested that a simple controller like number-of-tasks be posted
to illustrate the grouping and infrastructure. Others felt that
we need to go for "real" controllers like memory and cpu. The design
of memory controller received lot of interest and it was felt that
we attempt to do memory controller first.

In summary, the consensus seemed to be:

- Focus on controllers more (preferably memory first). Keep them
simple to begin with, willing to sacrifice accuracy to an
extent where necessary.

- Choose some minimal interface for starters (say a syscall to
start new group and another one to set/change limit?).

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/