Re: [PATCH v4] vmevent: Implement greater-than attribute state and one-shot mode

From: KOSAKI Motohiro
Date: Tue May 01 2012 - 21:21:19 EST


(5/1/12 8:20 PM), Anton Vorontsov wrote:
> Hello Rik,
>
> Thanks for looking into this!

> On Tue, May 01, 2012 at 05:04:21PM -0400, Rik van Riel wrote:
>> On 05/01/2012 09:18 AM, Anton Vorontsov wrote:
>>> This patch implements a new event type that triggers whenever a
>>> value becomes greater than a user-specified threshold; it complements
>>> the 'less-than' trigger type.
>>>
>>> Also, let's implement a one-shot mode for the events: when set,
>>> userspace will only receive one notification per boundary
>>> crossing.
>>>
>>> Now, when both LT and GT are set on the same level, the event type
>>> works as a cross event type: it triggers whenever a value crosses
>>> the threshold from below to above, and vice versa.

>>> We use these event types in a userspace low-memory killer: we get a
>>> notification when memory becomes low, so we start freeing memory by
>>> killing unneeded processes, and we get a notification when memory
>>> crosses the threshold in the other direction, so we know that we
>>> have freed enough memory.

>> How are these vmevents supposed to work with cgroups?

> Currently these are independent subsystems. If you have memcg enabled,
> you can do almost anything* with the memory, as memcg has all the needed
> hooks in the mm/ subsystem (it is more like a "memory management tracer"
> nowadays :-).

> But cgroups have their costs: both a performance penalty and wasted memory.
> For example, in the best case, memcg constantly consumes 0.5% of RAM to
> track memory usage; that is about 5 MB on a 1 GB "embedded" machine. To some
> people it just feels wrong to waste that much memory on mere notifications.

> Of course, this alone could be considered a lame argument for making
> another subsystem (instead of "fixing" the current one). But see below:
> vmevent is just a convenient ABI.

>> What do we do when a cgroup nears its limit, and there
>> is no more swap space available?

>> What do we do when a cgroup nears its limit, and there
>> is swap space available?

> As of now, this is all orthogonal to vmevent. Vmevent doesn't know
> about cgroups. If the kernel has memcg enabled, one should probably*
> go with it (or better, with its ABI). At least for now.

>> It would be nice to be able to share the same code for
>> embedded, desktop and server workloads...

> It would be great indeed, but so far I don't see much that
> vmevent could share. Plus, sharing the code at this point is not
> that interesting; it's a mere 500 lines of code (compared to
> more than 10K lines for cgroups, and that's not including the memcg_
> hooks and logic that are spread all over mm/).

> Today the vmevent code is mostly an ABI implementation; there is
> very little memory management logic in it (in contrast to memcg).

But if it doesn't work for the desktop/server area, it shouldn't be merged.
We have to consider the best design before kernel inclusion. These
can't be discussed separately.


