Re: [RFC v3 0/3] vmpressure_fd: Linux VM pressure notifications

From: Anton Vorontsov
Date: Wed Nov 07 2012 - 06:46:40 EST


On Wed, Nov 07, 2012 at 01:21:36PM +0200, Kirill A. Shutemov wrote:
[...]
> Sorry, I didn't follow previous discussion on this, but could you
> explain what's wrong with memory notifications from memcg?
> As I can see you can get pretty similar functionality using memory
> thresholds on the root cgroup. What's the point?

There are a few reasons we don't use cgroup notifications:

1. We're not interested in the absolute number of pages/KB of available
memory, as provided by cgroup memory controller. What we're interested
in is the amount of easily reclaimable memory and new memory
allocations' cost.

We can have plenty of "free" memory, of which say 90% will be caches,
and say 10% idle. But we do want to differentiate these types of memory
(although not going into details about it), i.e. we want to get
notified when kernel is reclaiming. And we also want to know when the
memory comes from swapping others' pages out (well, actually we don't
call it swap, it's "new allocations cost becomes high" -- it might be a
result of many factors (swapping, fragmentation, etc.) -- and userland
might analyze the situation when this happens).

Exposing all the VM details to userland is not an option -- it is not
possible to build a stable ABI on this. Plus, it makes it really hard
for userland to deal with all the low level details of Linux VM
internals.

So, no, raw numbers of "free/used KBs" are not interesting at all.

1.5. But it is important to understand that vmpressure_fd() is not
orthogonal to cgroups (like it was with vmevent_fd()). We want it to
be "cgroup'able" too. :) But optionally.

2. The last time I checked, cgroups memory controller did not (and I guess
still does not) not account kernel-owned slabs. I asked several times
why so, but nobody answered.

But no, this is not the main issue -- per "1.", we're not interested in
kilobytes.

3. Some folks don't like cgroups: it has a penalty for kernel size, for
performance and memory wastage. But again, it's not the main issue with
memcg.

Thanks,
Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/