Re: [PATCH v2 0/3] Implement /proc/<pid>/totmaps

From: Robert Foss
Date: Thu Aug 18 2016 - 21:51:19 EST

On 2016-08-18 02:01 PM, Michal Hocko wrote:
> On Thu 18-08-16 10:47:57, Sonny Rao wrote:
>> On Thu, Aug 18, 2016 at 12:44 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>>> On Wed 17-08-16 11:57:56, Sonny Rao wrote:
>>> [...]
>>>> 2) User space OOM handling -- we'd rather do a more graceful shutdown
>>>> than let the kernel's OOM killer activate. We need to gather this
>>>> information, and we'd like to be able to get it fast enough to make
>>>> the decision in much less than 400ms.
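
To make that cost concrete, here is a minimal sketch (illustrative only,
not part of this series; the function name is made up and error handling
is minimal) of what a user-space handler has to do today for a single
process: walk /proc/<pid>/smaps and sum the per-VMA Pss: lines.

#include <stdio.h>

/* Sum the "Pss:" fields of /proc/<pid>/smaps -- one line per VMA. */
static long pss_kb_for_pid(int pid)
{
	char path[64], line[256];
	long total = 0, kb;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/smaps", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "Pss: %ld kB", &kb) == 1)
			total += kb;

	fclose(f);
	return total;
}

Doing that for every process of interest -- the kernel-side VMA walk plus
the text parsing -- is where the hundreds of milliseconds go.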

>>> Global OOM handling in userspace is really dubious if you ask me. I
>>> understand you want something better than SIGKILL, and in fact this is
>>> already possible with the memory cgroup controller (btw. memcg will
>>> give you cheap access to rss and to the amount of shared and
>>> swapped-out memory as well). Anyway, if you are getting close to OOM,
>>> your system will most probably be really busy, and chances are that
>>> reading your new file will also take much more time. I am also not
>>> quite sure how pss is useful for oom decisions.

>> I mentioned it before, but based on experience RSS just isn't good
>> enough -- there's too much sharing going on in our use case to make
>> the correct decision based on RSS. If RSS were good enough, simply
>> put, this patch wouldn't exist.

> But that doesn't answer my question, I am afraid. So how exactly do you
> use pss for oom decisions?

>> So even with memcg I think we'd have the same problem?

> memcg will give you instant anon and shared counters for all processes
> in the memcg.
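
As a concrete sketch of that cheaper path (assuming a cgroup v1 memory
controller mounted at /sys/fs/cgroup/memory; the group name and the
helper below are hypothetical), the per-group counters are a single
small file read, with no per-VMA or per-page walk behind it:

#include <stdio.h>
#include <string.h>

/*
 * memory.stat counters are maintained at charge/uncharge time, which is
 * why reading them is cheap.  "swap" only shows up with swap accounting
 * enabled.
 */
static void print_memcg_counters(const char *group)
{
	char path[256], line[256], key[64];
	unsigned long long val;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/fs/cgroup/memory/%s/memory.stat", group);
	f = fopen(path, "r");
	if (!f)
		return;

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%63s %llu", key, &val) != 2)
			continue;
		if (!strcmp(key, "rss") || !strcmp(key, "cache") ||
		    !strcmp(key, "swap"))
			printf("%s: %llu bytes\n", key, val);
	}
	fclose(f);
}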

Is it technically feasible to add instant pss support to memcg?

@Sonny Rao: Would using cgroups be acceptable for chromiumos?


>>> Don't get me wrong, /proc/<pid>/totmaps might be suitable for your
>>> specific use case, but so far I haven't heard any sound argument for
>>> it to be generally usable. It is true that smaps is unnecessarily
>>> costly, but at least I can see some room for improvement. A simple
>>> patch I've posted cut the formatting overhead by 7%. Maybe we can do
>>> more.

>> It seems like a general problem that getting these values through the
>> existing kernel interface can be very expensive, so totmaps would be
>> generally usable by any application which wants a per-process PSS,
>> private data, dirty data or swap value.

> Yes, this is really unfortunate, and if at all possible we should
> address it. Precise values require the expensive rmap walk. We can
> introduce some caching to help with that, but so far it seems the
> biggest overhead is simply formatting the output, and that should be
> addressed before any new proc file is added.
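
For reference, the reason pss cannot be kept as a simple per-mm counter
is the arithmetic itself: every resident page contributes its size
divided by its mapcount, so the total changes whenever any other process
maps or unmaps a shared page. A toy illustration (not kernel code; the
struct and helper are made up):

/* Each resident page contributes page_size / mapcount to pss. */
struct page_share {
	unsigned long size;	/* bytes, e.g. 4096 */
	unsigned int mapcount;	/* how many processes map this page */
};

static unsigned long long pss_bytes(const struct page_share *pages, int n)
{
	unsigned long long pss = 0;
	int i;

	for (i = 0; i < n; i++)
		if (pages[i].mapcount)
			pss += pages[i].size / pages[i].mapcount;

	return pss;
}

That per-page recomputation is the walk smaps already does; the
formatting cost comes on top of it.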

>> I mentioned two use cases, but I guess I don't understand the comment
>> about why it's not usable by other use cases.

> I might be wrong here, but the use of pss is quite limited and I do not
> remember anybody asking for large optimizations in that area. I still
> do not understand your use cases properly, so I am quite skeptical
> about the general usefulness of a new file.