Re: [RFC PATCH 0/5] Introduce /proc/all/ to gather stats from all processes

From: Eugene Lubarsky
Date: Tue Aug 25 2020 - 06:00:12 EST


On Thu, 20 Aug 2020 10:41:39 -0700
Andrei Vagin <avagin@xxxxxxxxx> wrote:
> Unfortunately, I don't have enough time to lead a process of pushing
> task_diag into the upstream. So if it is interesting for you, you can
> restart this process and I am ready to help as much as time will
> permit.
>
> I think the main blocking issue was a lack of interest from the wide
> audience to this. The slow proc is the problem just for a few users,
> but task_diag is a big subsystem that repeats functionality of another
> subsystem with all derived problems like code duplication.

Unfortunately I don't have much time either, and yes, it sounds like
upstreaming a new interface like this will require input and enthusiasm
from more of the people who actually monitor large numbers of
processes, which is not really me.

A related issue is that task_diag doesn't currently support the cgroup
filesystem, which has the same issues as /proc and is accessed very
heavily by e.g. cAdvisor in the Kubernetes kubelet. Perhaps more
interest in tackling this could come from the Kubernetes community.
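
To make the cost concrete, here is a rough sketch of the access pattern
a monitoring tool goes through today: one open/read/close per process
per file. It is Python purely for illustration; read_all_stats() and
its parameters are names made up for this example and are not part of
task_diag or of this patch series. The cgroup case should look the same
with a different root and filename.

```python
import os


def read_all_stats(proc_root="/proc", filename="stat"):
    """Read <proc_root>/<pid>/<filename> for every numeric directory.

    Returns (stats, file_ops): stats maps pid -> raw bytes read, and
    file_ops counts the open/read/close calls spent on files that were
    read successfully (three per process).
    """
    stats = {}
    file_ops = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        path = os.path.join(proc_root, entry, filename)
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # process exited, or the file is not readable
        try:
            data = os.read(fd, 4096)
        except OSError:
            continue  # process went away between open and read
        finally:
            os.close(fd)
        file_ops += 3
        stats[int(entry)] = data
    return stats, file_ops


if __name__ == "__main__":
    stats, file_ops = read_all_stats()
    print(f"{len(stats)} processes, about {file_ops} syscalls for one file each")
```

The count ignores the directory listing itself, and it grows linearly
with both the number of processes and the number of files wanted per
process.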

>
> Another blocking issue is a new interface. There was no consensus on
> this. Initially, I suggested to use netlink sockets, but developers
> from non-network subsystems objected to this, so the transaction file
> interface was introduced. The main idea similar to netlink sockets is
> that we write a request and read a response.
>
> There were some security concerns but I think I fixed them.

There's currently a lot of momentum behind io_uring, which could not
only enable efficient enumeration and retrieval of small files but
might also be a more natural place for an API like task_diag.



Best Wishes,
Eugene