Re: counting file descriptors with a cgroup controller

From: Krzysztof Opasiak
Date: Tue Mar 07 2017 - 07:35:06 EST


Hi

On 03/06/2017 07:58 PM, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 17, 2017 at 12:37:11PM +0100, Krzysztof Opasiak wrote:
>> We need to limit and monitor the number of file descriptors processes
>> keep open. If a process exceeds a certain limit we'd like to terminate it
>> and restart it or reboot the whole system. Currently the RLIMIT API
>> allows limiting the number of file descriptors but to achieve our goals
>> we'd need to make sure all programmes we run handle EMFILE errno
>> properly. That is why we consider developing a cgroup controller that
>> limits the number of open file descriptors of its members (similar to
>> the memory controller).
>>
>> Any comments? Is there any alternative that:
>>
>> + does not require modifications of user-land code,
>> + enables another process (e.g. init) to be notified and apply policy.
>
> Hmm... I'm not quite sure fds qualify as an independent system-wide
> resource. We did that for pids because pids are globally limited and
> can run out way earlier than memory backing it. I don't think we have
> similar restrictions for fds, do we?

Well I'm not aware of such restrictions...

So maybe let me clarify our use case so we can discuss this further. We are dealing with the task of monitoring system services on an IoT system. Such a system needs to run as long as possible without a reboot, just like a server. In the server world almost the whole system state is monitored by services like Nagios, which measure each parameter (CPU, memory, etc.) at some interval. Unfortunately we cannot use this approach in an embedded system because of its power consumption.

So generally we are now considering two approaches:

1) Use rlimits when possible to limit resources for each process.

The problem here is that this creates an implicit requirement that all system services are well written: they must detect that they have, for example, run out of fds and exit with a suitable error code, instead of hanging forever and telling clients that their requests cannot be handled due to a lack of fds. This is hard, especially when a service uses a lot of libraries under the hood, because every library function that opens a file also needs to propagate this error. It is harder still with proprietary services or libraries for which we don't have access to the source code.
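To make the failure mode of approach 1 concrete, here is a minimal sketch in Python (an illustration only, not code from any of the services discussed). The helper name open_until_emfile and the choice of /dev/null as the file to open are assumptions made for the example. It lowers the soft RLIMIT_NOFILE and opens files until the kernel refuses with EMFILE, which is the error every service and library would have to handle correctly:

```python
import errno
import os
import resource


def open_until_emfile(soft_limit):
    """Lower the soft RLIMIT_NOFILE, then open /dev/null until open() fails.

    Returns (number of descriptors opened, errno of the failure).
    The previous soft limit is restored before returning.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Only the soft limit is changed, so it can be raised back afterwards.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    fds = []
    try:
        while True:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        # With the limit exhausted the kernel reports EMFILE.
        return len(fds), exc.errno
    finally:
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


if __name__ == "__main__":
    opened, err = open_until_emfile(64)
    print(opened, errno.errorcode[err])
```

A service that ignores this error, or a library that swallows it, keeps running without being able to open anything, and nothing outside the process is told about it.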

2) Use cgroups to limit and monitor resource usage.

Generally systemd creates a cgroup for each service. Controllers like the memory cgroup can notify userspace when memory usage reaches some level. So, for example, systemd could get a notification that one of the cgroups is using more memory than it should, but as long as this is not the hard limit of the cgroup, the service itself will not even notice. So instead of an error being returned from, for example, malloc() in the service, systemd could just send a signal to that service, ask it to exit gracefully and then restart it. The disadvantage of this solution is that we need a cgroup controller for each resource we would like to monitor. For now we have suitable cgroups for everything we need apart from file descriptors.
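As a rough sketch of the notification mechanism described above, the following Python assumes the cgroup v1 memory controller, where a threshold is registered by writing "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to cgroup.event_control. The function names, the cgroup directory passed in, and the use of SIGTERM as the "please exit" signal are all assumptions for illustration; this is not how systemd is implemented. os.eventfd() needs Python 3.10 or newer.

```python
import os
import signal


def registration_line(event_fd, usage_fd, threshold_bytes):
    """Format the line cgroup v1 expects in cgroup.event_control."""
    return f"{event_fd} {usage_fd} {threshold_bytes}"


def wait_for_threshold(cgroup_dir, threshold_bytes):
    """Block until the cgroup's memory usage crosses threshold_bytes.

    cgroup_dir is a hypothetical path to a cgroup v1 memory cgroup
    directory chosen by the caller.
    """
    event_fd = os.eventfd(0)  # Python 3.10+, Linux only
    usage_fd = os.open(
        os.path.join(cgroup_dir, "memory.usage_in_bytes"), os.O_RDONLY
    )
    try:
        control_fd = os.open(
            os.path.join(cgroup_dir, "cgroup.event_control"), os.O_WRONLY
        )
        try:
            line = registration_line(event_fd, usage_fd, threshold_bytes)
            os.write(control_fd, line.encode())
        finally:
            os.close(control_fd)
        # Sleeps in the kernel until the threshold is crossed; no polling.
        os.read(event_fd, 8)
    finally:
        os.close(usage_fd)
        os.close(event_fd)


def ask_service_to_exit(main_pid):
    """Ask the service to shut down gracefully (SIGTERM is an assumption)."""
    os.kill(main_pid, signal.SIGTERM)
```

The monitoring process sleeps in read() until the kernel signals the eventfd, so there is no periodic wake-up, which is the property that matters for power consumption. A file descriptor controller would need a comparable event source.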

What do you think about this? Maybe you have some other ideas on how we could achieve this?

Best regards,
--
Krzysztof Opasiak
Samsung R&D Institute Poland
Samsung Electronics