Re: [PATCH v4 2/2] cgroups: add a pids subsystem

From: Aleksa Sarai
Date: Tue Mar 10 2015 - 08:31:56 EST


Hi Austin,

>>>> Does pids limit make sense in the root cgroup?
>>>
>>> I would say it kind of does, although I would just expect it to track
>>> /proc/sys/kernel/pid_max (either as a read-only value, or as an
>>> alternative way to set it).
>>
>> Personally, that seems unintuitive. /proc/sys/kernel/pid_max and the pids
>> cgroup controller are orthogonal features, why should they be able to
>> affect each other (or even be aware of each other)?
>
> I wouldn't consider them entirely orthogonal, the sysctl value is the
> limiting factor for the maximal value that can be set in a given pids
> cgroup. Setting an unlimited value in the cgroup is functionally identical
> to setting it to be equal to /proc/sys/kernel/pid_max, and the root cgroup
> is functionally equivalent to /proc/sys/kernel/pid_max, because all tasks
> that aren't in another cgroup get put in the root.

While it is true that the limit imposed by /proc/sys/kernel/pid_max is
functionally equivalent to setting pids.max to the same value (and thus that
the root pids cgroup behaves like the system-wide limit), it is not true that
the sysctl value is the limiting factor on what "max" is defined as. "max" is
defined in terms of the maximum possible pid_t value. That is really the only
sane choice: if "max" tracked /proc/sys/kernel/pid_max, the limit would change
whenever the sysctl did, and the line between "max" and some arbitrary value
would be blurred. In addition, the sysctl value limits the number of pids in
the system in a separate part of the kernel. It has nothing to do with cgroups
and cgroups have nothing to do with it.

> My only thought is that having the file that would set the limit there might
> make things much simpler for software that expects the entire cgroup
> structure to be hierarchical.

The only valid value for pids.max in the root cgroup would be "max". And "max"
is defined as (PID_MAX_LIMIT + 1), not as the current setting of
/proc/sys/kernel/pid_max: the only *real* maximum value of pid_t is
PID_MAX_LIMIT, so the only reasonable way to represent "max" is a number
greater than that.
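
To make the definition of "max" concrete, here is a rough userspace sketch of
how a write to pids.max could be interpreted. This is not the code from the
patch: pids_parse_limit() and pids_would_exceed() are hypothetical names, the
input is assumed to be already stripped of whitespace, and PID_MAX_LIMIT is
hardcoded to the value I'd expect on a typical 64-bit build.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: PID_MAX_LIMIT as on a typical 64-bit build. */
#define PID_MAX_LIMIT (4 * 1024 * 1024)
/* "max" is one past the largest pid that can ever exist. */
#define PIDS_MAX (PID_MAX_LIMIT + 1)

/*
 * Hypothetical helper: interpret the string written to pids.max.
 * Expects input with surrounding whitespace already removed.
 * Returns 0 and stores the limit on success, -EINVAL on bad input.
 * Note that /proc/sys/kernel/pid_max is never consulted.
 */
static int pids_parse_limit(const char *buf, long *limit)
{
	char *end;
	long val;

	if (strcmp(buf, "max") == 0) {
		*limit = PIDS_MAX;
		return 0;
	}

	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno != 0 || end == buf || *end != '\0')
		return -EINVAL;

	/* Numeric limits must stay strictly below the "max" sentinel. */
	if (val < 0 || val >= PIDS_MAX)
		return -EINVAL;

	*limit = val;
	return 0;
}

/* Hypothetical check: would holding 'count' pids go over 'limit'? */
static int pids_would_exceed(long count, long limit)
{
	return count > limit;
}
```

With this shape, a cgroup set to "max" can never trip its own limit, since
no more than PID_MAX_LIMIT pids can exist at once, and changing the sysctl
does not change what "max" means.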

There is an issue with both of the behaviours you describe. The root-level
pids.max could either:

a) be read-only (which breaks the idea of it being "simpler" because now you
have a special case where you can't write to the limit); or (even worse)
b) modify some other aspect of the kernel in a way that is unique compared to
children of the root hierarchy (which IMO sounds like trouble).

In either of those two cases, the idea of it being "simpler" for software that
makes the (wrong) assumption that you can limit the global maximum number of
pids through the root cgroup is broken, because the file either has weird side
effects (b) or is just an odd feature (a).

--
Aleksa Sarai (cyphar)
www.cyphar.com