Re: [PATCH] pidns: Make pid_max per namespace

From: Pavel Emelyanov
Date: Thu Mar 10 2011 - 05:07:54 EST


On 03/10/2011 12:50 PM, Andrew Morton wrote:
> On Thu, 10 Mar 2011 12:35:32 +0300 Pavel Emelyanov <xemul@xxxxxxxxxxxxx> wrote:
>
>> On 03/08/2011 02:58 AM, Andrew Morton wrote:
>>> On Thu, 03 Mar 2011 11:39:17 +0300
>>> Pavel Emelyanov <xemul@xxxxxxxxxxxxx> wrote:
>>>
>>>> Rationale:
>>>>
>>>> On x86_64 with big ram people running containers set pid_max on host to
>>>> large values to be able to launch more containers. At the same time
>>>> containers running 32-bit software experience problems with large pids - ps
>>>> calls readdir/stat on proc entries and inode's i_ino happen to be too big
>>>> for the 32-bit API.
>>>>
>>>> Thus, the ability to limit the pid value inside container is required.
>>>>
>>>
>>> This is a behavioural change, isn't it? In current kernels a write to
>>> /proc/sys/kernel/pid_max will change the max pid on all processes.
>>> After this change, that write will only affect processes in the current
>>> namespace. Anyone who was depending on the old behaviour might run
>>> into problems?
>>
>> Hardly. If the behavior of some two apps depends on its synchronous change,
>> these two might want to run in the same pid namespace.
>
> I don't understand your answer. What is this "synchronous change" of which
> you speak? Does your "might want to run" suggestion mean that userspace
> changes would be required for this operation to again work correctly?

Your concern was about "anyone who was depending on the old behaviour", where
the old behavior meant "a write to sys.pid_max will change the max pid on all
processes".

What I meant is that if someone changes pid_max and expects someone else to
behave differently afterwards, then the two should live in the same pid
namespace.

IOW, if X raises pid_max, then all the processes X sees in its pid namespace
*may* have pids up to this value. All the other processes, which are not
visible in X's pid namespace, will be subject to other limits, but X doesn't
see them, so why should we care?
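
A toy model of the semantics described above, assuming two sibling pid
namespaces. This is only a sketch and not kernel code: PidNamespace and its
fields are hypothetical names made up for illustration, and the numbers are
arbitrary.

```python
# Toy model only: PidNamespace is a hypothetical name, not a kernel API.
class PidNamespace:
    def __init__(self, pid_max):
        self.pid_max = pid_max  # per-namespace limit, as the patch proposes
        self.pids = set()       # pids visible inside this namespace

    def fits(self, pid):
        # A new pid is acceptable only if it is below this namespace's limit.
        return pid < self.pid_max


ns_x = PidNamespace(pid_max=32768)      # the namespace X lives in
ns_other = PidNamespace(pid_max=32768)  # a namespace X cannot see

# X raises pid_max. Only its own namespace is affected.
ns_x.pid_max = 4194304

big_pid = 100000
x_accepts_big_pid = ns_x.fits(big_pid)          # True
other_accepts_big_pid = ns_other.fits(big_pid)  # False
```

Whatever X does to its own limit, the processes it cannot see keep the limit
of their own namespace.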

> "In current kernels a write to /proc/sys/kernel/pid_max will change the
> max pid on all processes." Is this incorrect?

Not 100%. If I have some process with pid N and then I change pid_max to N/2,
that process will still keep its pid N, which is obviously greater than N/2.
Only pids allocated after the change are bound by the new limit.
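
The same point as a small sketch. It is a simplified model and not the real
allocator: next_pid is a hypothetical helper, and it wraps to pid 1 for
brevity, whereas the kernel wraps to a reserved floor instead.

```python
# Simplified model of pid allocation; next_pid is a hypothetical helper.
def next_pid(last_pid, pid_max, in_use):
    """Return the next free pid strictly below pid_max, wrapping around."""
    pid = last_pid + 1
    for _ in range(pid_max):
        if pid >= pid_max:
            pid = 1  # simplification: wrap to 1, not to a reserved floor
        if pid not in in_use:
            return pid
        pid += 1
    raise RuntimeError("no free pid below pid_max")


n = 30000
running = {n}       # a process that already has pid N
pid_max = n // 2    # pid_max is lowered to N/2 afterwards

# The existing process is untouched; only the next allocation is limited.
new_pid = next_pid(last_pid=n, pid_max=pid_max, in_use=running)
running.add(new_pid)
```

After this, running holds both the old pid N and a new pid below N/2, so
"changes the max pid on all processes" is not strictly what happens even
today.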

> "After this change (ie: this patch), that write will only affect
> processes in the current namespace.". Is this incorrect?

No, with the exception stated above that statement is correct. But I still
don't understand what your concern is after these two questions :(

Thanks,
Pavel