Re: pivot_root(".", ".") and the fchdir() dance

From: Eric W. Biederman
Date: Sun Sep 15 2019 - 14:32:02 EST


"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:

> Hello Eric,
>
> On 9/11/19 1:06 AM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>
>>> Hello Christian,
>>>
>>>>> All: I plan to add the following text to the manual page:
>>>>>
>>>>> new_root and put_old may be the same directory. In particular,
>>>>> the following sequence allows a pivot-root operation without needâ
>>>>> ing to create and remove a temporary directory:
>>>>>
>>>>> chdir(new_root);
>>>>> pivot_root(".", ".");
>>>>> umount2(".", MNT_DETACH);
>>>>
>>>> Hm, should we mention that MS_PRIVATE or MS_SLAVE is usually needed
>>>> before the umount2()? Especially for the container case... I think we
>>>> discussed this briefly yesterday in person.
>>> Thanks for noticing. That detail (more precisely: not MS_SHARED) is
>>> already covered in the numerous other changes that I have pending
>>> for this page:
>>>
>>> The following restrictions apply:
>>> ...
>>> - The propagation type of new_root and its parent mount must not
>>> be MS_SHARED; similarly, if put_old is an existing mount point,
>>> its propagation type must not be MS_SHARED.
>>
>> Ugh. That is close but not quite correct.
>>
>> A better explanation:
>>
>> The pivot_root system call will never propagate any changes it makes.
>> The pivot_root system call ensures this is safe by verifying that
>> none of put_old, the parent of new_root, and parent of the root directory
>> have a propagation type of MS_SHARED.
>
> Thanks for that. However, another question. You text has two changes.
> First, I understand why you reword the discussion to indicate the
> _purpose_ of the rules. However, you also, AFAICS, list a different set of
> of directories that can't be MS_SHARED:
>
> I said: new_root, the parent of new_root, and put_old
> You said: the parent of new_root, and put_old, and parent of the
> root directory.


> Was I wrong on this detail also?

That is how I read the code. The code says:

if (IS_MNT_SHARED(old_mnt) ||
IS_MNT_SHARED(new_mnt->mnt_parent) ||
IS_MNT_SHARED(root_mnt->mnt_parent))
goto out4;

We both agree on put_old and the parent of new_mnt.

When I look at the code root_mnt comes from the root directory, not new_mnt.

Furthermore those checks fundamentally makes sense as the root directory
and new_root that are moving. The directory put_old simply has
something moving onto it.

>> The concern from our conversation at the container mini-summit was that
>> there is a pathology if in your initial mount namespace all of the
>> mounts are marked MS_SHARED like systemd does (and is almost necessary
>> if you are going to use mount propagation), that if new_root itself
>> is MS_SHARED then unmounting the old_root could propagate.
>>
>> So I believe the desired sequence is:
>>
>>>>> chdir(new_root);
>> +++ mount("", ".", MS_SLAVE | MS_REC, NULL);
>>>>> pivot_root(".", ".");
>>>>> umount2(".", MNT_DETACH);
>>
>> The change to new new_root could be either MS_SLAVE or MS_PRIVATE. So
>> long as it is not MS_SHARED the mount won't propagate back to the
>> parent mount namespace.
>
> Thanks. I made that change.

For what it is worth. The sequence above without the change in mount
attributes will fail if it is necessary to change the mount attributes
as "." is both put_old as well as new_root.

When I initially suggested the change I saw "." was new_root and forgot
"." was also put_old. So I thought there was a silent danger without
that sequence.

Eric