Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when procand sysfs can be mounted

From: Gao feng
Date: Fri Nov 15 2013 - 01:14:21 EST


On 11/15/2013 12:54 PM, Eric W. Biederman wrote:
> Gao feng <gaofeng@xxxxxxxxxxxxxx> writes:
>
>> On 11/15/2013 12:54 AM, Andy Lutomirski wrote:
>>> On Thu, Nov 14, 2013 at 3:10 AM, Gao feng <gaofeng@xxxxxxxxxxxxxx> wrote:
>>>> On 11/13/2013 03:26 PM, Gao feng wrote:
>>>>> On 11/09/2013 01:42 PM, Eric W. Biederman wrote:
>>>>>> Right now I would rather not have the empty directory exception than
>>>>>> remove this code.
>>>>>>
>>>>>> The test is a little trickier to write than it might otherwise be
>>>>>> because /proc and /sys tend to be slightly imperfect filesystems.
>>>>>>
>>>>>> I think the only way to really test that is to call readdir on the
>>>>>> directory itself :( I don't like that thought.
>>>>>>
>>>>>> I don't know what I was thinking when I wrote that test but I definitely
>>>>>> goofed up. Grr!
>>>>>>
>>>>>> I can certainly filter out any directory with nlink > 2. That would be
>>>>>> an easy partial step forward.
>>>>>>
>>>>>> The real question though is how do I detect directories it is safe to
>>>>>> mount on where there will not be files in them. I can't call iterate
>>>>>> with the namespace_lock held so things are a bit tricky.
>>>>>>
>>>>>
>>>>> I know this problem is not easy to be resolved. why not let the user
>>>>> make the decision? maybe we can introduce a new mount option MS_LOCK,
>>>>> if user wants to use mount to hide something, he should use mount with
>>>>> option MS_LOCK. so the unpriviged user can't umount this filesystem and
>>>>> fail to mount the filesystem if one of it's child mount is mounted with
>>>>> MS_LOCK option otherwise he use MS_REC too.
>>>>>
>>>>
>>>> Something like this.
>>>>
>>>> From 437f33ea366623c7a9d557b2e84cae424876a44f Mon Sep 17 00:00:00 2001
>>>> From: Gao feng <gaofeng@xxxxxxxxxxxxxx>
>>>> Date: Wed, 13 Nov 2013 16:06:46 +0800
>>>> Subject: [PATCH] userns: introduce new mount option MS_LOCK
>>>>
>>>> After commit 5ff9d8a65ce80efb509ce4e8051394e9ed2cd942
>>>> vfs: Lock in place mounts from more privileged users,
>>>> in userns, the mounts of child mntns which copied from
>>>> parent mntns is locked and user has no rights to umount/move
>>>> them, it's too strict.
>>>>
>>>> The core purpose of above commit is trying to prevent
>>>> unprivileged user from accessing files hidden by mount.
>>>> This patch introduces a new mount option MS_LOCK, this
>>>> gives user the capable to mount filesystem as the type
>>>> of lock if he wants to use mount to hide something.
>>>>
>>>
>>> This is bad -- if something was secure in old kernels, it needs to
>>> stay secure. If you had MS_NOT_A_LOCK, that would be okay, but it
>>> might not solve your problem.
>>>
>>
>> what you mean old kernels here? I saw patch "vfs: Lock in place mounts from more privileged users"
>> is merged into upstream in linux 3.12-rc1, this is not very old. I think there
>> are not many userspace processes rely on this feature.
>
> Sort of true. Most people aren't that silly. This feature was added to
> defend against a theoretical attack that you can use with mount
> namespaces.
>
> In particular the scenario we are concerned with is:
>
> Suppose the file system looks like:
>
> Suppose there are two filesystems a and b that look like:
>
> a:/usr/
> a:/usr/my_very_secret_file
> a:/dev/
> a:/etc/
> a:/lib/
>
> b:/bin/
> b:/etc/
> b:/games/
> b:/include/
> b:/lib/
> b:/lib32/
> b:/local/
> b:/sbin/
> b:/share/
> b:/src/
>
> And filesystem b is mounted on a:/usr hiding a:/usr/my_very_secret_file
>
> So the filesystem looks like:
>
> /usr/
> /usr/bin/
> /usr/etc/
> /usr/games/
> /usr/include/
> /usr/lib/
> /usr/lib32/
> /usr/local/
> /usr/sbin/
> /usr/share/
> /usr/src/
> /dev/
> /etc/
> /lib/
>
> Without locking mounts into place an unprivileged user can clone the
> mount namespace and do "umount /usr" and read /usr/my_very_secret_file.
>
> Most systems don't hide sensitive things with mounts but it is very
> possible and guarding against is fairly cheap and easy. And while a
> little annoying it should not be a large impediment to unprivileged user
> of the user namespace because pivot root still works.
>
> This thread started talking about bugs in fs_fully_visible. And those
> bugs are fixable and I aim to get to them shortly. At the very least
> I can lie and test for nlink <= 2 which fixes the regression in mounting
> proc.
>
> Then I can write the fun version that takes references and drops locks
> so it can call the internal version of readdir to see if a directory is
> actually empty.
>
> But the principle remains the same we really don't want to reveal
> anything that is hidden under a mount on purpose or by mistake. Just
> because then we don't have to think about those things from a security
> point of view making everyone's life easier.
>

Ok,I agree with you that we should make container security by default.

What's your idea that introduces option MS_NOT_A_LOCK just like Andy's
advisement?

In libvirt, host creates dev and devpts directories for container,then
mount devpts, tmpfs on them and create device nodes inside these dirs
for container. and then in container, these filesystems are moved to
container's /dev/ /dev/pts directory. We really have no need to lock
these mounts. they are just created for container.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/