Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace

From: Andy Lutomirski
Date: Tue Apr 29 2014 - 21:03:56 EST


On Tue, Apr 29, 2014 at 5:44 PM, Serge Hallyn <serge.hallyn@xxxxxxxxxx> wrote:
> Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
>> On Tue, Apr 29, 2014 at 5:21 PM, Serge Hallyn <serge.hallyn@xxxxxxxxxx> wrote:
>> > Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
>> >> > It should be a nonissue so long as we make sure that a file owned by a
>> >> > uid outside the scope of the container may not be changed even though
>> >> > fs_owner_uid is set. Otherwise, it's just a matter of chmod +S on say
>> >> > a shell and anyone who can see the fs from the host will be getting a
>> >> > root shell (assuming said file is owned by the host's uid 0).
>> >>
>> >> I feel like that's too fragile. I'd rather add a rule that one of
>> >
> >> > yeah I don't want to rush something like that. I'd rather stash
>> > the userns of the task which did the mounting and check against
>> > that. Note that would make it worthless unless and until we allowed
>> > mounting from non-init userns, but then we can only claim "our fs
>> > superblock readers suck and therefore containers can't mount an fs"
>> > so long before we start to feel some shame and audit them...
>> >
>> >> these filesystems always acts like it's nosuid unless you're inside a
>> >> user namespace that matches fs_owner_uid.
>> >>
>> >> Maybe even that is too weird. How about setuid, setgid, and fcaps
>> >> only work on mounts that are in mount namespaces that are owned by the
>> >> current user namespace or one of its parents? IOW, a struct mount is
>> >> only trusted if mnt->mnt_ns->user_ns == current user ns or one of its
>> >> parents?
>> >>
>> >> Untrusted mounts would act like they are nosuid,nodev. Someone can
>> >> try to figure out a safe way to relax nodev at some point.
>>
>> Do you like this variant? We could add a way for global root to mount
>> an fs on behalf of a userns. I'd rather this be more explicit than
>> just mounting it in a mount ns owned by the user namespace, though.
>
> I'm missing something. Which mnt are you talking about? A user
> can just clone a new userns and then clone(CLONE_NEWNS) to get a set
> of mounts owned by himself... We need to get a mnt (or a cred or
> straight to a userns) tied to the first mount of the superblock, istm.

Sure, but then that user is the only user that ends up trusting the
mount. This could end up being surprising, though -- it would be
weird for a bind mount of an implicitly nosuid mount to end up not
being nosuid as seen by the mounter.

This still feels a bit overcomplicated. Grr. I do like the idea
that, if someone creates a tmpfs mount, sticks a setuid file in it,
and hands someone outside the namespace an fd to the mount, the
file won't be setuid as seen from outside. This will make using the
same uids in different containers a lot safer, although it still won't
really be safe.
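
To make the rule quoted above concrete, here is a rough userspace model
of it. This is only a sketch, not kernel code: the struct names, the
MODEL_* flags, and both helper functions are made up for illustration,
and the real struct mount / mnt_namespace layout is not reproduced. It
assumes a mount is trusted only if the user namespace owning its mount
namespace is the caller's user namespace or one of that namespace's
parents, and that untrusted mounts behave as nosuid,nodev.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel objects; hypothetical names. */
struct user_ns {
	struct user_ns *parent;		/* NULL for the initial user namespace */
};

struct mnt_ns {
	struct user_ns *user_ns;	/* user namespace that owns this mount ns */
};

struct mount_model {
	struct mnt_ns *mnt_ns;		/* mount namespace the mount lives in */
	unsigned int flags;		/* flags the mounter asked for */
};

/* Illustrative flag values, not the kernel's MNT_* constants. */
#define MODEL_NOSUID	0x1u
#define MODEL_NODEV	0x2u

/*
 * A mount is trusted if the owner of its mount namespace is the
 * caller's user namespace or any ancestor of it.
 */
static bool mount_is_trusted(const struct mount_model *mnt,
			     const struct user_ns *current_ns)
{
	const struct user_ns *ns;

	for (ns = current_ns; ns; ns = ns->parent)
		if (mnt->mnt_ns->user_ns == ns)
			return true;
	return false;
}

/* Untrusted mounts are treated as if they were nosuid,nodev. */
static unsigned int effective_flags(const struct mount_model *mnt,
				    const struct user_ns *current_ns)
{
	if (mount_is_trusted(mnt, current_ns))
		return mnt->flags;
	return mnt->flags | MODEL_NOSUID | MODEL_NODEV;
}
```

In this model a mount created inside a child user namespace is trusted
by tasks in that namespace, but a task in the parent or in a sibling
namespace that gets an fd to it sees it as nosuid,nodev, which is the
tmpfs case described above.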

Another wart: chroot on a directory in someone else's mount namespace
works, I think. That just seems wrong, although I don't immediately
see how it's a problem.

--Andy