Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace

From: Stéphane Graber
Date: Tue Apr 29 2014 - 20:01:14 EST


On Tue, Apr 29, 2014 at 04:51:54PM -0700, Andy Lutomirski wrote:
> On Tue, Apr 29, 2014 at 4:47 PM, Stéphane Graber <stgraber@xxxxxxxxxx> wrote:
> > On Tue, Apr 29, 2014 at 04:22:55PM -0700, Andy Lutomirski wrote:
> >> On Tue, Apr 29, 2014 at 4:20 PM, Marian Marinov <mm@xxxxxx> wrote:
> >> > On 04/30/2014 01:45 AM, Andy Lutomirski wrote:
> >> >>
> >> >> On 04/29/2014 03:29 PM, Serge Hallyn wrote:
> >> >>>
> >> >>> Quoting Marian Marinov (mm-108MBtLGafw@xxxxxxxxxxxxxxxx):
> >> >>>>
> >> >>>> On 04/30/2014 01:02 AM, Serge Hallyn wrote:
> >> >>>>>
> >> >>>>> Quoting Marian Marinov (mm-108MBtLGafw@xxxxxxxxxxxxxxxx):
> >> >>>>>>
> >> >>>>>> On 04/29/2014 09:52 PM, Serge Hallyn wrote:
> >> >>>>>>>
> >> >>>>>>> Quoting Theodore Ts'o (tytso-3s7WtUTddSA@xxxxxxxxxxxxxxxx):
> >> >>>>>>>>
> >> >>>>>>>> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> I'm proposing a fix to this, by replacing the
> >> >>>>>>>>> capable(CAP_LINUX_IMMUTABLE)
> >> >>>>>>>>> check with ns_capable(current_cred()->user_ns,
> >> >>>>>>>>> CAP_LINUX_IMMUTABLE).
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> Um, wouldn't it be better to simply fix the capable() function?
> >> >>>>>>>>
> >> >>>>>>>> /**
> >> >>>>>>>> * capable - Determine if the current task has a superior
> >> >>>>>>>> capability in effect
> >> >>>>>>>> * @cap: The capability to be tested for
> >> >>>>>>>> *
> >> >>>>>>>> * Return true if the current task has the given superior
> >> >>>>>>>> capability currently
> >> >>>>>>>> * available for use, false if not.
> >> >>>>>>>> *
> >> >>>>>>>> * This sets PF_SUPERPRIV on the task if the capability is
> >> >>>>>>>> available on the
> >> >>>>>>>> * assumption that it's about to be used.
> >> >>>>>>>> */
> >> >>>>>>>> bool capable(int cap)
> >> >>>>>>>> {
> >> >>>>>>>> return ns_capable(&init_user_ns, cap);
> >> >>>>>>>> }
> >> >>>>>>>> EXPORT_SYMBOL(capable);
> >> >>>>>>>>
> >> >>>>>>>> The documentation states that it is for "the current task", and I
> >> >>>>>>>> can't imagine any use case, where user namespaces are in effect,
> >> >>>>>>>> where
> >> >>>>>>>> using init_user_ns would ever make sense.
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> the init_user_ns represents the user_ns owning the object, not the
> >> >>>>>>> subject.
> >> >>>>>>>
> >> >>>>>>> The patch by Marian is wrong. Anyone can do 'clone(CLONE_NEWUSER)',
> >> >>>>>>> setuid(0), execve, and end up satisfying
> >> >>>>>>> 'ns_capable(current_cred()->user_ns,
> >> >>>>>>> CAP_LINUX_IMMUTABLE)' by definition.
> >> >>>>>>>
> >> >>>>>>> So NACK to that particular patch. I'm not sure, but IIUC it should
> >> >>>>>>> be
> >> >>>>>>> safe to check against the userns owning the inode?
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>> So what you are proposing is to replace
> >> >>>>>> 'ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE)' with
> >> >>>>>> 'inode_capable(inode, CAP_LINUX_IMMUTABLE)' ?
> >> >>>>>>
> >> >>>>>> I agree that this is more sane.
> >> >>>>>
> >> >>>>>
> >> >>>>> Right, and I think the two operations you're looking at seem sane
> >> >>>>> to allow.
> >> >>>>
> >> >>>>
> >> >>>> If you are ok with this patch, I will fix all file systems and send
> >> >>>> patches.
> >> >>>
> >> >>>
> >> >>> Sounds good, thanks.
> >> >>>
> >> >>>> Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@xxxxxxxxxxxxxxxx>
> >> >>>
> >> >>>
> >> >>> Acked-by: Serge E. Hallyn
> >> >>> <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@xxxxxxxxxxxxxxxx>
> >> >>
> >> >>
> >> >> Wait, what?
> >> >>
> >> >> Inodes aren't owned by user namespaces; they're owned by users. And any
> >> >> user can arrange to have a user namespace in which they pass an
> >> >> inode_capable check on any inode that they own.
> >> >>
> >> >> Presumably there's a reason that CAP_LINUX_IMMUTABLE is needed. If this
> >> >> gets merged, then it would be better to just drop CAP_LINUX_IMMUTABLE
> >> >> entirely.
> >> >
> >> >
> >> > The problem I'm trying to solve is this:
> >> >
> >> > A container with its own user namespace and CAP_LINUX_IMMUTABLE should be
> >> > able to use chattr on all files which this container has access to.
> >> >
> >> > Unfortunately with the capable(CAP_LINUX_IMMUTABLE) check this is not
> >> > working.
> >> >
> >> > With the proposed two fixes CAP_LINUX_IMMUTABLE started working in the
> >> > container.
> >> >
> >> > The first solution got its user namespace from the currently running process
> >> > and the second gets its user namespace from the currently opened inode.
> >> >
> >> > So what would be the best solution in this case?
> >>
> >> I'd suggest adding a mount option like fs_owner_uid that names a uid
> >> that owns, in the sense of having unlimited access to, a filesystem.
> >> Then anyone with caps on a namespace owned by that uid could do
> >> whatever.
> >>
> >> Eric?
> >>
> >> --Andy
> >
> > The most obvious problem I can think of with "do whatever" is that this
> > will likely include mknod of char and block devices which you can then
> > chown/chmod as you wish and use to access any devices on the system from
> > an unprivileged container.
> > This can however be mitigated by using the devices cgroup controller.
>
> Or 'nodev'. setuid/setgid may have the same problem, too.
>
> Implementing something like this would also make CAP_DAC_READ_SEARCH
> and CAP_DAC_OVERRIDE work.
>
> Arguably it should be impossible to mount such a thing in the first
> place without global privilege.
>
> >
> > You also probably wouldn't want any unprivileged user from the host to
> > find a way to access that mounted filesystem but so long as you do the
> > mount in a separate mountns and don't share uids between the host and
> > the container, that should be fine too.
>
> This part should be a nonissue -- an unprivileged user who has the
> right uid owns the namespace anyway, so this is the least of your
> worries.
>
> --Andy

It should be a nonissue so long as we make sure that a file owned by a
uid outside the scope of the container may not be changed even though
fs_owner_uid is set. Otherwise, it's just a matter of chmod +s on, say,
a shell, and anyone who can see the fs from the host will get a root
shell (assuming said file is owned by the host's uid 0).

So that's restricting slightly what "do whatever" would do in this case.
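
As a rough illustration of that restriction, here is a minimal
user-space sketch in C. It is not kernel code and not a proposed patch:
struct uid_range, uid_in_range() and may_modify_inode() are hypothetical
names made up for this example, and it assumes the container's uid map
is a single contiguous range of host uids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a container's uid map: one contiguous range. */
struct uid_range {
	uint32_t host_first;	/* first host uid mapped into the container */
	uint32_t count;		/* number of consecutive host uids mapped */
};

/* True if host_uid falls inside the container's mapped range. */
bool uid_in_range(const struct uid_range *r, uint32_t host_uid)
{
	return host_uid >= r->host_first &&
	       host_uid - r->host_first < r->count;
}

/*
 * Hypothetical policy check: even with fs_owner_uid set and the
 * capability held in the container's namespace, only allow changes to
 * inodes whose owner is mapped into the container. An inode owned by
 * the host's uid 0 is therefore never modifiable from the container.
 */
bool may_modify_inode(const struct uid_range *r, bool has_cap_in_ns,
		      uint32_t inode_host_uid)
{
	return has_cap_in_ns && uid_in_range(r, inode_host_uid);
}
```

With a map of 65536 uids starting at host uid 100000, a file owned by
host uid 100000 passes the check, while one owned by host uid 0 does
not, so the setuid-shell trick above is refused. A real check would
presumably need the same treatment for the inode's gid.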

--
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com
