Re: [PATCH 0/4] Rebase device_cgroup v2 patchset

From: Serge E. Hallyn
Date: Wed May 15 2013 - 21:22:16 EST


Quoting Serge E. Hallyn (serge@xxxxxxxxxx):
> Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
> > Serge Hallyn <serge.hallyn@xxxxxxxxxx> writes:
> >
> > > Quoting Aristeu Rozanski (aris@xxxxxxxxxx):
> > >> On Tue, May 14, 2013 at 10:05:39AM -0500, Serge Hallyn wrote:
> > >> > so now that the device cgroup properly respects hierarchy, not allowing
> > >> > a cgroup to be given greater permission than its parent, should we consider
> > >> > relaxing the capability checks?
> > >> >
> > >> > There are two capable(CAP_SYS_ADMIN) checks in device_cgroup.c: one in
> > >> > devcgroup_can_attach() to protect changing another task's cgroup, and
> > >> > one in devcgroup_update_access() to protect writes to the devices.allow
> > >> > and devices.deny files.
> > >> >
> > >> > I think the first should be changed to a check for ns_capable() to
> > >> > the victim's user_ns. Something like
> > >> >
> > >> > --- a/security/device_cgroup.c
> > >> > +++ b/security/device_cgroup.c
> > >> > @@ -70,10 +70,16 @@ static int devcgroup_can_attach(struct cgroup *new_cgrp,
> > >> > struct cgroup_taskset *set)
> > >> > {
> > >> > struct task_struct *task = cgroup_taskset_first(set);
> > >> > + struct user_namespace *ns;
> > >> > + int ret = -EPERM;
> > >> >
> > >> > - if (current != task && !capable(CAP_SYS_ADMIN))
> > >> > - return -EPERM;
> > >> > - return 0;
> > >> > + if (current == task)
> > >> > + return 0;
> > >> > +
> > >> > + ns = userns_get(task);
> > >> > + ret = ns_capable(ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
> > >> > + put_user_ns(ns);
> > >> > + return ret;
> > >> > }
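The decision the proposed devcgroup_can_attach() change makes can be modeled in userspace roughly as follows. This is a simplified sketch, not kernel code: `struct userns`, `struct caller`, `caller_has_sys_admin_in()` and `can_attach()` are hypothetical names. The model treats a caller as privileged over a namespace when it holds CAP_SYS_ADMIN in its own namespace and that namespace is the target or one of its ancestors; the owner-uid rule of the real ns_capable() path is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_namespace: only the parent link. */
struct userns {
	const struct userns *parent; /* NULL for the initial namespace */
};

/* Hypothetical caller credentials: its own user ns, and whether
 * CAP_SYS_ADMIN is in its effective set there. */
struct caller {
	const struct userns *ns;
	bool sys_admin;
};

/* Simplified ns_capable(): walk up from the target namespace; if we reach
 * the caller's namespace, the caller's CAP_SYS_ADMIN there decides.
 * The owner-uid rule of the real check is deliberately omitted. */
static bool caller_has_sys_admin_in(const struct caller *c,
				    const struct userns *target)
{
	assert(c != NULL && target != NULL);
	for (const struct userns *ns = target; ns != NULL; ns = ns->parent)
		if (ns == c->ns)
			return c->sys_admin;
	return false;
}

/* Mirrors the proposed logic: moving yourself is always allowed; moving
 * another task needs CAP_SYS_ADMIN over that task's user namespace. */
static int can_attach(const struct caller *c, bool moving_self,
		      const struct userns *task_ns)
{
	if (moving_self)
		return 0;
	return caller_has_sys_admin_in(c, task_ns) ? 0 : -EPERM;
}
```

Note that the model makes the concern raised next visible: a container root passes can_attach() for every task in its own namespace, regardless of which cgroup it is moving that task into.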
> > >>
> > >> wouldn't this allow a userns root to move a task in the same userns into
> > >> a parent cgroup? I believe that anything but moving down the hierarchy
> > >> would be very complicated to verify (how far up can you go).
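The "only move down the hierarchy" rule is easy to state as a check that the destination cgroup is the mover's own cgroup or lies below it. A minimal sketch, assuming normalized absolute cgroup paths; `cgroup_path_at_or_below()` and `move_allowed()` are hypothetical helpers, and a kernel version would compare struct cgroup parent pointers rather than strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if cgroup path `cg` equals `base` or lies
 * below it, e.g. "/lxc/c1/sub" is below "/lxc/c1" but "/lxc/c10" is not.
 * Paths are assumed normalized: absolute, no trailing '/', no "..". */
static bool cgroup_path_at_or_below(const char *base, const char *cg)
{
	assert(base != NULL && cg != NULL);
	size_t n = strlen(base);
	if (strcmp(base, "/") == 0)
		return cg[0] == '/';      /* everything is below the root */
	if (strncmp(base, cg, n) != 0)
		return false;
	return cg[n] == '\0' || cg[n] == '/'; /* stop at a path boundary */
}

/* Policy sketch: a mover may only place tasks into its own cgroup or a
 * descendant of it, never into a parent or a sibling. */
static bool move_allowed(const char *mover_cg, const char *target_cg)
{
	return cgroup_path_at_or_below(mover_cg, target_cg);
}
```

The boundary test on `cg[n]` is what keeps "/lxc/c10" from being mistaken for a child of "/lxc/c1".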
> > >
> > > But only if they are able to open the tasks file for writing, which
> > > they shouldn't be able to do, right?
> >
> > That should be looked at very closely. There are some funny exploits of
> > setuid root applications writing to files that have required some
> > additional permission checks on /proc/<pid>/uid_map. I think the
> > cgroups files may be vulnerable to some of the same kind of exploits.
> >
> > Certainly we should be verifying that the opener of the file had the
> > capabilities we are trying to use to avoid being open to those kinds of
> > problems.
> >
> > I am trying to see the utility of the proposed patch. It doesn't
> > allow mknod. So what is the benefit of having the user namespace bits?
>
> I'm still thinking through it, which is why I haven't sent a real
> patch. What I'm working on is the unprivileged startup of a container.
> Right now most things are not allowed in a private user ns, so device
> cgroup is not as useful. But it should be possible eventually to use
> block devices, which the original unprivileged user owned, by chowning
> the blockdev to a user mapped into the target userns.
>
> The unprivileged user may want to use devices cgroup so he can chown
> the loop file into the container, but only allow read-only mounts, for
> instance.
>
> > Is the point to allow the userns root to remove access to selected
> > devices from its children even if the DAC permissions would allow the
> > access?
>
> Yes I think that's it - except userns root before forking the container
> init (and venturing into the really untrusted category).
>
> ...
>
> > That said I haven't looked at open or mknod, and usually we are talking
> > about calls that aren't made by suid apps so I think there is a fair
> > chance that dropping some of those permissions could cause issues.
> > The first danger that crosses my mind is what happens if you remove
> > access to /dev/tty from a normal application that would try to log
> > strange goings on to a user if they could.
>
> If they were going to do that over tty, that would be to the malicious
> user anyway, so that should just either be ignored, or result in the
> program exiting early.
>
> > Shrug. Mostly I don't see the advantage of this change.
>
> It's also possible that this will end up being worked around by the new
> (not-yet-designed) interface/library which Tejun wants people to use,
> sitting above the cgroupfs. At least at a first layer.
>
> Anyway this isn't urgent, as it's not in the way for general unprivileged
> container creation. But in general if we don't need the check to be
> capable(), it would be better to introduce the right check.
>
> -serge

I'm terribly sorry, Andrew; I have no idea how that address for you got
into my address book. (Corrected.) FWIW, the thread can be followed at
https://lkml.org/lkml/2013/5/14/363 .

-serge