Re: [GIT PULL] user namespace and namespace infrastructure changes for 3.8

From: Eric W. Biederman
Date: Thu Dec 13 2012 - 23:11:35 EST


Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:

> On Thu, Dec 13, 2012 at 2:01 PM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
>>
>>> On 12/11/2012 01:17 PM, Eric W. Biederman wrote:
>
>> But please also note the difference between capable and ns_capable. Any
>> security check that is capable() still requires priviliges in the
>> initial user namespace.
>
> Huh?
>
> I'm talking about:
>
> clone with CLONE_NEWUSER
> - child does unshare(CLONE_NEWPID)
> - parent does setfd(child's pid namespace)
>
> Now the parent is running in the init userns with a different pid ns.
> Setuid binaries will work but will see the unexpected pid ns. With
> mount namespaces, this would be Really Bad (tm). With pid or ipc or
> net, it's less obviously dangerous, but I'm not convinced it's safe.

That isn't safe. That is a sneaky bug in my tree that I overlooked. :(

> I sort of think that setns on a *non*-userns should require
> CAP_SYS_ADMIN in the current userns, at least if no_new_privs isn't
> set.

Yes. CAP_SYS_ADMIN in your current user namespace should make
setns as safe as it currently is before my patches.

That is just a matter of adding a couple nsown_capable(CAP_SYS_ADMIN)
permission checks.

Right now I test for nsown_capable(CAP_SYS_CHROOT) for the mount
namespace, which is probably sufficient to prevent those kinds of
shenanigans but I am going to add a nsown_capable(CAP_SYS_ADMIN) for
good measure.

> A non-threaded process can have mm_users == 2 with CLONE_VM. I'm not
> sure this is a problem, though.

No it isn't. I said threads because they are the easy concept not
because that covers all possible corner cases.

>>> In any case, why are threads special here?
>>
>> You know I don't think I stopped to think about it. The combination
>> of CLONE_NEWUSER and CLONE_THREAD have been denined since the first user
>> namespace support was merged in 2008.
>>
>> I do know that things can get really strange when you mix multiple
>> namespaces in a process. tkill of your own threads will stop working.
>> Which access permissions should apply to files you mmap, file handles
>> you have open, the core dumper etc.
>>
>> We do allow setresuid per thread so we might be able to cope
>> with a process that mixes with user namespaces in different threads,
>> but I would want a close review of things before we allow that kind of
>> sharing.
>>
>
> Fair enough.
>
> I'd personally be more concerned about shared signal handlers than
> shared tgid, though. The signal handler set has all kinds of weird
> things like session id.

CLONE_THREAD implies CLONE_VM and CLONE_SIGNAL,
and in practice mm_users > 1 protects against all of those cases.

So I was really thinking all of those cases.


>> (See the end. A significant bug in cap_capable slipped in about
>> 3.5. cap_capable is only supposed to grant permissions to the owner
>> of a user namespace if it is a child user namespace).
>
> [snipping lots of stuff]
>
> If the intended semantics of cap_capable are, indeed:
>
> If targ_ns is equals or is a descendent of cred->user_ns and the cap
> is effective, return true. If targ_ns is owned by cred->euid and
> targ_ns is a descendent of cred->user_ns (but is not equal to
> cred->user_ns), then return true. Else return false
>
> then I agree with you on almost everything you said. I assumed that
> the actual semantics were intended.

Good.

>>> unshare has a bug. This code:
>>
>> Interesting...
>>
>> Looking at it this is a very small misfeature.
>>
>> What is happening is that commit_creds is setting is making the task
>> undumpable because we changed the set of capabilities in struct cred.
>>
>> This in turn results in pid_revalidate setting the owner of
>> of /proc/self/uid_map to GLOBAL_ROOT_UID.
>>
>> From the outside the permissions on /proc/self/uid_map look like:
>> -rw-r--r-- 1 root root 0 2012-12-13 12:43 /proc/30530/uid_map
>>
>> Then since /proc/self/uid_map uses the default permission function
>> and the test program below is not run as root the read-write open
>> of uid_map fails.
>
> It's probably either worth fixing this or disabling unshare of userns.
> This makes it hard to use. IMO non-dumpable tasks should still be
> able to access the contents of /proc/self -- i.e. I'd call this a more
> general bug.
>
> But I'd also argue that unsharing userns shouldn't set non-dumpable --
> cap_permitted increased, but the new capabilities are still logically
> a subset of the old ones.

Agreed. Setting dumpable is the bug.

I am going to sleep on it but the code in commit_creds should probably
read:

/* dumpability changes */
if (!uid_eq(old->euid, new->euid) ||
!gid_eq(old->egid, new->egid) ||
!uid_eq(old->fsuid, new->fsuid) ||
!gid_eq(old->fsgid, new->fsgid) ||
((old->user_ns == new->user_ns) &&
!cap_issubset(new->cap_permitted, old->cap_permitted))) {
if (task->mm)
set_dumpable(task->mm, suid_dumpable);
task->pdeath_signal = 0;
smp_wmb();
}

Which is correct assuming a user_ns change is always to a more nested
user namespace.

> One more issue: the requirement that both upper and lower uids (etc.)
> in the maps are in order is rather limiting. I have no objection if
> you only require upper ids to be monotonic, but currently there's no
> way to may root outside to uid n (for n > 0) and some nonroot user
> outside to uid 0.

There is. You may set up to 5 (extents). You just have to use a second
extent for the non-contiguous bits. My reader is lazy and you have to
set all of the extents with a single write, so you may have missed the
ability to set more than one extent.

> Also, the fact that you can remap onto the unmapped id is a little
> strange. I haven't found any reason it would be a security bug, but
> it's still odd.

Yes the unmapped id is odd. Including the fact it can be set with a
sysctl.

The world has such lovely round edges.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/