Re: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace

From: Eric W. Biederman
Date: Mon Aug 06 2012 - 15:16:33 EST


"Serge E. Hallyn" <serge@xxxxxxxxxx> writes:

> Quoting Daniel P. Berrange (berrange@xxxxxxxxxx):
>> From: "Daniel P. Berrange" <berrange@xxxxxxxxxx>
>>
>> The following commit
>>
>> commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
>> Author: Daniel Lezcano <daniel.lezcano@xxxxxxx>
>> Date: Wed Mar 28 14:42:51 2012 -0700
>>
>> pidns: add reboot_pid_ns() to handle the reboot syscall
>>
>> introduced custom handling of the reboot() syscall when invoked
>> from a non-initial PID namespace. The intent was that a process
>> in a container can be allowed to keep CAP_SYS_BOOT and execute
>> reboot() to shutdown/reboot just their private container, rather
>> than the host.
>>
>> Unfortunately the kexec_load() syscall also relies on the
>> CAP_SYS_BOOT capability. So by allowing a container to keep
>> this capability to safely invoke reboot(), they mistakenly
>> also gain the ability to use kexec_load(). The solution is
>> to make kexec_load() return -EPERM if invoked from a PID
>> namespace that is not the initial namespace
>>
>> Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx>
>> Cc: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx>
>
> Acked-by: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx>
>
> (Please see my previous email explaining why I believe the pidns
> is an appropriate check)

Serge as to your objects.

If we define kexec_load in terms of the pid namespace then something
makes sense, but the error should be EINVAL, or something of that sort.

That is what we did with reboot. We defined reboot in terms of the pid
namespace.

Not defining kexec_load in terms of the pid namespace and then returning
EPERM because having we happen to have CAP_SYS_BOOT for other reasons is
semantically horrible.

At the end of the day the effect is the same, but I think it matters a
great deal in how we think about things.

We have CAP_SYS_BOOT in the initial user namespace. We do have
permission to make the system call.

So I continue to see this patch the way it is current constructed as
broken.

Nacked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/