Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases

From: Andy Lutomirski
Date: Fri Feb 06 2015 - 15:20:48 EST


On Fri, Feb 6, 2015 at 12:16 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Fri, Feb 6, 2015 at 12:12 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Fri, Feb 6, 2015 at 12:07 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>> On Fri, Feb 6, 2015 at 11:32 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>> On Fri, Feb 6, 2015 at 11:23 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>>>> And especially since a ptracer
>>>>> can change syscalls during syscall-enter-stop to any syscall it wants,
>>>>> bypassing seccomp. This condition is already documented.
>>>>
>>>> If a ptracer (using PTRACE_SYSCALL) were to get the entry callback
>>>> before seccomp, then this oddity would go away, which might be a good
>>>> thing. A ptracer could change the syscall, but seccomp would based on
>>>> what the ptracer changed the syscall to.
>>>
>>> I want kill events to trigger immediately. I don't want to leave the
>>> ptrace surface available on a SECCOMP_RET_KILL. So maybe it can be
>>> seccomp phase 1, then ptrace, then seccomp phase 2? And pass more
>>> information between phases to determine how things should behave
>>> beyond just "skip"?
>>
>> I thought so too, originally, but I'm far less convinced now, for two reasons:
>>
>> 1. I think that a lot of filters these days use RET_ERRNO heavily, so
>> this won't benefit them.
>>
>> 2. I'm not convinced it really reduces the attack surface for anyone.
>> Unless your filter is literally "return SECCOMP_RET_KILL", then the
>> seccomp-filtered task can always cause the ptracer to get a pair of
>> syscall notifications. Also, the task can send itself signals (using
>> page faults, breakpoints, etc) and cause ptrace events via other
>> paths.
>
> What are you thinking for a solution?
>

I'm writing a patch now. It's an ABI break, but this thread seems to
show that the ABI was somewhat useless before the split-phase changes,
and it's differently broken now, so I would be surprised if the change
broke anything that was currently working. I'll send it later today,
hopefully.

> As for capping SECCOMP_RET_ERRNO to MAX_ERRNO, how about this (sorry
> if gmail butchers the paste):
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 4ef9687ac115..c88148d20bd5 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -629,7 +629,9 @@ static u32 __seccomp_phase1_filter(int this_syscall, struct
>
> switch (action) {
> case SECCOMP_RET_ERRNO:
> - /* Set the low-order 16-bits as a errno. */
> + /* Set the low-order bits as a errno. */
> + if (data > MAX_ERRNO)
> + data = MAX_ERRNO;
> syscall_set_return_value(current, task_pt_regs(current),
> -data, 0);
> goto skip;
>

I'm fine with this, but I'm not entirely convinced it solves a
problem. SECCOMP_RET_ERRNO | 5000 didn't work before, and it doesn't
work now. Admittedly, the new failure mode is possibly better.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/