Re: [PATCH v8 6/8] ptrace,seccomp: Add PTRACE_SECCOMP support

From: Will Drewry
Date: Tue Feb 21 2012 - 12:31:11 EST


On Fri, Feb 17, 2012 at 4:55 PM, Indan Zupancic <indan@xxxxxx> wrote:
> Hello,
>
> On Fri, February 17, 2012 17:23, Will Drewry wrote:
>> On Thu, Feb 16, 2012 at 11:08 PM, Indan Zupancic <indan@xxxxxx> wrote:
>>>> +/* Indicates if a tracer is attached. */
>>>> +#define SECCOMP_FLAGS_TRACED 0
>>>
>>> That's not the best way to check if a tracer is attached, and if you did use
>>> it for that, you don't need to toggle it all the time.
>>
>> It's logically no different than task->ptrace.  If it is less
>> desirable, that's fine, but it is functionally equivalent.
>
> Except that when using task->ptrace the ptrace code keeps track of it and
> clears it when the ptracer goes away. And you're toggling SECCOMP_FLAGS_TRACED
> all the time.

Yep, the code is gone in the coming version. It was ugly to need to
change it everywhere TIF_SYSCALL_TRACE was toggled.

>>>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>>>> index c75485c..f9d419f 100644
>>>> --- a/kernel/seccomp.c
>>>> +++ b/kernel/seccomp.c
>>>> @@ -289,6 +289,8 @@ void copy_seccomp(struct seccomp *child,
>>>>  {
>>>>       child->mode = prev->mode;
>>>>       child->filter = get_seccomp_filter(prev->filter);
>>>> +     /* Note, this leaves seccomp tracing enabled across fork. */
>>>> +     child->flags = prev->flags;
>>>
>>> What if the child isn't ptraced?
>>
>> Then falling through with TIF_SYSCALL_TRACE will result in the
>> SECCOMP_RET_TRACE events to be allowed, but this comes back to the
>> race.  If I can effectively "check" that ptrace did its job, then I
>> think this becomes a non-issue.
>
> Yes. But it would be still sloppy state tracking, which can lead to
> all kind of unlikely but interesting scenario's. If the child is ever
> attached to later on, that flag will be still set. Same is true for
> any descendant, they all will have that flag copied.

Yup - it'd lead to tracehook fall through and an implicit allow. Not
ideal at all.

>>>>  }
>>>>
>>>>  /**
>>>> @@ -363,6 +365,19 @@ int __secure_computing_int(int this_syscall)
>>>>                       syscall_rollback(current, task_pt_regs(current));
>>>>                       seccomp_send_sigtrap();
>>>>                       return -1;
>>>> +             case SECCOMP_RET_TRACE:
>>>> +                     if (!seccomp_traced(&current->seccomp))
>>>> +                             return -1;
>>>> +                     /*
>>>> +                      * Delegate to TIF_SYSCALL_TRACE. This allows fast-path
>>>> +                      * seccomp calls to delegate to slow-path if needed.
>>>> +                      * Since TIF_SYSCALL_TRACE will be unset on ptrace(2)
>>>> +                      * continuation, there should be no direct side
>>>> +                      * effects.  If TIF_SYSCALL_TRACE is already set, this
>>>> +                      * has no effect.
>>>> +                      */
>>>> +                     set_tsk_thread_flag(current, TIF_SYSCALL_TRACE);
>>>> +                     /* Falls through to allow. */
>>>
>>> This is nice and simple, but not race-free. You want to check if the ptracer
>>> handled the event or not. If the ptracer died before handling this then the
>>> syscall should be denied and the task should be killed.
>>
>> Hrm. I think there's a way to do this without forcing seccomp to
>> always go slow path.  I'll update the patch and see how it goes.
>
> You only have to go through the slow path for the SECCOMP_RET_TRACE case.

You'd have to know at syscall entry time to decide or pre-scan the bpf
filter to see if SECCOMP_RET_TRACE is returned non-programmatically,
then add a TIF flag for slow pathing, but that seems crazy bad.

> But yeah, toggling TIF_SYSCALL_TRACE seems the only way to avoid the slow
> path, sometimes. The downside is that it's unexpected behaviour which may
> clash with arch entry code, so I'm not sure if that's a good idea. I think
> always going through the slow path isn't too bad, compared to the ptrace
> alternative it's still a lot faster.

Since supporting that behavior is documented as a prerequisite for
adding HAVE_ARCH_SECCOMP_FILTER, I don't see how it could be
unexpected behavior. On systems, like x86, where seccomp is always
slowpath, it has no impact. However, it means that if a fast path is
added (like audit), then it will need to know to re-check the
threadinfo flags. I don't want to try to optimize in advance, but
it'd be nice to not close off any options for later. If an explicit
ptrace_event(SECCOMP) call was being made, then we'd be stuck in the
slow path or stuck making the ptrace code have more ifs for
determining if the source was a normal ptrace event or special
seccomp-triggered one. That might be okay as a long-term-thing,
though, since the other option (which the incoming patchset does) is
to add a post-trace callback into seccomp. I'm not sure which is
truly preferable.

>>> Many people would like a PTRACE_O_KILL_TRACEE_IF_DEBUGGER_DIES option,
>>> Oleg was working on that, among other things. Perhaps re-use that to
>>> handle this case too?
>>
>> Well, if you can inject initial code into the tracee, then it can call
>> prctl(PR_SET_PDEATHSIG, SIGKILL).  Then when the tracer dies, the
>> child dies.
>
> That only works for child tracees, not descendants of the tracee.

True enough.

>> If the SIGKILL race in arch_ptrace_... is resolved, then
>> a SIGKILL that arrives between seccomp and delegation to ptrace should
>> result in process death.  Though perhaps my proposal above will make
>> seccomp's integration with ptrace less subject to ptrace behaviors.
>
> Oleg fixed the SIGKILL problem (it wasn't a race), it should go upstream
> in the next kernel version, I think.

Pick your own name for it then, I guess. The signal lock was held in
ptrace_notify. Then, in order to hand off to the arch_ptrace_notify
code, it releases the lock, then claims it again after. If SIGKILL
was delivered in that time window, then the post-arch-handoff code
would see it, skip notification of the tracer, and allow the syscall
to run prior to terminating the task. I'm excited to see it fixed :)

>>>>               case SECCOMP_RET_ALLOW:
>>>
>>> For this and the ERRNO case you could check that PTRACE_O_SECCOMP option and
>>> decide to do something or not in ptrace.
>>
>> For ERRNO, I'd prefer not to since it adds implicit behavior to the
>> rules and, without pulling a ptrace_event()ish call into this code, it
>> would change the return flow and potentially open up errno, which
>> should be solid, to races, etc.  For ALLOW, sure, but at that point,
>> just use PTRACE_SYSCALL.  Perhaps this can all be ameliorated if I can
>> get a useful ptrace_entry completed notification.
>
> You don't want ptrace to be able to override the decision? Fair enough.
> Or did you mean something else?

Exactly. The first time I went down this path, I let a tracer pick up
any denied syscalls, but that complicated the interactions and
security model quite a bit. I also don't want to add an implicit
dependency on the syscall slow-path for any other return values --
just in case the proposed TIF_SYSCALL_TRACE approach isn't acceptable.

thanks!
will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/