Re: seccomp and ptrace. what is the correct order?

From: Will Drewry
Date: Tue May 22 2012 - 16:26:24 EST


On Tue, May 22, 2012 at 12:39 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, May 22, 2012 at 11:23:06AM -0500, Will Drewry wrote:
>
>> However(!), if we did move secure_computing() to after ptrace _and_
>> added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
>> ensure the system call was not changed by the tracer.  This would give
>> strong assurances that whatever system call is executed was explicitly
>> allowed by seccomp policy is the one that was executed.
>
> BTW, after grepping around a bit, I have to say that some callers of those
> hooks make very little sense

Yeah - the arch specific seccomp and ptrace code is in dire need of
attention on a number of platforms.

> Exhibit A: sh32 has in do_syscall_trace_enter(regs)
>        secure_computing(regs->regs[0]);
> Syscall number in r0, right?
>        [usual PTRACE_SYSCALL bits]
>        if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
>                trace_sys_enter(regs, regs->regs[0]);
> Ditto
>        audit_syscall_entry(audit_arch(), regs->regs[3],
>                            regs->regs[4], regs->regs[5],
>                            regs->regs[6], regs->regs[7]);
> Oops - that one says syscall number in r3, first 4 arguments in r4..r7
>        return ret ?: regs->regs[0];
>
> and the caller of that sucker is
> syscall_trace_entry:
>        !                       Yes it is traced.
>        mov     r15, r4
>        mov.l   7f, r11         ! Call do_syscall_trace_enter which notifies
>        jsr     @r11            ! superior (will chomp R[0-7])
>         nop
>        mov.l   r0, @(OFF_R0,r15)       ! Save return value
>        !                       Reload R0-R4 from kernel stack, where the
>        !                       parent may have modified them using
>        !                       ptrace(POKEUSR).  (Note that R0-R2 are
>        !                       used by the system call handler directly
>        !                       from the kernel stack anyway, so don't need
>        !                       to be reloaded here.)  This allows the parent
>        !                       to rewrite system calls and args on the fly.
>        mov.l   @(OFF_R4,r15), r4   ! arg0
>        mov.l   @(OFF_R5,r15), r5
>        mov.l   @(OFF_R6,r15), r6
>        mov.l   @(OFF_R7,r15), r7   ! arg3
>        mov.l   @(OFF_R3,r15), r3   ! syscall_nr
>        !
>        mov.l   2f, r10                 ! Number of syscalls
>        cmp/hs  r10, r3
>        bf      syscall_call
> [...]
> 7:      .long   do_syscall_trace_enter
>
> ... and syscall_call very clearly picks an index in syscall table from r3.
> Incidentally, r0 is the fifth syscall argument...  So what we have is
>        * b0rken hookups for seccomp and tracepoint
>        * b0rken cancelling of syscalls by ptrace (replacing syscall number
> with -1 would've worked; doing that to the 5th argument - not really)
>
> Exhibit B: sh64; makes even less sense, but there the assembler glue had
> been too dense for me to get through.  At the very least, seccomp and
> tracepoint are assuming that syscall number is in r9, while audit is
> assuming that it's in r1.  I'm not too inclined to trust audit in this
> case, though.  The _really_ interesting part is that by the look of
> their pt_regs syscall number is stored separately - not in regs->regs[]
> at all.  And the caller
>        * shoves the return value of do_syscall_trace_enter() into regs->regs[2]
>        * picks syscall number from regs->syscall_nr and uses that as index
> in sys_call_table
>        * seems to imply that arguments are in regs->regs[2..7]
>        * code around the (presumable) path leading there seems to imply
> that syscall number comes from the trap number and isn't in regs->regs[]
> at all.  But I might be misreading that assembler.  Easily.
>
> Exhibit C:
> mips is consistent these days, but it has no tracepoint hookup *and* it does
> open-code tracehook_report_syscall_entry(), except for its return value...
> Used to pass the wrong register to seccomp, IIRC.
>
> We really ought to look into merging those suckers.  It's a source of PITA
> that keeps coming back; what we need is
>        regs_syscall_number(struct pt_regs *)
>        regs_syscall_arg1(struct pt_regs *)
>        ...
>        regs_syscall_arg6(struct pt_regs *)
> in addition to existing
>        regs_return_value(struct pt_regs *)
> added on all platforms and made mandatory for new ones.  With that we
> could go a long way towards merging these implementations...

For fun, I took a stab at that to see how it'd play out for x86.
(Attached instead of inline since it's just a first cut.)

I do think much of the ptrace/seccomp could be merged, but I don't
know how quickly given all the changes that each arch will need.

What makes sense for the current seccomp work? I'm not displeased
with the in-tree behavior, but the alternative approach would change
the ptrace+seccomp interactions (minimally) just by switching the
ordering. It's not hard to change all the existing arches that call
secure_computing calls (I have a tentative patch set), but it wouldn't
fix their general brokenness.

Anyway, I'd like to see seccomp work properly on all arches, but if
there are strong opinions about the correct expectations between
seccomp and ptrace, I'm happy to fix that /now/ for whatever arches it
makes sense for.

thanks!
will
From f281e194e7dee18134266fa4af926d1f1e5ca208 Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@xxxxxxxxxxxx>
Date: Tue, 22 May 2012 14:54:16 -0500
Subject: [PATCH] arch/x86,asm-generic: add syscall_regs.h

asm/syscall.h provides a reliable layer of abstraction for accessing
system call-related values across arches. It doesn't come without
costs, and using asm/syscall.h values in place of direct struct pt_regs
access adds non-trivial overhead.

While asm/syscall.h has its place encoding tricky logic, like
CONFIG_COMPAT behavior, there are many places where arch-specific code
could be moved into a shared location if there were arch-agnostic
accessors for the system call register properties.

asm/syscall_regs.h provides inline functions which provide direct
access to the pt_regs members for the cases where asm/syscall.h
doesn't make sense or is too costly.

As an example arch/x86/kernel/ptrace.c has been converted to use the
macros. The resulting ptrace.o file is the exact same size and contains
the exact same output as the pre-patch version.

Suggested-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Will Drewry <wad@xxxxxxxxxxxx>
---
arch/x86/include/asm/syscall_regs.h | 91 +++++++++++++++++++++++++++++++++++
arch/x86/kernel/ptrace.c | 23 +++++----
include/asm-generic/syscall_regs.h | 76 +++++++++++++++++++++++++++++
3 files changed, 181 insertions(+), 9 deletions(-)
create mode 100644 arch/x86/include/asm/syscall_regs.h
create mode 100644 include/asm-generic/syscall_regs.h

diff --git a/arch/x86/include/asm/syscall_regs.h b/arch/x86/include/asm/syscall_regs.h
new file mode 100644
index 0000000..68aacf4
--- /dev/null
+++ b/arch/x86/include/asm/syscall_regs.h
@@ -0,0 +1,91 @@
+/*
+ * Efficient, uniform access to system call-related registers
+ *
+ * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@xxxxxxxxxxxx>
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ *
+ * See asm-generic/syscall_regs.h for descriptions of what we must do here.
+ */
+
+#ifndef _ASM_X86_SYSCALL_REGS_H
+#define _ASM_X86_SYSCALL_REGS_H
+
+#include <asm/unistd.h>
+
+static inline unsigned long *regs_syscall_nr(struct pt_regs *regs)
+{
+ return &regs->orig_ax;
+}
+
+static inline unsigned long *regs_syscall_return(struct pt_regs *regs)
+{
+ /* Note! This does not handle compat mode sign extension, etc */
+ return &regs->ax;
+}
+
+static inline unsigned long *regs_syscall32_arg1(struct pt_regs *regs)
+{
+ return &regs->bx;
+}
+
+static inline unsigned long *regs_syscall32_arg2(struct pt_regs *regs)
+{
+ return &regs->cx;
+}
+
+static inline unsigned long *regs_syscall32_arg3(struct pt_regs *regs)
+{
+ return &regs->dx;
+}
+
+static inline unsigned long *regs_syscall32_arg4(struct pt_regs *regs)
+{
+ return &regs->si;
+}
+
+static inline unsigned long *regs_syscall32_arg5(struct pt_regs *regs)
+{
+ return &regs->di;
+}
+
+static inline unsigned long *regs_syscall32_arg6(struct pt_regs *regs)
+{
+ return &regs->bp;
+}
+
+#ifdef CONFIG_X86_64
+static inline unsigned long *regs_syscall64_arg1(struct pt_regs *regs)
+{
+ return &regs->di;
+}
+
+static inline unsigned long *regs_syscall64_arg2(struct pt_regs *regs)
+{
+ return &regs->si;
+}
+
+static inline unsigned long *regs_syscall64_arg3(struct pt_regs *regs)
+{
+ return &regs->dx;
+}
+
+static inline unsigned long *regs_syscall64_arg4(struct pt_regs *regs)
+{
+ return &regs->r10;
+}
+
+static inline unsigned long *regs_syscall64_arg5(struct pt_regs *regs)
+{
+ return &regs->r8;
+}
+
+static inline unsigned long *regs_syscall64_arg6(struct pt_regs *regs)
+{
+ return &regs->r9;
+}
+#endif /* CONFIG_X86_64 */
+
+#endif /* _ASM_X86_SYSCALL_REGS_H */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 13474d0..a02ac2a 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -34,6 +34,7 @@
#include <asm/proto.h>
#include <asm/hw_breakpoint.h>
#include <asm/traps.h>
+#include <asm/syscall_regs.h>

#include "tls.h"

@@ -1487,29 +1488,33 @@ long syscall_trace_enter(struct pt_regs *regs)
ret = -1L;

/* do the secure computing after userspace can't change the syscall. */
- if (!ret && secure_computing(regs->orig_ax)) {
+ if (!ret && secure_computing(*regs_syscall_nr(regs))) {
ret = -1L;
goto out;
}

if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
- trace_sys_enter(regs, regs->orig_ax);
+ trace_sys_enter(regs, *regs_syscall_nr(regs));

if (IS_IA32)
audit_syscall_entry(AUDIT_ARCH_I386,
- regs->orig_ax,
- regs->bx, regs->cx,
- regs->dx, regs->si);
+ *regs_syscall_nr(regs),
+ *regs_syscall32_arg1(regs),
+ *regs_syscall32_arg2(regs),
+ *regs_syscall32_arg3(regs),
+ *regs_syscall32_arg4(regs));
#ifdef CONFIG_X86_64
else
audit_syscall_entry(AUDIT_ARCH_X86_64,
- regs->orig_ax,
- regs->di, regs->si,
- regs->dx, regs->r10);
+ *regs_syscall_nr(regs),
+ *regs_syscall64_arg1(regs),
+ *regs_syscall64_arg2(regs),
+ *regs_syscall64_arg3(regs),
+ *regs_syscall64_arg4(regs));
#endif

out:
- return ret ?: regs->orig_ax;
+ return ret ?: syscall_get_nr(current, regs);
}

void syscall_trace_leave(struct pt_regs *regs)
diff --git a/include/asm-generic/syscall_regs.h b/include/asm-generic/syscall_regs.h
new file mode 100644
index 0000000..7b3fb5b
--- /dev/null
+++ b/include/asm-generic/syscall_regs.h
@@ -0,0 +1,76 @@
+/*
+ * Efficient, uniform access to system call-related registers
+ *
+ * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@xxxxxxxxxxxx>
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ *
+ * This file is a stub providing documentation for what functions
+ * asm-ARCH/syscall_regs.h files need to define. All arch definitions
+ * should be simple inlines.
+ *
+ * All functions are meant to be cross-arch accessors into pt_regs.
+ * This means that no locks are expected and a populated pt_regs is
+ * expected. It also means that any per-arch expectations, like
+ * runtime CONFIG_COMPAT behavior, must be handled by the caller.
+ *
+ * See asm/syscall.h for implementations that handle higher level
+ * runtime logic.
+ */
+
+#ifndef _ASM_X86_SYSCALL_REGS_H
+#define _ASM_X86_SYSCALL_REGS_H
+
+struct pt_regs;
+
+/**
+ * regs_syscall_nr - returns a pointer to the system call number entry
+ * @regs: task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer.
+ */
+static inline unsigned long *regs_syscall_nr(struct pt_regs *regs);
+
+/**
+ * regs_syscall_return - returns a pointer to the system call return value entry
+ * @regs: task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer.
+ */
+static inline unsigned long *regs_syscall_return(struct pt_regs *regs);
+
+/**
+ * regs_syscall32_arg1...6 - returns a pointer to the system call arg entry
+ * @regs: task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer. The behavior is undefined if
+ * these functions are called on non-32-bit system call argument registers.
+ */
+static inline unsigned long *regs_syscall32_arg1(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg2(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg3(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg4(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg5(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg6(struct pt_regs *regs);
+
+/**
+ * regs_syscall64_arg1...6 - returns a pointer to the system call arg entry
+ * @regs: task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer. The behavior is undefined if
+ * these functions are called on non-32-bit system call argument registers.
+ */
+static inline unsigned long *regs_syscall32_arg1(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg2(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg3(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg4(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg5(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg6(struct pt_regs *regs);
+
+#endif /* _ASM_X86_SYSCALL_REGS_H */
--
1.7.9.5