Re: seccomp: dump core when using SECCOMP_RET_KILL

From: Andrei Vagin
Date: Tue Jan 31 2017 - 16:48:43 EST


On Fri, Jan 27, 2017 at 01:48:30PM -0800, Kees Cook wrote:
> On Wed, Jan 25, 2017 at 12:05 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > On Tue, Jan 24, 2017 at 4:53 PM, Andrei Vagin <avagin@xxxxxxxxxxxxx> wrote:
> >> Hi,
> >>
> >> One of CRIU tests fails with this patch:
> >> https://github.com/xemul/criu/blob/master/test/zdtm/static/seccomp_filter_tsync.c
> >>
> >> Before this patch only a thread which called a "wrong" syscall is killed.
> >> Now a whole process is killed if one of threads called a "wrong" syscall.
> >
> > Oh ew. I wonder what is causing this? In other do_coredump() callers,
> > they explicitly call do_group_exit(). Hmmm
>
> We need to find a way to fix this or remove the coredump change from
> -next. We have a few things that have come up recently (coincident
> with the coredump change):
>
> - some folks would like seccomp kills to kill the entire process not
> just the thread
> - on a full-process kill, there needs to be a way to get a coredump
> - on a kill, it would be nice to have reliable logging
>
> Getting a coredump requires a full-process kill. It is possible to do
> this already with RET_TRAP and just not catch the SIGSYS. However,
> this isn't sufficient if you want to be _sure_ the entire process gets
> killed since RET_TRAP depends on cooperation from the process.
>
> Getting reliable logging out of seccomp for non-RET_KILL is
> non-trivial because syscall-audit doesn't track forks.
>
> The RET_* values are part of the UAPI, so changing or adding to them
> requires care.
>
> Right now we have very little room in the RET_* values (the lower
> bytes are for the RET_DATA which is ignored for RET_KILL and the
> semantics of changing that is very difficult):
>
> #define SECCOMP_RET_KILL 0x00000000U /* kill the task immediately */
> #define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */
>
> Killing the entire process is more aggressive than RET_KILL currently,
> so the question becomes, should we upgrade RET_KILL to
> RET_KILL_PROCESS and add RET_KILL_THREAD? Are there people that WANT
> only a thread to be killed? Andrei, does CRIU depend on this behavior,
> or is it "just" a regression test detail?

It is just a regression test detail. We can change the test if you
decide to kill a whole process.

>
> We can add two more RET_* values... however, it seems like only thread
> vs process is a filter-level concept. Whether or not to dump core
> seems to be an external aspect of the process (think ulimit -c), and
> logging seems to be a system-level thing.
>
> For logging, I think audit needs to grow fork-tracking, and/or have a
> new "is under seccomp" test that can be exposed to auditctl. Then the
> system owner can issue either "tell me about all seccomp kills" or
> "tell me about seccomp kills in this process tree". As such, I don't
> think we should be making filter-level changes to deal with the needs
> of seccomp logging.
>
> For coredumping, I remain a bit stumped. Strictly speaking,
> coredumping exposes more kernel code to the running process, so it is
> "less safe" than the instant kill RET_KILL performs. Though the
> current patch is actually more aggressive in that it causes the entire
> process to die as part of the coredumping.
>
> I think that if there is a move to make RET_KILL kill the process,
> then we can add coredumping. If not, I'm less sure how to proceed...
>
> -Kees
>
> --
> Kees Cook
> Nexus Security