Re: [PATCH RFC] seccomp: Implement syscall isolation based on memory areas

From: Robert O'Callahan
Date: Thu Jun 25 2020 - 19:15:13 EST


rr (https://rr-project.org, https://arxiv.org/abs/1705.05937) grapples
with a similar problem. We need to intercept commonly-executed system
calls and wrap them with our own processing, with minimal overhead. I
think our basic approach might work for Wine without kernel changes.

We use SECCOMP_SET_MODE_FILTER with a simple filter that returns
SECCOMP_RET_TRAP on all syscalls except for those called from a single
specific trampoline page (which get SECCOMP_RET_ALLOW). rr ptraces its
children. So, when user-space makes a syscall, the seccomp filter
triggers a ptrace trap. The ptracer looks at the code around the
syscall and if it matches certain common patterns, the ptracer patches
the code with a jump to a stub that does extra work and issues a real
syscall via the trampoline. Thus, each library syscall instruction is
slow the first time and fast every subsequent time. "Weird" syscalls
that the ptracer chooses not to patch incur the context-switch penalty
every time, so their overhead increases a lot, but it sounds like that
might be OK in Wine's case?
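
To make the filter concrete, here is a rough sketch of a classic-BPF
program of the kind described above. It is not rr's actual filter: the
function names are made up for illustration, and it assumes 4 KiB
pages, a little-endian target, and a page-aligned trampoline address.
It also skips the architecture check a production filter would want.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define TRAMPOLINE_FILTER_LEN 7
/* Offset of the 64-bit instruction pointer inside struct seccomp_data. */
#define IP_OFF offsetof(struct seccomp_data, instruction_pointer)

/* Hypothetical helper: write a filter into `out` (room for
 * TRAMPOLINE_FILTER_LEN entries) that allows syscalls whose reported
 * instruction pointer lies in the 4 KiB page at `page` and traps all
 * others. Returns the number of instructions written. */
unsigned short build_trampoline_filter(uint64_t page, struct sock_filter *out)
{
    assert((page & 0xfff) == 0);          /* must be page-aligned */
    uint32_t lo = (uint32_t)page;
    uint32_t hi = (uint32_t)(page >> 32);

    const struct sock_filter prog[TRAMPOLINE_FILTER_LEN] = {
        /* 0: load high 32 bits of the IP (little-endian layout) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, IP_OFF + 4),
        /* 1: mismatch -> skip 4 instructions to the TRAP return */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, hi, 0, 4),
        /* 2: load low 32 bits of the IP */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, IP_OFF),
        /* 3: drop the offset within the page */
        BPF_STMT(BPF_ALU | BPF_AND | BPF_K, 0xfffff000u),
        /* 4: mismatch -> skip 1 instruction to the TRAP return */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, lo, 0, 1),
        /* 5: syscall issued from the trampoline page */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        /* 6: syscall issued from anywhere else */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
    };
    memcpy(out, prog, sizeof(prog));
    return TRAMPOLINE_FILTER_LEN;
}

/* Hypothetical helper: install the filter on the calling thread.
 * Returns 0 on success, -1 on failure (errno set by the kernel). */
int install_trampoline_filter(uint64_t page)
{
    struct sock_filter insns[TRAMPOLINE_FILTER_LEN];
    struct sock_fprog fprog = {
        .len = build_trampoline_filter(page, insns),
        .filter = insns,
    };
    /* Needed so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return (int)syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &fprog);
}
```

Once installed, every syscall made outside the trampoline page traps,
including those in the installing code itself, so the trampoline and
whatever handles the trap have to be in place first.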

A more efficient variant of this approach, which would work in some
cases (but maybe not Wine?), would be to avoid using a ptracer and give
the process a SIGSYS handler that does the patching.

Rob
--
Su ot deraeppa sah dna Rehtaf eht htiw saw hcihw, efil lanrete eht uoy
ot mialcorp ew dna, ti ot yfitset dna ti nees evah ew; deraeppa efil
eht. Efil fo Drow eht gninrecnoc mialcorp ew siht - dehcuot evah sdnah
ruo dna ta dekool evah ew hcihw, seye ruo htiw nees evah ew hcihw,
draeh evah ew hcihw, gninnigeb eht morf saw hcihw taht.