Re: [PATCH RFC] seccomp: Implement syscall isolation based on memory areas

From: Paul Gofman
Date: Sun May 31 2020 - 14:36:29 EST


On 5/31/20 21:10, Andy Lutomirski wrote:
>
> That's not what I meant. I meant that you would set the kernel up to
> redirect *all* syscalls from the thread with the sole exception of one
> syscall instruction in the thunk. This would catch Windows syscalls
> and Linux syscalls. The thunk would determine whether the original
> syscall was Linux or Windows and handle it accordingly.
>
> This may interact poorly with the DRM scheme. The redzone might need
> to be respected, or stack switching might be needed.

Oh yeah, I see now, thanks. Sure, we could trap every syscall and have a
Seccomp-allowed trampoline for executing native ones with the existing
Seccomp implementation. But this is going to have prohibitive
performance impact. Our present use case specifics is that vast majority
of syscalls do not need to be emulated, they are native. And just a few
go from the Windows application which we need to trap and route to our
handler to let the program continue, while we do not care too much about
the overhead for those few. So the hope was that the kernel can route
that majority of Linux native syscalls inside with the minor overhead.
I've read the suggestion to use SECCOMP_RET_USER_NOTIF instead of
SECCOMP_RET_TRAP, is handling the trap this way supposed to be much
quicker than handling the sigsys from SECCOMP_RET_TRAP? More
specifically, would not SECCOMP_RET_USER_NOTIF effectively serialize all
the syscalls waiting in a single queue for processing, while
SECCOMP_RET_TRAP can be processed without exclusive locking?