Re: [PATCH RFC] seccomp: Implement syscall isolation based on memory areas

From: Andy Lutomirski
Date: Sat May 30 2020 - 18:12:04 EST

> On May 29, 2020, at 11:00 PM, Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx> wrote:
>
> Modern Windows applications are executing system call instructions
> directly from the application's code without going through the WinAPI.
> This breaks Wine emulation, because it doesn't have a chance to
> intercept and emulate these syscalls before they are submitted to Linux.
>
> In addition, we cannot simply trap every system call of the application
> to userspace using PTRACE_SYSEMU, because performance would suffer,
> since our main use case is to run Windows games over Linux. Therefore,
> we need some in-kernel filtering to decide whether the syscall was
> issued by the Wine code or by the Windows application.

Do you really need in-kernel filtering? What if you could have efficient userspace filtering instead? That is, set something up so that all syscalls, except those issued from a special address, are translated into a CALL to a thunk, where the thunk is configured per task. The thunk can then do whatever emulation is needed.

Getting the details right, especially the interaction with any seccomp filters that may be installed, could be tricky, but the performance should be decent, at least on non-PTI systems.

(If we go this route, I suspect that the correct interaction with seccomp is that this type of redirection takes precedence over seccomp and seccomp filters are not invoked for redirected syscalls. After all, a redirected syscall is, functionally, not a syscall at all.)
