Re: [3.1 REGRESSION] Commit 5cec93c216db77c45f7ce970d46283bcb1933884 breaks the Chromium seccomp sandbox

From: Andrew Lutomirski
Date: Mon Nov 14 2011 - 03:39:12 EST


On Sun, Nov 13, 2011 at 10:50 PM, Mark Seaborn <mseaborn@xxxxxxxxxxxx> wrote:
> On 13 November 2011 18:36, Andrew Lutomirski <luto@xxxxxxx> wrote:
>>
>> On Sun, Nov 13, 2011 at 4:40 PM, Nix <nix@xxxxxxxxxxxxx> wrote:
>> > With this commit installed:
>> >
>> > commit 5cec93c216db77c45f7ce970d46283bcb1933884
>> > Author: Andy Lutomirski <luto@xxxxxxx>
>> > Date:   Sun Jun 5 13:50:24 2011 -0400
>> >
>> >    x86-64: Emulate legacy vsyscalls
>> >
>> > With CONFIG_SECCOMP set, and the Chromium seccomp sandbox compiled in
>> > and enabled (which is not the default), on a system running glibc 2.12.x
>> > (thus, relying on emulated vsyscalls), Chromium renderers sometimes hang
>> > or abruptly abort before rendering anything (both of which show as pages
>> > that never complete rendering and eventually get a Chromium kill request
>> > dialog). The hang is consistent for a given page, but not all pages
>> > hang. (One that *does* hang is the chrome://extensions page, so network
>> > access is not the problem here.)
>> >
>> > vsyscall=native does not help.
>> >
>> > Turning off CONFIG_SECCOMP, or running Chromium with the seccomp sandbox
>> > disabled, fixes it.
>> >
>> > I speculate that do_emulate_vsyscall() is broken, but it's hard to debug
>> > the Chromium renderer sandboxing to see what's failing because the
>> > multiple layers of sandboxing get in the way, as they are designed to :)
>>
>> I don't buy that explanation -- with vsyscall=native,
>> do_emulate_vsyscall shouldn't be called at all.  I have a much simpler
>> explanation: the Chromium sandbox is calling vsyscalls in seccomp
>> mode, which has no business working.
>
> I think the problem is that seccomp-sandbox attempts to patch the
> vsyscall page.  It replaces the SYSCALL instructions in this page with
> jumps to seccomp-sandbox's handler.  (More accurately, seccomp-sandbox
> creates a patched copy of the vsyscall page.  It redirects glibc's
> indirect jumps so that they go to the patched copy of the vsyscall
> page instead of to the original.)  The code for this is in
> patchVSystemCalls() in library.cc
> (http://code.google.com/p/seccompsandbox/source/browse/trunk/library.cc).
>
> If the vsyscall page's code no longer invokes the kernel via SYSCALL
> instructions but via some other trap, seccomp-sandbox's trick will no
> longer work, because it doesn't know to patch the instructions that do
> this new trap.

The vsyscall code is now:

mov $__NR_whatever, %rax
syscall
ret
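
For illustration only, here is that stub written out as bytes, with a
naive scan for the SYSCALL opcode (0f 05). This is a sketch of the
general idea of locating SYSCALL instructions to patch, not the code in
seccomp-sandbox's patchVSystemCalls(). The names vsyscall_stub and
find_syscall are made up for this example, and 96 (__NR_gettimeofday on
x86-64) is just a sample syscall number.

```c
#include <stddef.h>
#include <stdint.h>

/* x86-64 encoding of:  mov $96, %rax ; syscall ; ret
 * 96 is __NR_gettimeofday on x86-64, chosen only as an example. */
const uint8_t vsyscall_stub[] = {
    0x48, 0xc7, 0xc0, 0x60, 0x00, 0x00, 0x00, /* mov $0x60, %rax */
    0x0f, 0x05,                               /* syscall */
    0xc3                                      /* ret */
};

/* Return the offset of the first 0f 05 byte pair in buf, or -1 if none.
 * Hypothetical helper: a plain byte scan does not decode instructions,
 * so it can also match 0f 05 inside an immediate or other operand. */
long find_syscall(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0x0f && buf[i + 1] == 0x05)
            return (long)i;
    }
    return -1;
}
```

Calling find_syscall(vsyscall_stub, sizeof vsyscall_stub) returns the
offset just past the 7-byte mov, which is where a patcher would
overwrite the instruction with a jump to its own handler.
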

It used to be weirder, but we changed it to avoid breaking things like
this. The secret is that, if vsyscall=emulate, the vsyscall page is
not executable and we use the page fault to invoke
do_emulate_vsyscall. But userspace can't tell it's not executable
without actually jumping there, and with vsyscall=native, it's just a
normal syscall.
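
A rough sketch of what userspace can check without jumping there:
reading the [vsyscall] entry from /proc/self/maps. The helper names
parse_vsyscall_line and find_vsyscall are made up for this example. The
assumption to keep in mind is that the permissions printed there are
only the kernel's description of the mapping, and may read the same
whether or not a jump into the page would actually fault. The entry
may also be missing entirely on some configurations.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/<pid>/maps. Returns 1 and fills start, end and
 * perms if the line describes the [vsyscall] mapping, 0 otherwise.
 * Hypothetical helper; assumes a 64-bit unsigned long. */
int parse_vsyscall_line(const char *line, unsigned long *start,
                        unsigned long *end, char perms[5])
{
    if (strstr(line, "[vsyscall]") == NULL)
        return 0;
    /* Line format: start-end perms offset dev inode pathname */
    return sscanf(line, "%lx-%lx %4s", start, end, perms) == 3;
}

/* Look for the [vsyscall] mapping of the current process.
 * Returns 1 if found, 0 if it is absent or the file cannot be read. */
int find_vsyscall(unsigned long *start, unsigned long *end, char perms[5])
{
    char line[512];
    int found = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = parse_vsyscall_line(line, start, end, perms);
    fclose(f);
    return found;
}
```

If find_vsyscall() succeeds, start is the address that glibc's indirect
jumps target. Whether executing there runs a real SYSCALL instruction
or takes the fault path is not something this check can tell you.
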

I'll try to build a sandboxing copy of chromium tomorrow to see if I
can reproduce it.

--Andy