Re: Compat 32-bit syscall entry from 64-bit task!? [was: Re:[RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF]

From: Andrew Lutomirski
Date: Tue Jan 17 2012 - 20:02:07 EST


On Tue, Jan 17, 2012 at 4:56 PM, Indan Zupancic <indan@xxxxxx> wrote:
> On Tue, January 17, 2012 18:45, Andrew Lutomirski wrote:
>> On Tue, Jan 17, 2012 at 9:05 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>>> On 01/17, Andrew Lutomirski wrote:
>>>>
>>>> (is_compat_task says whether the executable was marked as 32-bit. The
>>>> actual execution mode is determined by the cs register, which the user
>>>> can control.
>>>
>>> Confused... Afaics, TIF_IA32 says that the binary is 32-bit (this comes
>>> along with TS_COMPAT).
>>>
>>> TS_COMPAT says that, say, the task did "int 80" to enter the kernel.
>>> 64-bit or not, we should treat it as 32-bit in this case.
>>
>> I think you're right, and checking which entry was used is better than
>> checking the cs register (since 64-bit code can use int80). That's
>> what I get for insufficiently careful reading of the assembly. (And
>> for going from memory from when I wrote the vsyscall emulation code --
>> that code is entered from a page fault, so the entry point used is
>> irrelevant.)
>
> Wait: If a task is set to 64 bit mode, but calls into the kernel via
> int 0x80 it's changed to 32 bit mode for that system call and back to
> 64 bit mode when the system call is finished!?
>
> Our ptrace jailer is checking cs to figure out if a task is a compat task
> or not, if the kernel can change that behind our back it means our jailer
> isn't secure for x86_64 with compat enabled. Or is cs changed before the
> ptrace stuff and ptrace sees the "right" cs value? If not, we have to add
> an expensive PTRACE_PEEKTEXT to check if it's an int 0x80 or not. Or is
> there another way?

I don't know what your ptrace jailer does. But a task can switch
itself between 32-bit and 64-bit execution at will, and there's
nothing the kernel can do about it. (That isn't quite true -- in
theory the kernel could fiddle with the GDT, but that would be
expensive and wouldn't work on Xen.)
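
As a rough illustration only (a user-space sketch, not kernel code), this
is roughly what a tracer can learn from a sampled cs value. The selectors
0x33 and 0x23 are the usual x86-64 Linux user code segments (__USER_CS and
__USER32_CS); the enum and function names are made up for this example.
Because the task can far-jump between the two selectors whenever it likes,
the answer only describes the instant at which cs was read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, used only for this sketch. */
enum exec_mode { MODE_UNKNOWN = 0, MODE_32BIT = 32, MODE_64BIT = 64 };

/*
 * Classify the execution mode from a saved cs selector.
 * Assumes the usual x86-64 Linux user segments:
 *   0x33 = __USER_CS (64-bit code), 0x23 = __USER32_CS (32-bit code).
 */
static enum exec_mode exec_mode_from_cs(uint16_t cs)
{
	switch (cs) {
	case 0x33:
		return MODE_64BIT;
	case 0x23:
		return MODE_32BIT;
	default:
		return MODE_UNKNOWN;	/* e.g. an LDT-based selector */
	}
}

/* Usage example: the two standard selectors map to the two modes. */
static void exec_mode_example(void)
{
	assert(exec_mode_from_cs(0x33) == MODE_64BIT);
	assert(exec_mode_from_cs(0x23) == MODE_32BIT);
}
```

Note that this says nothing about which syscall ABI the task will use next:
code running with cs == 0x33 can still execute int 0x80.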

That being said, is_compat_task is apparently a good indication of
whether the current *syscall* entry is a 64-bit syscall or a 32-bit
syscall. Perhaps the function should be renamed to in_compat_syscall,
because that's what it does.
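
To make the distinction concrete, here is a minimal user-space model of
the idea, under the assumption that the compat entry paths (int 0x80,
32-bit sysenter/syscall) set a per-thread TS_COMPAT status bit for the
duration of the syscall and the 64-bit syscall path does not. The struct,
the enum, the model_* functions and the flag value are illustrative, not
the kernel's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TS_COMPAT 0x0002u	/* illustrative value, assumed for the sketch */

/* How the task entered the kernel (hypothetical names). */
enum entry_path { ENTRY_SYSCALL64, ENTRY_INT80, ENTRY_SYSENTER32 };

/* Toy stand-in for the per-thread state the kernel keeps. */
struct model_thread {
	uint16_t cs;		/* user cs at entry; not consulted below */
	uint32_t status;	/* holds TS_COMPAT while in a compat syscall */
};

/* The compat entry paths mark the syscall as 32-bit; syscall64 does not. */
static void model_syscall_enter(struct model_thread *t, enum entry_path path)
{
	assert(t != NULL);
	if (path != ENTRY_SYSCALL64)
		t->status |= TS_COMPAT;
}

/* The mark is dropped again when the syscall returns. */
static void model_syscall_exit(struct model_thread *t)
{
	assert(t != NULL);
	t->status &= ~TS_COMPAT;
}

/* The proposed in_compat_syscall(): depends on the entry path, not on cs. */
static bool model_in_compat_syscall(const struct model_thread *t)
{
	return (t->status & TS_COMPAT) != 0;
}
```

In this model a thread with cs == 0x33 that enters via ENTRY_INT80 is
reported as being in a compat syscall until model_syscall_exit() runs,
which is the case a cs-only check gets wrong.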

>
> I think this behaviour is so unexpected that it can only cause security
> problems in the long run. Is anyone counting on this? Where is this
> behaviour documented?

Nowhere, I think.

--Andy