Re: Compat 32-bit syscall entry from 64-bit task!? [was: Re:[RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF]

From: Indan Zupancic
Date: Wed Jan 18 2012 - 20:07:27 EST


On Wed, January 18, 2012 02:01, Andrew Lutomirski wrote:
> On Tue, Jan 17, 2012 at 4:56 PM, Indan Zupancic <indan@xxxxxx> wrote:
>> On Tue, January 17, 2012 18:45, Andrew Lutomirski wrote:
>>> On Tue, Jan 17, 2012 at 9:05 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>>>> On 01/17, Andrew Lutomirski wrote:
>>>>>
>>>>> (is_compat_task says whether the executable was marked as 32-bit. The
>>>>> actual execution mode is determined by the cs register, which the user
>>>>> can control.
>>>>
>>>> Confused... Afaics, TIF_IA32 says that the binary is 32-bit (this comes
>>>> along with TS_COMPAT).
>>>>
>>>> TS_COMPAT says that, say, the task did "int 80" to enter the kernel.
>>>> 64-bit or not, we should treat it as 32-bit in this case.
>>>
>>> I think you're right, and checking which entry was used is better than
>>> checking the cs register (since 64-bit code can use int80). That's
>>> what I get for insufficiently careful reading of the assembly. (And
>>> for going from memory from when I wrote the vsyscall emulation code --
>>> that code is entered from a page fault, so the entry point used is
>>> irrelevant.)
>>
>> Wait: If a task is set to 64 bit mode, but calls into the kernel via
>> int 0x80 it's changed to 32 bit mode for that system call and back to
>> 64 bit mode when the system call is finished!?
>>
>> Our ptrace jailer is checking cs to figure out if a task is a compat task
>> or not, if the kernel can change that behind our back it means our jailer
>> isn't secure for x86_64 with compat enabled. Or is cs changed before the
>> ptrace stuff and ptrace sees the "right" cs value? If not, we have to add
>> an expensive PTRACE_PEEKTEXT to check if it's an int 0x80 or not. Or is
>> there another way?
>
> I don't know what your ptrace jailer does. But a task can switch
> itself between 32-bit and 64-bit execution at will, and there's
> nothing the kernel can do about it. (That isn't quite true -- in
> theory the kernel could fiddle with the GDT, but that would be
> expensive and wouldn't work on Xen.)

That's why we don't cache the CS value but check it on every system call.
But you said elsewhere that checking CS isn't always correct either.
I grepped arch/x86 for "user_64bit_mode" but couldn't find anything;
maybe my kernel sources are too old, as I haven't updated this system
for almost a year. The current code only handles CS values 0x23 and 0x33
and kills the jail if it encounters anything else.
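
For reference, a minimal sketch of that per-syscall check in C. The names
(classify_cs, enum cs_mode, the USER*_CS macros) are made up for
illustration and this is not the jailer's actual code. It assumes the
tracer has already read the tracee's cs at the syscall stop, for example
with PTRACE_GETREGS.

```c
#include <stdint.h>

/* User code segment selectors on x86_64 Linux (see asm/segment.h). */
#define USER32_CS 0x23u  /* __USER32_CS: 32-bit compat code segment */
#define USER64_CS 0x33u  /* __USER_CS: 64-bit code segment */

/* Hypothetical result type for this sketch. */
enum cs_mode {
    CS_MODE_32,      /* tracee was executing 32-bit code */
    CS_MODE_64,      /* tracee was executing 64-bit code */
    CS_MODE_UNKNOWN  /* any other selector: the caller should kill the jail */
};

/*
 * Classify a cs value read from the tracee at a syscall stop.
 * The result must not be cached between syscalls, because the task can
 * switch between 32-bit and 64-bit execution at will.
 */
static enum cs_mode classify_cs(uint64_t cs)
{
    switch (cs) {
    case USER32_CS:
        return CS_MODE_32;
    case USER64_CS:
        return CS_MODE_64;
    default:
        return CS_MODE_UNKNOWN;
    }
}
```

Note that this only reports the mode the task was executing in, not which
syscall entry it used, so a 64-bit task doing int 0x80 would still be
classified as 64-bit here.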

> That being said, is_compat_task is apparently a good indication of
> whether the current *syscall* entry is a 64-bit syscall or a 32-bit
> syscall. Perhaps the function should be renamed to in_compat_syscall,
> because that's what it does.

That seems like a good idea.

>
>>
>> I think this behaviour is so unexpected that it can only cause security
>> problems in the long run. Is anyone counting on this? Where is this
>> behaviour documented?
>
> Nowhere, I think.

Such is life.

Greetings,

Indan

