Re: Compat 32-bit syscall entry from 64-bit task!? [was: Re:[RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF]

From: Jamie Lokier
Date: Tue Jan 17 2012 - 20:48:51 EST


Indan Zupancic wrote:
> On Tue, January 17, 2012 18:45, Andrew Lutomirski wrote:
> > On Tue, Jan 17, 2012 at 9:05 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> >> On 01/17, Andrew Lutomirski wrote:
> >>>
> >>> (is_compat_task says whether the executable was marked as 32-bit. The
> >>> actual execution mode is determined by the cs register, which the user
> >>> can control.)
> >>
> >> Confused... Afaics, TIF_IA32 says that the binary is 32-bit (this comes
> >> along with TS_COMPAT).
> >>
> >> TS_COMPAT says that, say, the task did "int 80" to enter the kernel.
> >> 64-bit or not, we should treat it as 32-bit in this case.
> >
> > I think you're right, and checking which entry was used is better than
> > checking the cs register (since 64-bit code can use int80). That's
> > what I get for insufficiently careful reading of the assembly. (And
> > for going from memory from when I wrote the vsyscall emulation code --
> > that code is entered from a page fault, so the entry point used is
> > irrelevant.)
>
> Wait: If a task is set to 64 bit mode, but calls into the kernel via
> int 0x80 it's changed to 32 bit mode for that system call and back to
> 64 bit mode when the system call is finished!?
>
> Our ptrace jailer checks cs to figure out whether a task is a compat task
> or not. If the kernel can change that behind our back, it means our jailer
> isn't secure for x86_64 with compat enabled. Or is cs changed before the
> ptrace stuff and ptrace sees the "right" cs value? If not, we have to add
> an expensive PTRACE_PEEKTEXT to check if it's an int 0x80 or not. Or is
> there another way?
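
To make the concern above concrete, here is a minimal sketch of a
cs-based check and the case where it disagrees with the kernel. The
selector values 0x33 and 0x23 are the x86_64 Linux user code segments
(__USER_CS and __USER32_CS). Everything else (the enum and function
names, and the simplified model in which the entry path alone selects
the syscall table) is a hypothetical illustration, not code from any
actual jailer or from the kernel.

```c
#include <stdint.h>

/* User-mode code segment selectors on x86_64 Linux. */
#define USER_CS_64 0x33u  /* __USER_CS: 64-bit code */
#define USER_CS_32 0x23u  /* __USER32_CS: 32-bit (compat) code */

enum abi { ABI_UNKNOWN = 0, ABI_64, ABI_32 };

/* Hypothetical, simplified model of how the task entered the kernel. */
enum entry { ENTRY_SYSCALL64, ENTRY_INT80 };

/* What a tracer concludes if it only looks at the tracee's cs. */
static enum abi abi_guessed_from_cs(uint64_t cs)
{
    switch (cs & 0xffff) {
    case USER_CS_64: return ABI_64;
    case USER_CS_32: return ABI_32;
    default:         return ABI_UNKNOWN;
    }
}

/* Assumed model: the entry path, not cs, selects the syscall table. */
static enum abi abi_used_by_kernel(enum entry e)
{
    return e == ENTRY_INT80 ? ABI_32 : ABI_64;
}

/* Nonzero when the cs-based guess disagrees with the kernel. */
static int cs_check_is_wrong(uint64_t cs, enum entry e)
{
    return abi_guessed_from_cs(cs) != abi_used_by_kernel(e);
}
```

Under this model, cs_check_is_wrong(0x33, ENTRY_INT80) is nonzero: the
tracer sees a 64-bit cs and interprets the syscall number with the
64-bit table, while the kernel dispatches it through the 32-bit one.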

PTRACE_PEEKTEXT won't securely tell you if it's int 0x80 if there's
another thread modifying the code, or changing the mappings, or it's
executing from a file or shared memory that someone's writing to.
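
For reference, the PTRACE_PEEKTEXT check being discussed would look
roughly like the sketch below. It assumes the tracer has peeked one
word at (rip - 2) while the tracee is stopped at syscall entry, with
rip pointing just past a two-byte int 0x80 (CD 80) or syscall (0F 05)
instruction. The function and enum names are hypothetical. The decode
itself is trivial; the problem is that the bytes can change between
the moment the CPU executed them and the moment the tracer reads them,
so the answer cannot be trusted.

```c
#include <stdint.h>

enum insn { INSN_OTHER = 0, INSN_INT80, INSN_SYSCALL };

/* Classify the two bytes that precede the tracee's rip. */
static enum insn classify_entry_insn(const uint8_t b[2])
{
    if (b[0] == 0xcd && b[1] == 0x80)
        return INSN_INT80;    /* int 0x80 */
    if (b[0] == 0x0f && b[1] == 0x05)
        return INSN_SYSCALL;  /* syscall */
    return INSN_OTHER;
}

/*
 * 'word' is assumed to be the value returned by a PTRACE_PEEKTEXT
 * at (rip - 2). x86 is little-endian, so the byte at the lowest
 * address is the least significant byte of the word.
 */
static enum insn classify_peeked_word(uint64_t word)
{
    uint8_t b[2] = {
        (uint8_t)(word & 0xff),
        (uint8_t)((word >> 8) & 0xff),
    };
    return classify_entry_insn(b);
}
```

A result of INSN_SYSCALL here only says what is in memory at the time
of the peek, not what instruction actually trapped into the kernel.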

> I think this behaviour is so unexpected that it can only cause security
> problems in the long run. Is anyone counting on this? Where is this
> behaviour documented?

It's a surprise to me too. Like you, I'm using ptrace, though to trace
what a process touches rather than to restrict it, and it's subject to the
same problem.

This looks like it needs a kernel patch.

-- Jamie