Re: Compat 32-bit syscall entry from 64-bit task!? [was: Re:[RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF]

From: Indan Zupancic
Date: Tue Jan 17 2012 - 23:22:48 EST


On Wed, January 18, 2012 03:22, Andi Kleen wrote:
>> I'm pretty sure this isn't about changing cs or far jumps
>
> He's assuming that code can only run on two code segments and
> not arbitarily switch between them which is a completely incorrect
> assumption.

All I assumed up to now was that cs shows the current mode of the process,
and that that defines which system call path is taken. Apparently that is
not true and int 0x80 forces the compat system call path.

Looking at EIP - 2 seems like a secure way to check how we entered the kernel.

>> I think Indan means code is running with 64-bit cs, but the kernel
>> treats int $0x80 as a 32-bit syscall and sysenter as a 64-bit syscall,
>> and there's no way for the ptracer to know which syscall the kernel
>> will perform, even by looking at all registers.

Yes, that's what I meant.

>> It looks like a hole in ptrace which could be fixed.
>
> Possibly, but anything that bases its security on ptrace is typically
> unfixable racy (just think what happens with multiple threads
> and syscall arguments), so it's unlikely to do any good.

As far as I know, we fixed all races except symlink races caused by malicious
code outside the jail. Those are controllable by limiting what filesystem access
the prisoners get. A special open() flag which causes open to fail when a part
of the path is a symlink with a distinguishable error code would solve this for
us.

Other than that and the abysmal performance, ptrace is fine for jailing.

Greetings,

Indan


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/