Re: [RFC] syscalls: Restore address limit after a syscall

From: Kees Cook
Date: Fri Feb 10 2017 - 15:50:35 EST


On Fri, Feb 10, 2017 at 11:22 AM, Russell King - ARM Linux
<linux@xxxxxxxxxxxxxxx> wrote:
> On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
>> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@xxxxxxxxxx> wrote:
>> > So by default it is in the wrapper. If selected, an architecture can
>> > disable the wrapper and put it in the best places. Did I understand correctly?
>>
>> Sounds good to me.
>>
>> Presumably the result should go through -mm. Want to cc: akpm and
>> linux-arch@ on the next version?
>>
>> I've also cc'd arm and s390 folks -- those are the other arches that
>> try to be on top of hardening.
>
> The best place for this on ARM is in the assembly code, rather than in
> the hundreds of system calls - having it in one place is surely better
> for reducing the cache impact.
>
> This (untested) patch should be sufficient for ARM. There are two places
> where I think it makes sense to do this:
> 1. Immediately after returning from the syscall
> 2. Immediately before any return to userspace
>
> (1) has the advantage that the address limit will be forced for the
> exit-path work that we do, preventing that work from accessing kernel
> space.
>
> (2) has the advantage that we'd guarantee the address limit is forced
> while userspace is running, so it is already correct on the next entry
> into kernel space.
>
> There's actually a third option as well:
>
> (3) forcing the address limit on entry to the kernel from userspace.
>
> This patch implements option 1.
>
> arch/arm/kernel/entry-common.S | 6 ++++++
> 1 files changed, 6 insertions(+)
>
> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index eb5cd77bf1d8..6a717a2ccb88 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -39,6 +39,8 @@
> ret_fast_syscall:
> UNWIND(.fnstart )
> UNWIND(.cantunwind )
> + mov r1, #TASK_SIZE
> + str r1, [tsk, #TI_ADDR_LIMIT]
> disable_irq_notrace @ disable interrupts
> ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing
> tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
> @@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
> ret_fast_syscall:
> UNWIND(.fnstart )
> UNWIND(.cantunwind )
> + mov r1, #TASK_SIZE
> + str r1, [tsk, #TI_ADDR_LIMIT]
> str r0, [sp, #S_R0 + S_OFF]! @ save returned r0
> disable_irq_notrace @ disable interrupts
> ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing
> @@ -262,6 +266,8 @@ ENDPROC(vector_swi)
> b ret_slow_syscall
>
> __sys_trace_return:
> + mov r1, #TASK_SIZE
> + str r1, [tsk, #TI_ADDR_LIMIT]
> str r0, [sp, #S_R0 + S_OFF]! @ save returned r0
> mov r0, sp
> bl syscall_trace_exit
>

That looks pretty great! If I'm reading the macros correctly, this'll
only happen on _actual_ syscall exit, right? So all the crazy OABI
stuff won't suddenly break? e.g.:

asmlinkage long sys_oabi_semtimedop(int semid,
	...
	mm_segment_t fs = get_fs();
	set_fs(KERNEL_DS);
	err = sys_semtimedop(semid, sops, nsops, timeout);
	set_fs(fs);
	...
	return err;
}

Is there a similarly good place to do this for arm64?

-Kees

--
Kees Cook
Pixel Security