Re: [PATCH 4/6] scs: Move scs_overflow_check() out of architecture code

From: Kees Cook
Date: Mon May 18 2020 - 11:31:55 EST


On Mon, May 18, 2020 at 02:32:31PM +0100, Mark Rutland wrote:
> On Mon, May 18, 2020 at 02:23:47PM +0100, Will Deacon wrote:
> > On Mon, May 18, 2020 at 01:12:10PM +0100, Mark Rutland wrote:
> > > On Fri, May 15, 2020 at 06:27:54PM +0100, Will Deacon wrote:
> > > > There is nothing architecture-specific about scs_overflow_check() as
> > > > it's just a trivial wrapper around scs_corrupted().
> > > >
> > > > For parity with task_stack_end_corrupted(), rename scs_corrupted() to
> > > > task_scs_end_corrupted() and call it from schedule_debug() when
> > > > CONFIG_SCHED_STACK_END_CHECK_is enabled. Finally, remove the unused
> > > > scs_overflow_check() function entirely.
> > > >
> > > > This has absolutely no impact on architectures that do not support SCS
> > > > (currently arm64 only).
> > > >
> > > > Signed-off-by: Will Deacon <will@xxxxxxxxxx>
> > >
> > > Pulling this out of arch code seems sane to me, and the arch-specific
> > > chanes look sound. However, I have a concern with the changes within the
> > > scheduler context-switch.
> > >
> > > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > > index a35d3318492c..56be4cbf771f 100644
> > > > --- a/arch/arm64/kernel/process.c
> > > > +++ b/arch/arm64/kernel/process.c
> > > > @@ -52,7 +52,6 @@
> > > > #include <asm/mmu_context.h>
> > > > #include <asm/processor.h>
> > > > #include <asm/pointer_auth.h>
> > > > -#include <asm/scs.h>
> > > > #include <asm/stacktrace.h>
> > > >
> > > > #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
> > > > @@ -516,7 +515,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
> > > > entry_task_switch(next);
> > > > uao_thread_switch(next);
> > > > ssbs_thread_switch(next);
> > > > - scs_overflow_check(next);
> > >
> > > Prior to this patch, we'd never switch to a task whose SCS had already
> > > been corrupted.
> > >
> > > With this patch, we only check that when switching away from a task, and
> > > only when CONFIG_SCHED_STACK_END_CHECK is selected, which at first
> > > glance seems to weaken that.
> >
> > Yes, ignoring vmap'd stacks, this patch brings the SCS checking in-line with
> > the main stack checking when CONFIG_SCHED_STACK_END_CHECK=y.
> >
> > > Arguably:
> > >
> > > * If the next task's SCS was corrupted by that task while it was
> > > running, we had already lost at that point.
> >
> > With this change, we'll at least catch this one sooner, and that might be
> > useful if a bug has caused us to overflow the SCS but not the main stack.
>
> Sure, but only if CONFIG_SCHED_STACK_END_CHECK is selected.
>
> > > * If the next task's SCS was corrupted by another task, then that could
> > > also happen immediately after the check (though timing to avoid the
> > > check but affect the process could be harder).
> >
> > We're only checking the magic end value, so the cross-task case is basically
> > if you overrun your own SCS as above, but then continue to overrun entire
> > SCSs for other tasks as well. It's probably not very useful in that case.
> >
> > > ... and a VMAP'd SCS would be much nicer in this regard.
> > >
> > > Do we think this is weakening the check, or do we think it wasn't all
> > > that helpful to begin with?
> >
> > I see it as a debug check to catch SCS overflow, rather than a hardening
> > feature, and I agree that using something like vmap stack for the SCS would
> > be better because we could have a guard page instead.
>
> Fair enough. Could we put something into the commit message that more
> explicitly calls out debug-not-hardening? I agree that under that model
> this patch looks fine, and with something to that effect:
>
> Reviewed-by: Mark Rutland <mark.rutland@xxxxxxx>
>
> Mark.
>
> > This is something I would like to revisit, but we need more
> > information from Sami about why Android rejected the larger allocation
> > size, since I don't think there's an awful lot of point merging this
> > series if Android doesn't pick it up.
>
> Indeed. I'd certainly prefer the robustness of a VMAP'd SCS if we can do
> that.

For smaller devices, the memory overhead was too high. (i.e. 4x more
memory allocated to kernel stacks -- 4k vs 1k per thread.) The series
is much more than just a stack exhaustion defense, so I don't think that
detail needs to block the entire series. FWIW, I'd like to have both modes
(contiguous and vmap) available so that system builders can choose their
trade-off. Both will gain return address corruption defense, but the
vmap case will protect against neighboring SCS corruption in the face
of very-unlikely-but-technically-possible stack exhaustion (remember
that with the elimination of VLAs, the stack depth compile time
checking, and the regular stack VMAP guard page, it will be quite
difficult to exhaust the SCS -- either because there is no code path to
accomplish it, or because it would trip the regular stack guard page
first).

--
Kees Cook