Re: [PATCH 4/6] scs: Move scs_overflow_check() out of architecture code

From: Will Deacon
Date: Mon May 18 2020 - 09:23:56 EST


On Mon, May 18, 2020 at 01:12:10PM +0100, Mark Rutland wrote:
> On Fri, May 15, 2020 at 06:27:54PM +0100, Will Deacon wrote:
> > There is nothing architecture-specific about scs_overflow_check() as
> > it's just a trivial wrapper around scs_corrupted().
> >
> > For parity with task_stack_end_corrupted(), rename scs_corrupted() to
> > task_scs_end_corrupted() and call it from schedule_debug() when
> > CONFIG_SCHED_STACK_END_CHECK_is enabled. Finally, remove the unused
> > scs_overflow_check() function entirely.
> >
> > This has absolutely no impact on architectures that do not support SCS
> > (currently arm64 only).
> >
> > Signed-off-by: Will Deacon <will@xxxxxxxxxx>
>
> Pulling this out of arch code seems sane to me, and the arch-specific
> chanes look sound. However, I have a concern with the changes within the
> scheduler context-switch.
>
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index a35d3318492c..56be4cbf771f 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -52,7 +52,6 @@
> > #include <asm/mmu_context.h>
> > #include <asm/processor.h>
> > #include <asm/pointer_auth.h>
> > -#include <asm/scs.h>
> > #include <asm/stacktrace.h>
> >
> > #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
> > @@ -516,7 +515,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
> > entry_task_switch(next);
> > uao_thread_switch(next);
> > ssbs_thread_switch(next);
> > - scs_overflow_check(next);
>
> Prior to this patch, we'd never switch to a task whose SCS had already
> been corrupted.
>
> With this patch, we only check that when switching away from a task, and
> only when CONFIG_SCHED_STACK_END_CHECK is selected, which at first
> glance seems to weaken that.

Yes, ignoring vmap'd stacks, this patch brings the SCS checking in-line with
the main stack checking when CONFIG_SCHED_STACK_END_CHECK=y.

> Arguably:
>
> * If the next task's SCS was corrupted by that task while it was
> running, we had already lost at that point.

With this change, we'll at least catch this one sooner, and that might be
useful if a bug has caused us to overflow the SCS but not the main stack.

> * If the next task's SCS was corrupted by another task, then that could
> also happen immediately after the check (though timing to avoid the
> check but affect the process could be harder).

We're only checking the magic end value, so the cross-task case is basically
if you overrun your own SCS as above, but then continue to overrun entire
SCSs for other tasks as well. It's probably not very useful in that case.

> ... and a VMAP'd SCS would be much nicer in this regard.
>
> Do we think this is weakening the check, or do we think it wasn't all
> that helpful to begin with?

I see it as a debug check to catch SCS overflow, rather than a hardening
feature, and I agree that using something like vmap stack for the SCS would
be better because we could have a guard page instead. This is something I
would like to revisit, but we need more information from Sami about why
Android rejected the larger allocation size, since I don't think there's an
awful lot of point merging this series if Android doesn't pick it up.

Will