Re: [kernel-hardening] Re: [RFC PATCH 6/6] arm64: add VMAP_STACK and detect out-of-bounds SP

From: Ard Biesheuvel
Date: Fri Jul 14 2017 - 11:15:49 EST


On 14 July 2017 at 16:03, Robin Murphy <robin.murphy@xxxxxxx> wrote:
> On 14/07/17 15:39, Robin Murphy wrote:
>> On 14/07/17 15:06, Mark Rutland wrote:
>>> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>>>> On 14 July 2017 at 11:48, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
>>>>> On 14 July 2017 at 11:32, Mark Rutland <mark.rutland@xxxxxxx> wrote:
>>>>>> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>>>
>>>>>>> OK, so here's a crazy idea: what if we
>>>>>>> a) carve out a dedicated range in the VMALLOC area for stacks
>>>>>>> b) for each stack, allocate a naturally aligned window of 2x the stack
>>>>>>> size, and map the stack inside it, leaving the remaining space
>>>>>>> unmapped
>>>
>>>>>> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>>>>>> on XZR rather than SP, so to do this we need to get the SP value into a
>>>>>> GPR.
>>>>>>
>>>>>> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>>>>>> stash that GPR in a sysreg), so I started writing code to free sysregs.
>>>>>>
>>>>>> However, I now realise I was being thick, since we can stash the GPR
>>>>>> in the SP:
>>>>>>
>>>>>> sub sp, sp, x0 // sp = orig_sp - x0
>>>>>> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>>>
>>> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>>>
>>>>>> sub x0, x0, #S_FRAME_SIZE
>>>>>> tb(nz) x0, #THREAD_SHIFT, overflow
>>>>>> add x0, x0, #S_FRAME_SIZE
>>>>>> sub x0, sp, x0
>>>>
>>>> You need a neg x0, x0 here I think
>>>
>>> Oh, whoops. I'd mis-simplified things.
>>>
>>> We can avoid that by storing orig_sp + orig_x0 in sp:
>>>
>>> add sp, sp, x0 // sp = orig_sp + orig_x0
>>> sub x0, sp, x0 // x0 = orig_sp
>>> < check >
>>> sub x0, sp, x0 // x0 = orig_x0
>>
>> Haven't you now forcibly cleared the top bit of x0 thanks to overflow?
>
> ...or maybe not. I still can't quite see it, but I suppose it must
> cancel out somewhere, since Mr. Helpful C Program[1] has apparently
> proven me mistaken :(
>
> I guess that means I approve!
>
> Robin.
>
> [1]:
> #include <assert.h>
> #include <stdint.h>
>
> int main(void) {
>         for (int i = 0; i < 256; i++) {
>                 for (int j = 0; j < 256; j++) {
>                         uint8_t x = i;
>                         uint8_t y = j;
>                         y = y + x;
>                         x = y - x;
>                         x = y - x;
>                         y = y - x;
>                         assert(x == i && y == j);
>                 }
>         }
> }
>

Yeah, I think the carry out of the first instruction can be ignored,
given that we don't care about the magnitude of the result, only about
the lower 64 bits. The subtraction that inverts it will be off by
exactly 2^64, which likewise drops out of the lower 64 bits.
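
For what it's worth, here is a minimal 64-bit sketch of the same argument,
using uint64_t so that the arithmetic wraps modulo 2^64 just as the
registers do. The function name and the plain variables standing in for
sp and x0 are made up for illustration; the four steps mirror Robin's
program [1], and the assert sits where the < check > would go.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: sp and x0 are plain variables standing in for
 * the registers. Returns 1 if both values survive the round trip.
 */
static int stash_roundtrip(uint64_t orig_sp, uint64_t orig_x0)
{
	uint64_t sp = orig_sp, x0 = orig_x0;

	sp = sp + x0;		/* add sp, sp, x0: may wrap past 2^64 */
	x0 = sp - x0;		/* sub x0, sp, x0: x0 = orig_sp */
	assert(x0 == orig_sp);	/* the < check > would run here */
	x0 = sp - x0;		/* sub x0, sp, x0: x0 = orig_x0 */
	sp = sp - x0;		/* restore sp, as in the last step of [1] */

	return sp == orig_sp && x0 == orig_x0;
}
```

Calling stash_roundtrip(0xffff000012345000, 0xffffffffffffffff) makes the
first addition wrap, and it still returns 1, since the lost carry and the
later borrow cancel in the lower 64 bits.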