Re: [RFC PATCH v2 17/27] x86/cet/shstk: User-mode shadow stack support

From: Andy Lutomirski
Date: Wed Jul 11 2018 - 17:34:16 EST




> On Jul 11, 2018, at 2:10 PM, Jann Horn <jannh@xxxxxxxxxx> wrote:
>
>> On Tue, Jul 10, 2018 at 3:31 PM Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx> wrote:
>>
>> This patch adds basic shadow stack enabling/disabling routines.
>> A task's shadow stack is allocated from memory with VM_SHSTK
>> flag set and read-only protection. The shadow stack is
>> allocated to a fixed size.
>>
>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
> [...]
>> diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
>> new file mode 100644
>> index 000000000000..96bf69db7da7
>> --- /dev/null
>> +++ b/arch/x86/kernel/cet.c
> [...]
>> +static unsigned long shstk_mmap(unsigned long addr, unsigned long len)
>> +{
>> + struct mm_struct *mm = current->mm;
>> + unsigned long populate;
>> +
>> + down_write(&mm->mmap_sem);
>> + addr = do_mmap(NULL, addr, len, PROT_READ,
>> + MAP_ANONYMOUS | MAP_PRIVATE, VM_SHSTK,
>> + 0, &populate, NULL);
>> + up_write(&mm->mmap_sem);
>> +
>> + if (populate)
>> + mm_populate(addr, populate);
>> +
>> + return addr;
>> +}
>
> How does this interact with UFFDIO_REGISTER?
>
> Is there an explicit design decision on whether FOLL_FORCE should be
> able to write to shadow stacks? I'm guessing the answer is "yes,
> FOLL_FORCE should be able to write to shadow stacks"? It might make
> sense to add documentation for this.

FOLL_FORCE should be able to write them, IMO. Otherwise weâll need a whole new debugging API.

By the time an attacker can do FOLL_FORCE writes, the attacker can directly modify *text*, and CET is useless. We should probably audit all uses of FOLL_FORCE and remove as many as we can get away with.

>
> Should the kernel enforce that two shadow stacks must have a guard
> page between them so that they can not be directly adjacent, so that
> if you have too much recursion, you can't end up corrupting an
> adjacent shadow stack?

I think the answer is a qualified ânoâ. I would like to instead enforce a general guard page on all mmaps that donât use MAP_FORCE. We *might* need to exempt any mmap with an address hint for compatibility.

My commercial software has been manually adding guard pages on every single mmap done by tcmalloc for years, and it has caught a couple bugs and costs essentially nothing.

Hmm. Linux should maybe add something like Windowsâ âreservedâ virtual memory. Itâs basically a way to ask for a VA range that explicitly contains nothing and can be subsequently be turned into something useful with the equivalent of MAP_FORCE.

>
>> +int cet_setup_shstk(void)
>> +{
>> + unsigned long addr, size;
>> +
>> + if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
>> + return -EOPNOTSUPP;
>> +
>> + size = in_ia32_syscall() ? SHSTK_SIZE_32:SHSTK_SIZE_64;
>> + addr = shstk_mmap(0, size);
>> +
>> + /*
>> + * Return actual error from do_mmap().
>> + */
>> + if (addr >= TASK_SIZE_MAX)
>> + return addr;
>> +
>> + set_shstk_ptr(addr + size - sizeof(u64));
>> + current->thread.cet.shstk_base = addr;
>> + current->thread.cet.shstk_size = size;
>> + current->thread.cet.shstk_enabled = 1;
>> + return 0;
>> +}
> [...]
>> +void cet_disable_free_shstk(struct task_struct *tsk)
>> +{
>> + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
>> + !tsk->thread.cet.shstk_enabled)
>> + return;
>> +
>> + if (tsk == current)
>> + cet_disable_shstk();
>> +
>> + /*
>> + * Free only when tsk is current or shares mm
>> + * with current but has its own shstk.
>> + */
>> + if (tsk->mm && (tsk->mm == current->mm) &&
>> + (tsk->thread.cet.shstk_base)) {
>> + vm_munmap(tsk->thread.cet.shstk_base,
>> + tsk->thread.cet.shstk_size);
>> + tsk->thread.cet.shstk_base = 0;
>> + tsk->thread.cet.shstk_size = 0;
>> + }
>> +
>> + tsk->thread.cet.shstk_enabled = 0;
>> +}