Re: [PATCH] x86/asm/entry/64: better check for canonical address

From: Andy Lutomirski
Date: Thu Mar 26 2015 - 14:45:47 EST


On Thu, Mar 26, 2015 at 5:42 AM, Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
> This change makes the check exact (no more false positives
> on kernel addresses).
>
> It isn't really important to be fully correct here -
> almost all addresses we'll ever see will be userspace ones,
> but OTOH it looks to be cheap enough:
> the new code uses two more ALU ops but preserves %rcx,
> allowing to not reload it from pt_regs->cx again.
> On disassembly level, the changes are:
>
> cmp %rcx,0x80(%rsp) -> mov 0x80(%rsp),%r11; cmp %rcx,%r11
> shr $0x2f,%rcx -> shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11
> mov 0x58(%rsp),%rcx -> (eliminated)
>
> Signed-off-by: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
> CC: Borislav Petkov <bp@xxxxxxxxx>
> CC: x86@xxxxxxxxxx
> CC: linux-kernel@xxxxxxxxxxxxxxx
> ---
>
> Andy, I'd undecided myself on the merits of doing this.
> If you like it, feel free to take it in your tree.
> I trimmed CC list to not bother too many people with this trivial
> and quite possibly "useless churn"-class change.

I suspect that the two added ALU ops are free for all practical
purposes, and the performance of this path isn't *that* critical.

If anyone is running with vsyscall=native because they need the
performance, then this would be a big win. Otherwise I don't have a
real preference. Anyone else have any thoughts here?

Let me just run through the math quickly to make sure I believe all the numbers:

Canonical addresses either start with 17 zeros or 17 ones.

In the old code, we checked that the top (64-47) = 17 bits were all
zero. We did this by shifting right by 47 bits and making sure that
nothing was left.

In the new code, we're shifting left by (64 - 48) = 16 bits and then
signed shifting right by the same amount, this propagating the 17th
highest bit to all positions to its left. If we get the same value we
started with, then we're good to go.

So it looks okay to me.

IOW, the new code extends the optimization correctly to one more case
(native vsyscalls or the really weird corner case of returns to
emulated vsyscalls, although that should basically never happen) at
the cost of two probably-free ALU ops.

--Andy

>
> arch/x86/kernel/entry_64.S | 23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index bf9afad..a36d04d 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -688,26 +688,27 @@ retint_swapgs: /* return to user-space */
> * a completely clean 64-bit userspace context.
> */
> movq RCX(%rsp),%rcx
> - cmpq %rcx,RIP(%rsp) /* RCX == RIP */
> + movq RIP(%rsp),%r11
> + cmpq %rcx,%r11 /* RCX == RIP */
> jne opportunistic_sysret_failed
>
> /*
> * On Intel CPUs, sysret with non-canonical RCX/RIP will #GP
> * in kernel space. This essentially lets the user take over
> - * the kernel, since userspace controls RSP. It's not worth
> - * testing for canonicalness exactly -- this check detects any
> - * of the 17 high bits set, which is true for non-canonical
> - * or kernel addresses. (This will pessimize vsyscall=native.
> - * Big deal.)
> + * the kernel, since userspace controls RSP.
> *
> - * If virtual addresses ever become wider, this will need
> + * If width of "canonical tail" ever become variable, this will need
> * to be updated to remain correct on both old and new CPUs.
> */
> .ifne __VIRTUAL_MASK_SHIFT - 47
> .error "virtual address width changed -- sysret checks need update"
> .endif
> - shr $__VIRTUAL_MASK_SHIFT, %rcx
> - jnz opportunistic_sysret_failed
> + /* Change top 16 bits to be a sign-extension of the rest */
> + shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
> + sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
> + /* If this changed %rcx, it was not canonical */
> + cmpq %rcx, %r11
> + jne opportunistic_sysret_failed
>
> cmpq $__USER_CS,CS(%rsp) /* CS must match SYSRET */
> jne opportunistic_sysret_failed
> @@ -730,8 +731,8 @@ retint_swapgs: /* return to user-space */
> */
> irq_return_via_sysret:
> CFI_REMEMBER_STATE
> - /* r11 is already restored (see code above) */
> - RESTORE_C_REGS_EXCEPT_R11
> + /* rcx and r11 are already restored (see code above) */
> + RESTORE_C_REGS_EXCEPT_RCX_R11
> movq RSP(%rsp),%rsp
> USERGS_SYSRET64
> CFI_RESTORE_STATE
> --
> 1.8.1.4
>



--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/