[RFC PATCH] x86: optimize IRET returns to kernel

From: Denys Vlasenko
Date: Tue Mar 31 2015 - 08:47:12 EST


This is not proposed to be merged yet.

Andy, this patch is in the spirit of your crazy ideas of repurposing
instructions for roles they weren't intended for :)

Recently I measured IRET timings and was once again "impressed"
by how slow it is: 200+ cycles. So I started thinking...

When we return from an interrupt/exception *to the kernel*,
most of what IRET does is unnecessary. CS and SS
do not need changing. And in many (most?) cases
the saved RSP points right at the top of pt_regs,
or at (top of pt_regs + 8).

In those cases we can (ab)use POPF and a near RET!

Please see the patch.

It contains ifdefed-out code which shows that, if we could be sure
we aren't on an IST stack, the check for stack alignment could be much simpler.
Since this patch is an RFC, I kept this bit
as an illustration of alternatives / future ideas.

I did not measure this yet, but avoiding IRET should be a win. A big one.
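
Something like the user-space sketch below can give a rough feel for the
gap (illustrative only: the helper names, the loop count and the whole
harness are made up here, and user-space numbers are only a loose proxy
for the kernel return path - no ring transition, warm caches, and the
pushed return address defeats the return-stack predictor):

/*
 * Illustrative user-space sketch, NOT part of the patch: time an iretq
 * round trip against a popfq + near-RET round trip with RDTSC.
 * Helper names and loop counts are arbitrary.  Build: gcc -O2 on x86-64.
 */
#include <stdio.h>

static inline unsigned long long rdtsc(void)
{
	unsigned int lo, hi;

	/* lfence keeps earlier work from drifting past rdtsc */
	asm volatile("lfence; rdtsc" : "=a"(lo), "=d"(hi));
	return ((unsigned long long)hi << 32) | lo;
}

/* Build a full 5-word iret frame (RIP,CS,RFLAGS,RSP,SS) and consume it. */
static __attribute__((noinline)) void one_iretq(void)
{
	asm volatile(
		"mov	%%ss, %%ecx\n\t"
		"mov	%%rsp, %%rdx\n\t"
		"push	%%rcx\n\t"		/* SS */
		"push	%%rdx\n\t"		/* RSP to restore */
		"pushfq\n\t"			/* RFLAGS */
		"mov	%%cs, %%ecx\n\t"
		"push	%%rcx\n\t"		/* CS */
		"lea	1f(%%rip), %%rcx\n\t"
		"push	%%rcx\n\t"		/* RIP */
		"iretq\n"
		"1:"
		: : : "rcx", "rdx", "memory", "cc");
}

/* The proposed fast path in miniature: only RFLAGS and RIP, POPFQ + RET. */
static __attribute__((noinline)) void one_popf_ret(void)
{
	asm volatile(
		"lea	1f(%%rip), %%rcx\n\t"
		"push	%%rcx\n\t"		/* return address for RET */
		"pushfq\n\t"			/* RFLAGS for POPFQ */
		"popfq\n\t"
		"ret\n"
		"1:"
		: : : "rcx", "memory", "cc");
}

static void time_loop(const char *name, void (*fn)(void))
{
	enum { LOOPS = 1000000 };
	unsigned long long t0, t1;
	int i;

	for (i = 0; i < 1000; i++)		/* warm up */
		fn();
	t0 = rdtsc();
	for (i = 0; i < LOOPS; i++)
		fn();
	t1 = rdtsc();
	printf("%-10s %llu cycles/iteration\n", name, (t1 - t0) / LOOPS);
}

int main(void)
{
	time_loop("iretq", one_iretq);
	time_loop("popfq+ret", one_popf_ret);
	return 0;
}

Even with those caveats, the iretq loop should come out several times
slower than the popfq+ret pair, which is what this patch exploits.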

Signed-off-by: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
CC: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
CC: Steven Rostedt <rostedt@xxxxxxxxxxx>
CC: Ingo Molnar <mingo@xxxxxxxxxx>
CC: Borislav Petkov <bp@xxxxxxxxx>
CC: "H. Peter Anvin" <hpa@xxxxxxxxx>
CC: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
CC: Oleg Nesterov <oleg@xxxxxxxxxx>
CC: Frederic Weisbecker <fweisbec@xxxxxxxxx>
CC: Alexei Starovoitov <ast@xxxxxxxxxxxx>
CC: Will Drewry <wad@xxxxxxxxxxxx>
CC: Kees Cook <keescook@xxxxxxxxxxxx>
CC: x86@xxxxxxxxxx
CC: linux-kernel@xxxxxxxxxxxxxxx
---
arch/x86/kernel/entry_64.S | 47 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 020872b..b7ee959 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -750,6 +750,53 @@ retint_kernel:
* The iretq could re-enable interrupts:
*/
TRACE_IRQS_IRETQ
+
+ /*
+ * Since we return to kernel, CS and SS do not need changing.
+ * Only RSP, RIP and RFLAGS do.
+ * We can use POPF + near RET, which is much faster.
+ * The below code may seem excessive, but IRET is _very_ slow.
+ * Hundreds of cycles.
+ *
+ * However, there is a complication. Interrupts in 64-bit mode
+ * align stack to 16 bytes. This changes location
+ * where we need to store EFLAGS and RIP:
+ */
+#if 0
+ testb $8, RSP(%rsp)
+ jnz 1f
+#else
+ /* There is a complication #2: 64-bit mode has IST stacks */
+ leaq SIZEOF_PTREGS+8(%rsp), %rax
+ cmpq %rax, RSP(%rsp)
+ je 1f
+ subq $8, %rax
+ cmpq %rax, RSP(%rsp)
+ jne restore_args /* probably IST stack, can't optimize */
+#endif
+ /* there is no padding above iret frame */
+ movq EFLAGS(%rsp), %rax
+ movq RIP(%rsp), %rcx
+ movq %rax, (SIZEOF_PTREGS-2*8)(%rsp)
+ movq %rcx, (SIZEOF_PTREGS-1*8)(%rsp)
+ CFI_REMEMBER_STATE
+ RESTORE_C_REGS
+ REMOVE_PT_GPREGS_FROM_STACK 4*8 /* remove all except last two words */
+ popfq_cfi
+ retq
+ CFI_RESTORE_STATE
+1: /* there are 8 bytes of padding above iret frame */
+ movq EFLAGS(%rsp), %rax
+ movq RIP(%rsp), %rcx
+ movq %rax, (SIZEOF_PTREGS-2*8 + 8)(%rsp)
+ movq %rcx, (SIZEOF_PTREGS-1*8 + 8)(%rsp)
+ CFI_REMEMBER_STATE
+ RESTORE_C_REGS
+ REMOVE_PT_GPREGS_FROM_STACK 4*8 + 8
+ popfq_cfi
+ retq
+ CFI_RESTORE_STATE
+
restore_args:
RESTORE_C_REGS
REMOVE_PT_GPREGS_FROM_STACK 8
--
1.8.1.4
