Re: [01/18] x86/asm/64: Remove the restore_c_regs_and_iret label

From: kemi
Date: Fri Nov 10 2017 - 01:09:54 EST


LKP-tools reported some performance regressions and improvements for this patch series
when tested on an Intel Atom processor, so I am posting the data here for your reference.

Branch: x86/entry_consolidation
Commit id:
  base: 50da9d439392fdd91601d36e7f05728265bff262
  head: 69af865668fdb86a95e4e948b1f48b2689d60b73
Benchmark suite: will-it-scale
Download link: https://github.com/antonblanchard/will-it-scale/tree/master/tests
Metrics:
  will-it-scale.per_process_ops = processes/nr_cpu
  will-it-scale.per_thread_ops = threads/nr_cpu

tbox: lkp-avoton3 (nr_cpu=8, memory=16G)
CPU: Intel(R) Atom(TM) CPU C2750 @ 2.40GHz
Performance regression with will-it-scale benchmark suite:
testcase               base  change       head  metric
eventfd1            1505677   -5.9%    1416132  will-it-scale.per_process_ops
                    1352716   -3.0%    1311943  will-it-scale.per_thread_ops
lseek2              7306698   -4.3%    6991473  will-it-scale.per_process_ops
                    4906388   -3.6%    4730531  will-it-scale.per_thread_ops
lseek1              7355365   -4.2%    7046224  will-it-scale.per_process_ops
                    4928961   -3.7%    4748791  will-it-scale.per_thread_ops
getppid1            8479806   -4.1%    8129026  will-it-scale.per_process_ops
                    8515252   -4.1%    8162076  will-it-scale.per_thread_ops
lock1               1054249   -3.2%    1020895  will-it-scale.per_process_ops
                     989145   -2.6%     963578  will-it-scale.per_thread_ops
dup1                2675825   -3.0%    2596257  will-it-scale.per_process_ops
futex3              4986520   -2.8%    4846640  will-it-scale.per_process_ops
                    5009388   -2.7%    4875126  will-it-scale.per_thread_ops
futex4              3932936   -2.0%    3854240  will-it-scale.per_process_ops
                    3950138   -2.0%    3872615  will-it-scale.per_thread_ops
futex1              2941886   -1.8%    2888912  will-it-scale.per_process_ops
futex2              2500203   -1.6%    2461065  will-it-scale.per_process_ops
                    1534692   -2.3%    1499532  will-it-scale.per_thread_ops
malloc1               61314   -1.0%      60725  will-it-scale.per_process_ops
                      19996   -1.5%      19688  will-it-scale.per_thread_ops
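The change column is a simple relative delta between the base and head columns. As a quick sanity check (plain Python, not part of the LKP tooling):

```python
def pct_change(base: float, head: float) -> float:
    """Relative change from base to head, in percent."""
    return (head - base) / base * 100

# eventfd1 per_process_ops from the regression table above
print(f"{pct_change(1505677, 1416132):+.1f}%")  # -> -5.9%
```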

Performance improvement with will-it-scale benchmark suite:
testcase               base  change       head  metric
context_switch1      176376   +1.6%     179152  will-it-scale.per_process_ops
                     180703   +1.9%     184209  will-it-scale.per_thread_ops
page_fault2          179716   +2.5%     184272  will-it-scale.per_process_ops
                     146890   +2.8%     150989  will-it-scale.per_thread_ops
page_fault3          666953   +3.7%     691735  will-it-scale.per_process_ops
                     464641   +5.0%     487952  will-it-scale.per_thread_ops
unix1                483094   +4.4%     504201  will-it-scale.per_process_ops
                     450055   +7.5%     483637  will-it-scale.per_thread_ops
read2                575887   +5.0%     604440  will-it-scale.per_process_ops
                     500319   +5.2%     526361  will-it-scale.per_thread_ops
poll1               4614597   +5.4%    4864022  will-it-scale.per_process_ops
                    3981551   +5.8%    4213409  will-it-scale.per_thread_ops
pwrite2              383344   +5.7%     405151  will-it-scale.per_process_ops
                     367006   +5.0%     385209  will-it-scale.per_thread_ops
sched_yield         3011191   +6.0%    3191710  will-it-scale.per_process_ops
                    3024171   +6.1%    3208197  will-it-scale.per_thread_ops
pipe1                755487   +6.2%     802622  will-it-scale.per_process_ops
                     705136   +8.8%     766950  will-it-scale.per_thread_ops
pwrite3              422850   +6.6%     450660  will-it-scale.per_process_ops
                     413370   +3.7%     428704  will-it-scale.per_thread_ops
readseek1            972102   +6.7%    1036852  will-it-scale.per_process_ops
                     844877   +6.6%     900686  will-it-scale.per_thread_ops
pwrite1              981310   +6.8%    1047809  will-it-scale.per_process_ops
                     944421   +5.7%     998472  will-it-scale.per_thread_ops
pread2               444743   +6.9%     475332  will-it-scale.per_process_ops
                     430299   +6.1%     456718  will-it-scale.per_thread_ops
writeseek1           849520   +7.0%     908672  will-it-scale.per_process_ops
                     746978   +9.3%     816372  will-it-scale.per_thread_ops
pread3              1108949   +7.2%    1189021  will-it-scale.per_process_ops
                    1088521   +5.5%    1148522  will-it-scale.per_thread_ops
mmap1                207314   +7.3%     222442  will-it-scale.per_process_ops
                      82533   +6.9%      88199  will-it-scale.per_thread_ops
writeseek3           377973   +7.4%     405853  will-it-scale.per_process_ops
                     333156  +11.4%     371100  will-it-scale.per_thread_ops
open2                266217   +7.6%     286335  will-it-scale.per_process_ops
                     208208   +6.6%     222052  will-it-scale.per_thread_ops
unlink2               54774   +7.7%      59013  will-it-scale.per_process_ops
                      53792   +7.0%      57584  will-it-scale.per_thread_ops
poll2                257458   +8.0%     278072  will-it-scale.per_process_ops
                     153400   +8.4%     166256  will-it-scale.per_thread_ops
posix_semaphore1   19898603   +8.3%   21552049  will-it-scale.per_process_ops
                   19797092   +8.4%   21458395  will-it-scale.per_thread_ops
pthread_mutex2     35871102   +8.4%   38868017  will-it-scale.per_process_ops
                   21506625   +8.4%   23312550  will-it-scale.per_thread_ops
mmap2                154242   +8.5%     167348  will-it-scale.per_process_ops
                      62234   +7.4%      66841  will-it-scale.per_thread_ops
unlink1               31487   +9.3%      34404  will-it-scale.per_process_ops
                      31607   +8.5%      34285  will-it-scale.per_thread_ops
open1                280301   +9.9%     307995  will-it-scale.per_process_ops
                     213863   +7.8%     230585  will-it-scale.per_thread_ops
signal1              355247  +11.2%     394875  will-it-scale.per_process_ops
                     176973   +9.7%     194160  will-it-scale.per_thread_ops

============================================================================================
Branch: x86/entry_consolidation
Commit id:
  base: 50da9d439392fdd91601d36e7f05728265bff262
  head: 69af865668fdb86a95e4e948b1f48b2689d60b73
Benchmark suite: unixbench
Download link: https://github.com/kdlucas/byte-unixbench.git

tbox: lkp-avoton2 (nr_cpu=8, memory=16G)
CPU: Intel(R) Atom(TM) CPU C2750 @ 2.40GHz
Performance regression with unixbench benchmark suite:
testcase               base  change       head  metric
syscall                1206   -4.2%       1155  unixbench.score
pipe                   4851   -1.5%       4779  unixbench.score
execl                498.83   -1.2%     492.90  unixbench.score

Performance improvement with unixbench benchmark suite:
testcase               base  change       head  metric
fsdisk                 2150   +2.7%       2208  unixbench.score

=============================================================================================

On 2017-10-26 16:26, Andrew Lutomirski wrote:
> The only user was the 64-bit opportunistic SYSRET failure path, and
> that path didn't really need it. This change makes the
> opportunistic SYSRET code a bit more straightforward and gets rid of
> the label.
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> Reviewed-by: Borislav Petkov <bp@xxxxxxx>
> ---
> arch/x86/entry/entry_64.S | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 49167258d587..afe1f403fa0e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -245,7 +245,6 @@ entry_SYSCALL64_slow_path:
> call do_syscall_64 /* returns with IRQs disabled */
>
> return_from_SYSCALL_64:
> - RESTORE_EXTRA_REGS
> TRACE_IRQS_IRETQ /* we're about to change IF */
>
> /*
> @@ -314,6 +313,7 @@ return_from_SYSCALL_64:
> */
> syscall_return_via_sysret:
> /* rcx and r11 are already restored (see code above) */
> + RESTORE_EXTRA_REGS
> RESTORE_C_REGS_EXCEPT_RCX_R11
> movq RSP(%rsp), %rsp
> UNWIND_HINT_EMPTY
> @@ -321,7 +321,7 @@ syscall_return_via_sysret:
>
> opportunistic_sysret_failed:
> SWAPGS
> - jmp restore_c_regs_and_iret
> + jmp restore_regs_and_iret
> END(entry_SYSCALL_64)
>
> ENTRY(stub_ptregs_64)
> @@ -638,7 +638,6 @@ retint_kernel:
> */
> GLOBAL(restore_regs_and_iret)
> RESTORE_EXTRA_REGS
> -restore_c_regs_and_iret:
> RESTORE_C_REGS
> REMOVE_PT_GPREGS_FROM_STACK 8
> INTERRUPT_RETURN
>