Re: RFC: Petition Intel/AMD to add POPF_IF insn

From: Denys Vlasenko
Date: Thu Aug 18 2016 - 09:26:26 EST

Next message: Joe Perches: "Re: [PATCH] proc, smaps: reduce printing overhead"
Previous message: Josh Poimboeuf: "[PATCH v4 20/57] x86/entry/32: rename 'error_code' to 'common_exception'"
In reply to: Linus Torvalds: "Re: RFC: Petition Intel/AMD to add POPF_IF insn"
Next in thread: Linus Torvalds: "Re: RFC: Petition Intel/AMD to add POPF_IF insn"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> Of course, somebody really should do timings on modern CPU's (in cpl0,
> comparing native_fl() that enables interrupts with a popf)

I haven't done CPL0 tests yet, but I realized that CLI/STI can be tested
in userspace if we raise the I/O privilege level with iopl(3) first.

Surprisingly, STI is slower than CLI. A loop with 27 CLIs and one STI
converges to about 0.5 insn/cycle:

# compile with: gcc -nostartfiles -nostdlib
_start: .globl _start
        mov $172, %eax          # iopl
        mov $3, %edi
        syscall
        mov $200*1000*1000, %eax
        .balign 64
loop:
        cli;cli;cli;cli
        cli;cli;cli;cli
        cli;cli;cli;cli
        cli;cli;cli;cli

        cli;cli;cli;cli
        cli;cli;cli;cli
        cli;cli;cli;sti
        dec %eax
        jnz loop

        mov $231, %eax          # exit_group
        syscall

perf stat:
6,015,787,968 instructions # 0.52 insn per cycle
3.355474199 seconds time elapsed

With all CLIs replaced by STIs, it's ~0.25 insn/cycle:

6,030,530,328 instructions # 0.27 insn per cycle
6.547200322 seconds time elapsed
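
As a rough cross-check, the two measurements can be split into per-instruction costs. This is only a sketch: it assumes each iteration retires 30 instructions (28 CLI/STI plus dec and jnz, as in the listing) and that dec/jnz add no cycles of their own, which the post does not measure. The helper names are made up for illustration.

```python
# Sketch: derive per-instruction CLI/STI costs from the two loop measurements.
# Assumption (not from the post): DEC/JNZ overlap fully and cost no cycles.

INSNS_PER_ITER = 30   # 28 CLI/STI + dec + jnz, from the listing
FLAG_INSNS = 28       # CLI/STI instructions per iteration


def cycles_per_iter(ipc):
    """Cycles for one loop iteration at the measured insn/cycle."""
    return INSNS_PER_ITER / ipc


def solve_costs(ipc_mixed, ipc_all_sti):
    """Return (cli_cycles, sti_cycles) from the two perf stat figures."""
    # All-STI loop: 28 * sti = cycles per iteration.
    sti = cycles_per_iter(ipc_all_sti) / FLAG_INSNS
    # Mixed loop: 27 * cli + 1 * sti = cycles per iteration.
    cli = (cycles_per_iter(ipc_mixed) - sti) / (FLAG_INSNS - 1)
    return cli, sti


cli_cost, sti_cost = solve_costs(ipc_mixed=0.52, ipc_all_sti=0.27)
print(f"CLI ~ {cli_cost:.1f} cycles, STI ~ {sti_cost:.1f} cycles")
```

Under those assumptions the two loops give roughly 2 cycles per CLI and 4 cycles per STI.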

A POPF that has to enable interrupts is not measurably faster than
one that leaves EFLAGS.IF unchanged:

Loop with:
  400158:  fa    cli
  400159:  53    push %rbx      # saved eflags with IF=1
  40015a:  9d    popfq
shows:
8,908,857,324 instructions # 0.11 insn per cycle ( +- 0.00% )

Loop with:
  400140:  fb    sti
  400141:  53    push %rbx
  400142:  9d    popfq
shows:
8,920,243,701 instructions # 0.10 insn per cycle ( +- 0.01% )

Even a loop with neither CLI nor STI, only PUSH and POPF:
  400140:  53    push %rbx
  400141:  9d    popfq
shows:
6,079,936,714 instructions # 0.10 insn per cycle ( +- 0.00% )

This is on a Skylake CPU.

The gist of it:
CLI is 2 cycles,
STI is 4 cycles,
POPF is 10 cycles
seemingly regardless of the prior value of EFLAGS.IF.
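
The perf stat figures can be turned into these cycle counts with a little arithmetic. The sketch below assumes every run used the same 200 million iterations as the CLI/STI listing; the post does not state this for the POPF loops, and their unroll factor is not shown. Note that 1/IPC is an average over every instruction in the loop, including PUSH, dec and jnz, so it is not the isolated cost of a single POPF.

```python
# Sketch: convert perf stat output into insn/iteration and cycles/insn.
# Assumption: all runs looped 200 million times (only stated for CLI/STI).

ITERATIONS = 200 * 1000 * 1000

# (loop body, retired instructions, insn per cycle) as reported above
RUNS = [
    ("27 cli + 1 sti", 6_015_787_968, 0.52),
    ("28 sti", 6_030_530_328, 0.27),
    ("cli; push; popfq", 8_908_857_324, 0.11),
    ("sti; push; popfq", 8_920_243_701, 0.10),
    ("push; popfq", 6_079_936_714, 0.10),
]


def summarize(instructions, ipc):
    """Return (instructions per iteration, average cycles per instruction)."""
    insns_per_iter = instructions / ITERATIONS
    cycles_per_insn = 1.0 / ipc
    return insns_per_iter, cycles_per_insn


for label, insns, ipc in RUNS:
    per_iter, cpi = summarize(insns, ipc)
    print(f"{label:18s} {per_iter:6.1f} insn/iter {cpi:6.1f} cycles/insn")
```

The CLI/STI runs come out at about 30 instructions per iteration, matching the listing, which suggests the instruction counts are dominated by the loop itself rather than by startup or kernel noise.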
