Re: [PATCH] x86: Use asm-goto to implement mutex fast path on x86-64

From: H. Peter Anvin
Date: Mon Jul 01 2013 - 10:33:12 EST


Unconditional branches don't need prediction. The branch predictor is used for conditional branches and in some hardware designs for indirect branches. Unconditional direct branches never go through the branch predictor simply because the front end can know with 100% certainty where the flow of control will be.

Borislav Petkov <bp@xxxxxxxxx> wrote:

>On Mon, Jul 01, 2013 at 09:50:46AM +0200, Ingo Molnar wrote:
>> Not sure - the main thing we want to know is whether it gets faster.
>> The _amount_ will depend on things like precise usage patterns,
>> caching, etc. - but rarely does a real workload turn a win like this
>> into a loss.
>
>Yep, and it does get faster by a whopping 6 seconds!
>
>Almost all standard counters go down a bit.
>
>Interestingly, branch misses increase slightly, and the asm goto
>version does actually jump to fail_fn from within the asm, so maybe
>that puzzles the branch predictor a bit, although the instructions
>look the same and both jumps are forward.
>
>Oh well, we don't know where those additional misses happened so it
>could be somewhere else entirely, or it is simply noise.
>
>In any case, we're getting faster, so not worth investigating I guess.
>
>
>plain 3.10
>==========
>
> Performance counter stats for '../build-kernel.sh' (5 runs):
>
>    1312558.712266 task-clock                #    5.961 CPUs utilized            ( +- 0.02% )
>         1,036,629 context-switches          #    0.790 K/sec                    ( +- 0.24% )
>            55,118 cpu-migrations            #    0.042 K/sec                    ( +- 0.25% )
>        46,505,184 page-faults               #    0.035 M/sec                    ( +- 0.00% )
> 4,768,420,289,997 cycles                    #    3.633 GHz                      ( +- 0.02% ) [83.79%]
> 3,424,161,066,397 stalled-cycles-frontend   #   71.81% frontend cycles idle     ( +- 0.02% ) [83.78%]
> 2,483,143,574,419 stalled-cycles-backend    #   52.07% backend cycles idle      ( +- 0.04% ) [67.40%]
> 3,091,612,061,933 instructions              #    0.65  insns per cycle
>                                             #    1.11  stalled cycles per insn  ( +- 0.01% ) [83.93%]
>   677,787,215,988 branches                  #  516.386 M/sec                    ( +- 0.01% ) [83.77%]
>    25,438,736,368 branch-misses             #    3.75% of all branches          ( +- 0.02% ) [83.78%]
>
>     220.191740778 seconds time elapsed                                          ( +- 0.32% )
>
> + patch
>========
>
> Performance counter stats for '../build-kernel.sh' (5 runs):
>
>    1309995.427337 task-clock                #    6.106 CPUs utilized            ( +- 0.09% )
>         1,033,446 context-switches          #    0.789 K/sec                    ( +- 0.23% )
>            55,228 cpu-migrations            #    0.042 K/sec                    ( +- 0.28% )
>        46,484,992 page-faults               #    0.035 M/sec                    ( +- 0.00% )
> 4,759,631,961,013 cycles                    #    3.633 GHz                      ( +- 0.09% ) [83.78%]
> 3,415,933,806,156 stalled-cycles-frontend   #   71.77% frontend cycles idle     ( +- 0.12% ) [83.78%]
> 2,476,066,765,933 stalled-cycles-backend    #   52.02% backend cycles idle      ( +- 0.10% ) [67.38%]
> 3,089,317,073,397 instructions              #    0.65  insns per cycle
>                                             #    1.11  stalled cycles per insn  ( +- 0.02% ) [83.95%]
>   677,623,252,827 branches                  #  517.271 M/sec                    ( +- 0.01% ) [83.79%]
>    25,444,376,740 branch-misses             #    3.75% of all branches          ( +- 0.02% ) [83.79%]
>
>     214.533868029 seconds time elapsed                                          ( +- 0.36% )

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/