Re: cli/sti vs local_cmpxchg and local_add_return

From: Mathieu Desnoyers
Date: Mon Mar 23 2009 - 12:50:29 EST


* Alan D. Brunelle (Alan.Brunelle@xxxxxx) wrote:
> Here are the results for:
>
> processor : 31
> vendor : GenuineIntel
> arch : IA-64
> family : 32
> model : 0
> model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9050
> revision : 7
> archrev : 0
> features : branchlong, 16-byte atomic ops
> cpu number : 0
> cpu regs : 4
> cpu MHz : 1598.002
> itc MHz : 400.000000
> BogoMIPS : 3186.68
> siblings : 2
> physical id: 196865
> core id : 1
> thread id : 0
>
> test init
> test results: time for baseline
> number of loops: 20000
> total time: 5002
> -> baseline takes 0 cycles
> test end
> test results: time for locked cmpxchg
> number of loops: 20000
> total time: 60083
> -> locked cmpxchg takes 3 cycles
> test end
> test results: time for non locked cmpxchg
> number of loops: 20000
> total time: 60002
> -> non locked cmpxchg takes 3 cycles
> test end
> test results: time for locked add return
> number of loops: 20000
> total time: 155007
> -> locked add return takes 7 cycles
> test end
> test results: time for non locked add return
> number of loops: 20000
> total time: 155004
> -> non locked add return takes 7 cycles
> test end
> test results: time for enabling interrupts (STI)
> number of loops: 20000
> total time: 45003
> -> enabling interrupts (STI) takes 2 cycles
> test end
> test results: time for disabling interrupts (CLI)
> number of loops: 20000
> total time: 59998
> -> disabling interrupts (CLI) takes 2 cycles
> test end
> test results: time for disabling/enabling interrupts (STI/CLI)
> number of loops: 20000
> total time: 107274
> -> enabling/disabling interrupts (STI/CLI) takes 5 cycles
> test end

Hi Alan,

Wow, disabling interrupts is remarkably cheap on ia64, and
local_add_return is comparatively costly. I think that is because it is
implemented with an underlying cmpxchg loop: the architecture has no
general atomic add-and-return, only fetchadd, which is limited to a few
specific immediate values.
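
To illustrate what I mean by add-return built on cmpxchg, here is a
minimal user-space sketch using C11 atomics. It is not the ia64 kernel
implementation; the name add_return_via_cmpxchg is hypothetical, and the
relaxed memory ordering is an assumption chosen to mimic a CPU-local
counter.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper (not the kernel's local_add_return()): emulate
 * add-and-return with a compare-and-exchange retry loop, as an
 * architecture without a general atomic add instruction has to do.
 */
static long add_return_via_cmpxchg(_Atomic long *v, long delta)
{
    long old = atomic_load_explicit(v, memory_order_relaxed);

    /*
     * If *v changed since we read it, the exchange fails and 'old' is
     * refreshed with the current value, so the loop simply retries.
     */
    while (!atomic_compare_exchange_weak_explicit(v, &old, old + delta,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;

    return old + delta;
}
```

Each call costs a load plus at least one cmpxchg, which would be
consistent with add return measuring slower than a bare cmpxchg.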

Given that some ia64 code refers to NMIs, I guess this architecture
supports them. So in the end, the choice between disabling interrupts
and using the local atomic operations is a robustness vs speed tradeoff:
disabling interrupts does not protect against NMIs. But given the time
it takes to write data to memory, I think 5 cycles vs 10 cycles won't
make a big difference overall.
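
A note on reading the quoted figures: the per-operation numbers look
like total time divided by the loop count with the remainder dropped.
That is an inference from the output, not something I checked in the
test module, and the helper name below is hypothetical.

```c
/*
 * Assumption: the test reports total_time / loops using truncating
 * integer division, which matches every figure in the quoted output.
 */
static unsigned long cycles_per_loop(unsigned long total_time,
                                     unsigned long loops)
{
    return total_time / loops;
}
```

Under that assumption, 59998 / 20000 is reported as 2 for CLI while
60002 / 20000 is reported as 3 for cmpxchg, so those two are really
about the same cost.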

Thanks for those results!

Mathieu


--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68