From: Christophe JAILLET
Sent: 29 July 2022 21:29...
Most of the time the 'min' and 'max' parameters of usleep_range() are
constant. We can take advantage of it to pre-compute at compile time
some values otherwise computed at run-time in usleep_range_state().
Replace usleep_range_state() by a new __nsleep_range_delta_state() function
that takes as parameters the pre-computed values.
The main benefit is to save a few instructions, especially 2
multiplications (x1000 when converting us to ns).
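As a rough illustration of the idea described above (not the code from the patch itself), the us --> ns conversion can live in an inline wrapper so the compiler folds it when 'min' and 'max' are constants. The names struct ns_range and usleep_range_to_ns() are hypothetical and chosen only for this sketch; the real patch passes the pre-computed values to __nsleep_range_delta_state().

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/* Hypothetical holder for the two values an out-of-line sleep helper would take. */
struct ns_range {
	uint64_t min_ns;   /* lower bound, in nanoseconds */
	uint64_t delta_ns; /* (max - min), in nanoseconds */
};

/*
 * Hypothetical inline wrapper (assumption, not the kernel's API).
 * When min_us and max_us are compile-time constants, an optimising
 * compiler can evaluate both multiplications at build time, so the
 * callee only receives ready-made nanosecond values.
 */
static inline struct ns_range usleep_range_to_ns(unsigned long min_us,
						 unsigned long max_us)
{
	struct ns_range r;

	assert(max_us >= min_us); /* sanity check for the sketch only */
	r.min_ns = (uint64_t)min_us * NSEC_PER_USEC;
	r.delta_ns = (uint64_t)(max_us - min_us) * NSEC_PER_USEC;
	return r;
}
```

With constant arguments such as usleep_range_to_ns(10, 20), both fields would be expected to become immediates at the call site; with run-time arguments the multiplications simply move from the callee to the caller.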
53 push %rbx...
48 89 fb mov %rdi,%rbx
81 e5 cc 00 00 00 and $0xcc,%ebp
- 49 29 dc sub %rbx,%r12 ; (max - min)
- 4d 69 e4 e8 03 00 00 imul $0x3e8,%r12,%r12 ; us --> ns (x 1000)
48 83 ec 68 sub $0x68,%rsp
48 c7 44 24 08 b3 8a movq $0x41b58ab3,0x8(%rsp)
b5 41
@@ -10721,18 +10719,16 @@
31 c0 xor %eax,%eax
e8 00 00 00 00 call ...
e8 00 00 00 00 call ...
- 49 89 c0 mov %rax,%r8
- 48 69 c3 e8 03 00 00 imul $0x3e8,%rbx,%rax ; us --> ns (x 1000)
+ 48 01 d8 add %rbx,%rax
+ 48 89 44 24 28 mov %rax,0x28(%rsp)
65 48 8b 1c 25 00 00 mov %gs:0x0,%rbx
00 00
- 4c 01 c0 add %r8,%rax
- 48 89 44 24 28 mov %rax,0x28(%rsp)
e8 00 00 00 00 call ...
Is that really measurable in any test?
Integer multiply is one clock on almost every modern cpu.
By the time you've allowed for superscalar cpus there is
probably no difference at all on anything except the simplest
cpus.
David