Re: [PATCH] arm64: enable GENERIC_FIND_FIRST_BIT

From: Yury Norov
Date: Mon Dec 07 2020 - 21:00:10 EST


(CC: Alexey Klimov)

On Mon, Dec 7, 2020 at 3:25 AM Will Deacon <will@xxxxxxxxxx> wrote:
>
> On Sat, Dec 05, 2020 at 08:54:06AM -0800, Yury Norov wrote:
> > ARM64 doesn't implement find_first_{zero}_bit in arch code and doesn't
> > enable it in config. It leads to using find_next_bit() which is less
> > efficient:
>
> [...]
>
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 1515f6f153a0..2b90ef1f548e 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -106,6 +106,7 @@ config ARM64
> > select GENERIC_CPU_AUTOPROBE
> > select GENERIC_CPU_VULNERABILITIES
> > select GENERIC_EARLY_IOREMAP
> > + select GENERIC_FIND_FIRST_BIT
>
> Does this actually make any measurable difference? The disassembly with
> or without this is _very_ similar for me (clang 11).
>
> Will

On Cortex-A53, the generic find_first_bit() is almost twice as fast as the
find_next_bit()-based fallback (roughly 1.7x in both the random and sparse
tests), according to lib/find_bit_benchmark. (Thanks to Alexey for testing.)
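
Below is a simplified userspace sketch of why the fallback can cost more. It
assumes that, with GENERIC_FIND_FIRST_BIT off, the asm-generic header maps
find_first_bit(addr, size) to find_next_bit(addr, size, 0). The sketch_* names
are hypothetical, it relies on the GCC/Clang builtin __builtin_ctzl(), and it
is not the code in lib/find_bit.c. The point it shows: a "next bit" search has
to handle an arbitrary start offset (locate the starting word, mask off the
low bits of that word), while a dedicated "first bit" search just scans words
from index 0.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical generic "next set bit" search (simplified). Because it
 * accepts an arbitrary start offset, it must locate the starting word and
 * mask off the bits below 'start' before scanning.
 */
static size_t sketch_find_next_bit(const unsigned long *addr, size_t size,
				   size_t start)
{
	size_t idx;
	unsigned long word;

	if (start >= size)
		return size;

	idx = start / BITS_PER_LONG;
	word = addr[idx] & (~0UL << (start % BITS_PER_LONG));

	for (;;) {
		if (word) {
			size_t bit = idx * BITS_PER_LONG +
				     (size_t)__builtin_ctzl(word);
			return bit < size ? bit : size;
		}
		if (++idx * BITS_PER_LONG >= size)
			return size;
		word = addr[idx];
	}
}

/* Fallback path when no dedicated first-bit search is built. */
#define sketch_find_first_bit_fallback(addr, size) \
	sketch_find_next_bit((addr), (size), 0)

/*
 * Hypothetical dedicated "first set bit" search (simplified): always starts
 * at word 0, so no start-offset arithmetic or first-word masking is needed.
 */
static size_t sketch_find_first_bit(const unsigned long *addr, size_t size)
{
	size_t idx;

	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
		if (addr[idx]) {
			size_t bit = idx * BITS_PER_LONG +
				     (size_t)__builtin_ctzl(addr[idx]);
			return bit < size ? bit : size;
		}
	}
	return size;
}
```

Both paths return the same index for any bitmap; what the benchmark measures
is the extra per-call work in the generic path, not a different result. The
kernel's generic helper carries more generality than this sketch, so the real
overhead gap may differ.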

Yury

---

Tested-by: Alexey Klimov <aklimov@xxxxxxxxxx>

Without GENERIC_FIND_FIRST_BIT:

Start testing find_bit() with random-filled bitmap
[7126084.864616] find_next_bit: 9653351 ns, 164280 iterations
[7126084.881146] find_next_zero_bit: 9591974 ns, 163401 iterations
[7126084.893859] find_last_bit: 5778627 ns, 164280 iterations
[7126084.948181] find_first_bit: 47389224 ns, 16357 iterations
[7126084.958975] find_next_and_bit: 3875849 ns, 73487 iterations
[7126084.965884]
Start testing find_bit() with sparse bitmap
[7126084.973474] find_next_bit: 109879 ns, 655 iterations
[7126084.999365] find_next_zero_bit: 18968440 ns, 327026 iterations
[7126085.006351] find_last_bit: 80503 ns, 655 iterations
[7126085.032315] find_first_bit: 19048193 ns, 655 iterations
[7126085.039303] find_next_and_bit: 82628 ns, 1 iterations

With GENERIC_FIND_FIRST_BIT enabled:

Start testing find_bit() with random-filled bitmap
[ 84.095335] find_next_bit: 9600970 ns, 163770 iterations
[ 84.111695] find_next_zero_bit: 9613137 ns, 163911 iterations
[ 84.124143] find_last_bit: 5713907 ns, 163770 iterations
[ 84.158068] find_first_bit: 27193319 ns, 16406 iterations
[ 84.168663] find_next_and_bit: 3863814 ns, 73671 iterations
[ 84.175392]
Start testing find_bit() with sparse bitmap
[ 84.182660] find_next_bit: 112334 ns, 656 iterations
[ 84.208375] find_next_zero_bit: 18976981 ns, 327025 iterations
[ 84.215184] find_last_bit: 79584 ns, 656 iterations
[ 84.233005] find_first_bit: 11082437 ns, 656 iterations
[ 84.239821] find_next_and_bit: 82209 ns, 1 iterations
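
For reference, the ratios implied by the two runs (total time without the
option divided by total time with it). find_next_bit serves as a control,
since GENERIC_FIND_FIRST_BIT only affects find_first_bit() and
find_first_zero_bit(). The helper and macro names are hypothetical; the
numbers are copied from the logs.

```c
/*
 * Hypothetical helper: ratio of total elapsed times, default build
 * (GENERIC_FIND_FIRST_BIT off) over patched build (option on).
 */
static double speedup(unsigned long long ns_off, unsigned long long ns_on)
{
	return (double)ns_off / (double)ns_on;
}

/* find_first_bit totals in ns, copied from the Tested-by logs. */
#define FIRST_BIT_RANDOM_OFF 47389224ULL
#define FIRST_BIT_RANDOM_ON  27193319ULL
#define FIRST_BIT_SPARSE_OFF 19048193ULL
#define FIRST_BIT_SPARSE_ON  11082437ULL

/* Control: find_next_bit does not depend on the option. */
#define NEXT_BIT_RANDOM_OFF   9653351ULL
#define NEXT_BIT_RANDOM_ON    9600970ULL
```

This works out to about 1.74x (random) and 1.72x (sparse) for find_first_bit,
against about 1.005x for find_next_bit, so the gain is specific to
find_first_bit. Iteration counts differ slightly between the runs (16357 vs
16406, 655 vs 656), presumably because the bitmaps are filled randomly on each
boot, so the total-time ratios are approximate.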

root@pine:~# cpupower -c all frequency-info | grep asserted
current CPU frequency: 648 MHz (asserted by call to hardware)
current CPU frequency: 648 MHz (asserted by call to hardware)
current CPU frequency: 648 MHz (asserted by call to hardware)
current CPU frequency: 648 MHz (asserted by call to hardware)
root@pine:~# lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: ARM
Model: 4
Model name: Cortex-A53
Stepping: r0p4
CPU max MHz: 1152.0000
CPU min MHz: 648.0000
BogoMIPS: 48.00
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid