RE: [PATCH] LoongArch: add checksum optimization for 64-bit system

From: David Laight
Date: Wed Feb 08 2023 - 09:19:59 EST


From: WANG Xuerui
> Sent: 08 February 2023 13:48
...
> Yeah LoongArch can do rotates, and your suggestion can indeed reduce one
> insn from every invocation of csum_fold.
>
> From this:
>
> 000000000000096c <csum_fold>:
> sum += (sum >> 16) | (sum << 16);
> 96c: 004cc08c rotri.w $t0, $a0, 0x10
> 970: 00101184 add.w $a0, $t0, $a0
> return ~(__force __sum16)(sum >> 16);
> 974: 0044c084 srli.w $a0, $a0, 0x10
> 978: 00141004 nor $a0, $zero, $a0
> }
> 97c: 006f8084 bstrpick.w $a0, $a0, 0xf, 0x0
> 980: 4c000020 jirl $zero, $ra, 0
>
> To:
>
> 0000000000000984 <csum_fold2>:
> return (~sum - rol32(sum, 16)) >> 16;
> 984: 0014100c nor $t0, $zero, $a0
> return (x << amt) | (x >> (32 - amt));
> 988: 004cc084 rotri.w $a0, $a0, 0x10
> return (~sum - rol32(sum, 16)) >> 16;
> 98c: 00111184 sub.w $a0, $t0, $a0
> }
> 990: 00df4084 bstrpick.d $a0, $a0, 0x1f, 0x10
> 994: 4c000020 jirl $zero, $ra, 0

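It is actually slightly better than that.
In the csum_fold2 version the first two instructions (the nor and
the rotri.w) both depend only on the incoming $a0, not on each other,
so they can execute in parallel on a CPU that can issue more than one
instruction per cycle. In csum_fold every instruction depends on the
result of the previous one.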
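
For anyone who wants to convince themselves that the two forms give
the same result, here is a minimal user-space sketch. It assumes plain
uint32_t/uint16_t in place of the kernel's __wsum/__sum16 types, uses
the rol32() body from the listing above, and the helper name
csum_fold_variants_agree() is made up for illustration (it is not part
of the patch). The identity it relies on is ~(a + b) == ~a - b in
32-bit modular arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left; as written this is only valid for 0 < amt < 32. */
static inline uint32_t rol32(uint32_t x, unsigned int amt)
{
	return (x << amt) | (x >> (32 - amt));
}

/* Original form: add the swapped halves, then invert the high half. */
static inline uint16_t csum_fold(uint32_t sum)
{
	sum += (sum >> 16) | (sum << 16);
	return (uint16_t)~(sum >> 16);
}

/*
 * Suggested form: the invert and the rotate both read only 'sum',
 * so neither has to wait for the other.
 */
static inline uint16_t csum_fold2(uint32_t sum)
{
	return (uint16_t)((~sum - rol32(sum, 16)) >> 16);
}

/*
 * Hypothetical helper: compare both forms over a strided sample of
 * the 32-bit input space. Returns 1 if they agree everywhere sampled.
 */
static int csum_fold_variants_agree(void)
{
	uint64_t i;

	for (i = 0; i <= 0xffffffffu; i += 65521) {
		if (csum_fold((uint32_t)i) != csum_fold2((uint32_t)i))
			return 0;
	}
	return 1;
}
```

Calling csum_fold_variants_agree() from a small test main() should
return 1. The stride only keeps the run short; dropping it to 1 checks
every input.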

David
