Re: [PATCH] LoongArch: add checksum optimization for 64-bit system

From: maobibo
Date: Wed Feb 08 2023 - 20:16:36 EST

On 2023/2/8 22:19, David Laight wrote:
> From: WANG Xuerui
>> Sent: 08 February 2023 13:48
> ...
>> Yeah LoongArch can do rotates, and your suggestion can indeed reduce one
>> insn from every invocation of csum_fold.
>>
>> From this:
>>
>> 000000000000096c <csum_fold>:
>> sum += (sum >> 16) | (sum << 16);
>> 96c: 004cc08c rotri.w $t0, $a0, 0x10
>> 970: 00101184 add.w $a0, $t0, $a0
>> return ~(__force __sum16)(sum >> 16);
>> 974: 0044c084 srli.w $a0, $a0, 0x10
>> 978: 00141004 nor $a0, $zero, $a0
>> }
>> 97c: 006f8084 bstrpick.w $a0, $a0, 0xf, 0x0
>> 980: 4c000020 jirl $zero, $ra, 0
>>
>> To:
>>
>> 0000000000000984 <csum_fold2>:
>> return (~sum - rol32(sum, 16)) >> 16;
>> 984: 0014100c nor $t0, $zero, $a0
>> return (x << amt) | (x >> (32 - amt));
>> 988: 004cc084 rotri.w $a0, $a0, 0x10
>> return (~sum - rol32(sum, 16)) >> 16;
>> 98c: 00111184 sub.w $a0, $t0, $a0
>> }
>> 990: 00df4084 bstrpick.d $a0, $a0, 0x1f, 0x10
>> 994: 4c000020 jirl $zero, $ra, 0
>
> It is actually slightly better than that.
> In the csum_fold2 version the first two instructions
> are independent, so they can execute in parallel on some CPUs.
>
> David
>

Thanks for the good suggestion.
I will send the second version soon.
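
For reference, here is a minimal user-space sketch of the two folds compared above. It is not the v2 patch, only a quick way to check that both give the same result. The names csum_fold_add, csum_fold_sub and fold_mismatches are made up for this example, and plain uint32_t/uint16_t stand in for the kernel's __wsum/__sum16. The subtract form works because ~sum - rol32(sum, 16) equals ~(sum + rol32(sum, 16)) in 32-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by amt bits; the masks keep both shift counts below 32. */
static inline uint32_t rol32(uint32_t x, unsigned int amt)
{
	return (x << (amt & 31)) | (x >> (-amt & 31));
}

/* Add form: fold, then complement the high half (hypothetical name). */
static inline uint16_t csum_fold_add(uint32_t sum)
{
	sum += rol32(sum, 16);
	return (uint16_t)~(sum >> 16);
}

/* Subtract form: the complement and the rotate do not depend on each other. */
static inline uint16_t csum_fold_sub(uint32_t sum)
{
	return (uint16_t)((~sum - rol32(sum, 16)) >> 16);
}

/* Count disagreements over a few edge values and a strided sweep. */
static inline unsigned long fold_mismatches(void)
{
	static const uint32_t edge[] = {
		0x00000000u, 0x00000001u, 0x0000ffffu, 0x00010000u,
		0x00010001u, 0x12345678u, 0xffff0000u, 0xffffffffu,
	};
	unsigned long bad = 0;
	uint64_t v;
	unsigned int i;

	for (i = 0; i < sizeof(edge) / sizeof(edge[0]); i++)
		bad += csum_fold_add(edge[i]) != csum_fold_sub(edge[i]);

	/* 64-bit counter so the loop ends cleanly at the top of the range. */
	for (v = 0; v <= 0xffffffffu; v += 65521)
		bad += csum_fold_add((uint32_t)v) != csum_fold_sub((uint32_t)v);

	return bad;
}
```

Calling fold_mismatches() from a small test program should return 0. The sweep is strided to keep it quick; an exhaustive 2^32 loop would also be feasible.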

regards
bibo,mao