RE: [PATCH] net: Remove branch in csum_shift()

From: David Laight
Date: Tue Mar 01 2022 - 06:41:13 EST


From: Christophe Leroy
> Sent: 01 March 2022 11:15
...
> Looks like ARM also does better code with the generic implementation as
> it seems to have some looking like conditional instructions 'rorne' and
> 'strne'.

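In arm32 almost every instruction can be made conditional, so 'rorne'
and 'strne' are just the ordinary 'ror' and 'str' with a 'ne' condition.
(arm64 dropped that and only has a few conditional instructions such
as 'csel'.)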

> static __always_inline __wsum csum_shift(__wsum sum, int offset)
> {
> /* rotate sum to align it with a 16b boundary */
> if (offset & 1)
> 1d28: e2102001 ands r2, r0, #1
> 1d2c: e58d3004 str r3, [sp, #4]
> * @word: value to rotate
> * @shift: bits to roll
> */
> static inline __u32 ror32(__u32 word, unsigned int shift)
> {
> return (word >> (shift & 31)) | (word << ((-shift) & 31));
> 1d30: 11a03463 rorne r3, r3, #8
> 1d34: 158d3004 strne r3, [sp, #4]
> if (unlikely(iov_iter_is_pipe(i)))

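The unconditional 'str' at 1d2c looks spare (the 'strne' at 1d34
rewrites the same stack slot when the offset is odd); a minor code
change would probably remove it.
It is likely not helped by registers being spilled to stack.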
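
For reference, a minimal userspace sketch of the two forms being compared.
It assumes the quoted source is the 'if (offset & 1)' rotate-by-8 form and
that the branch-free variant derives the rotate count from the low bit of
the offset. Plain uint32_t stands in for __wsum, and the names
csum_shift_branch() and csum_shift_nobranch() are made up for illustration,
they are not kernel functions.

```c
#include <stdint.h>

/* Rotate a 32-bit word right; same expression as the quoted ror32(). */
static inline uint32_t ror32(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* Branchy form (hypothetical name): rotate by 8 only for odd offsets. */
static inline uint32_t csum_shift_branch(uint32_t sum, int offset)
{
	/* rotate sum to align it with a 16b boundary */
	if (offset & 1)
		return ror32(sum, 8);
	return sum;
}

/* Branch-free form (hypothetical name): rotate count is 0 or 8. */
static inline uint32_t csum_shift_nobranch(uint32_t sum, int offset)
{
	return ror32(sum, (unsigned int)(offset & 1) << 3);
}
```

Both return the same value for any offset. Which one generates better code
depends on whether the target has a cheap variable rotate or cheap
conditional execution.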

ISTR arm32 having a reasonable number of registers and then
a whole load of them being stolen by the implementation.
(I'm sure I remember stack limit and thread base...)
So the compiler doesn't get that many to play with.

Not quite as bad as nios2 - where r2 and r3 are 'reserved for
the assembler' (as they probably are on MIPS) but the nios2
assembler doesn't ever need to use them!

> ...
> Ok, so the solution would be to have an arch specific version of
> csum_shift() in the same principle as csum_add().

Probably.
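
Something along these lines, sketched in userspace C. The guard name
HAVE_ARCH_CSUM_SHIFT is an assumption modelled on the existing
HAVE_ARCH_CSUM_ADD guard, and uint32_t stands in for __wsum. An
architecture that prefers the rotate-by-variable form would define the
macro and supply its own csum_shift(); everyone else keeps the generic
one.

```c
#include <stdint.h>

/* Rotate a 32-bit word right; same expression as the quoted ror32(). */
static inline uint32_t ror32(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/*
 * Hypothetical guard: an arch header would define HAVE_ARCH_CSUM_SHIFT
 * and provide its own csum_shift() before this point.
 */
#ifndef HAVE_ARCH_CSUM_SHIFT
static inline uint32_t csum_shift(uint32_t sum, int offset)
{
	/* rotate sum to align it with a 16b boundary */
	if (offset & 1)
		return ror32(sum, 8);
	return sum;
}
#endif
```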

David
