Re: [PATCH v2] x86/lib: Optimize 8x loop and memory clobbers in csum_partial.c

From: Eric Dumazet
Date: Sat Nov 27 2021 - 01:05:47 EST


On Fri, Nov 26, 2021 at 8:25 PM Noah Goldstein <goldstein.w.n@xxxxxxxxx> wrote:
>
> Modify the 8x loop so that it uses two independent
> accumulators. Despite adding more instructions, the latency and
> throughput of the loop are improved because the `adc` chains can now
> take advantage of multiple execution units.
>
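
For anyone following along, the two-accumulator idea can be sketched in
portable C. This is not the patch's code: the real loop is x86 inline asm
built from `adc` chains, and the helper names below (add64_with_carry,
csum_words_single, csum_words_dual) are made up for illustration. It only
shows why splitting the sum is legal: end-around-carry addition is
associative and commutative, so alternate words can feed two chains that
do not depend on each other and are merged once at the end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add b into a with end-around carry (ones' complement addition). */
static inline uint64_t add64_with_carry(uint64_t a, uint64_t b)
{
	a += b;
	return a + (a < b);	/* a < b means the addition wrapped */
}

/* Reference: every addition depends on the previous one (one chain). */
static uint64_t csum_words_single(const unsigned char *buf, size_t nwords)
{
	uint64_t sum = 0, w;

	for (size_t i = 0; i < nwords; i++) {
		memcpy(&w, buf + 8 * i, sizeof(w));
		sum = add64_with_carry(sum, w);
	}
	return sum;
}

/* Same result, but even and odd words go to two independent chains. */
static uint64_t csum_words_dual(const unsigned char *buf, size_t nwords)
{
	uint64_t s0 = 0, s1 = 0, w0, w1;
	size_t i = 0;

	for (; i + 2 <= nwords; i += 2) {
		memcpy(&w0, buf + 8 * i, sizeof(w0));
		memcpy(&w1, buf + 8 * (i + 1), sizeof(w1));
		s0 = add64_with_carry(s0, w0);	/* chain 0 */
		s1 = add64_with_carry(s1, w1);	/* chain 1, no dependency on s0 */
	}
	if (i < nwords) {			/* odd word count: one word left */
		memcpy(&w0, buf + 8 * i, sizeof(w0));
		s0 = add64_with_carry(s0, w0);
	}
	return add64_with_carry(s0, s1);	/* merge the two chains once */
}
```

Whether a compiler turns this C into two `adc` chains is a separate
question; the sketch is about the dependency structure, not codegen.
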
> Make the memory clobbers more precise. 'buff' is read-only and we know
> the exact usage range. There is no reason to write-clobber all memory.
>
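
A minimal sketch of what a narrower constraint can look like, assuming
GCC or Clang extended asm. The function name csum_64bytes and the operand
layout are illustrative and may differ from what the patch actually does.
The array-typed "m" input tells the compiler that the asm reads exactly
64 bytes starting at buf, so a blanket "memory" clobber is not needed.
A plain C fallback is included so the sketch builds on other targets.

```c
#include <stdint.h>
#include <string.h>

/* Sum the eight 64-bit words at buf into sum, with end-around carry. */
static uint64_t csum_64bytes(const void *buf, uint64_t sum)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	__asm__("addq 0*8(%[src]), %[res]\n\t"
		"adcq 1*8(%[src]), %[res]\n\t"
		"adcq 2*8(%[src]), %[res]\n\t"
		"adcq 3*8(%[src]), %[res]\n\t"
		"adcq 4*8(%[src]), %[res]\n\t"
		"adcq 5*8(%[src]), %[res]\n\t"
		"adcq 6*8(%[src]), %[res]\n\t"
		"adcq 7*8(%[src]), %[res]\n\t"
		"adcq $0, %[res]"		/* fold the final carry back in */
		: [res] "+r" (sum)
		: [src] "r" (buf),
		  /* read-only, exactly 64 bytes: replaces a "memory" clobber */
		  "m" (*(const char (*)[64])buf)
		: "cc");
#else
	/* Portable fallback: same arithmetic, one word at a time. */
	const unsigned char *p = buf;
	uint64_t w;

	for (int i = 0; i < 8; i++) {
		memcpy(&w, p + 8 * i, sizeof(w));
		sum += w;
		sum += (sum < w);		/* end-around carry */
	}
#endif
	return sum;
}
```

With the input described this way, the compiler still knows the asm
depends on the buffer contents, but it is free to keep unrelated values
cached in registers across the statement.
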
> Relative performance changes on Tigerlake:
>
> Time Unit: Ref Cycles
> Size Unit: Bytes
>
> size, lat old, lat new, tput old, tput new
> 0, 4.961, 4.901, 4.887, 4.951
> 8, 5.590, 5.620, 4.227, 4.252
> 16, 6.182, 6.202, 4.233, 4.278
> 24, 7.392, 7.380, 4.256, 4.279
> 32, 7.371, 7.390, 4.550, 4.537
> 40, 8.621, 8.601, 4.862, 4.836
> 48, 9.406, 9.374, 5.206, 5.234
> 56, 10.535, 10.522, 5.416, 5.447
> 64, 10.000, 7.590, 6.946, 6.989
> 100, 14.218, 12.476, 9.429, 9.441
> 200, 22.115, 16.937, 13.088, 12.852
> 300, 31.826, 24.640, 19.383, 18.230
> 400, 39.016, 28.133, 23.223, 21.304
> 500, 48.815, 36.186, 30.331, 27.104
> 600, 56.732, 40.120, 35.899, 30.363
> 700, 66.623, 48.178, 43.044, 36.400
> 800, 73.259, 51.171, 48.564, 39.173
> 900, 82.821, 56.635, 58.592, 45.162
> 1000, 90.780, 63.703, 65.658, 48.718
>
> Signed-off-by: Noah Goldstein <goldstein.w.n@xxxxxxxxx>
>
> tmp

SGTM (not sure what this 'tmp' string means here :) )

Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>