Re: [PATCH v3] x86/lib: Optimize 8x loop and memory clobbers in csum_partial.c

From: Eric Dumazet
Date: Sat Nov 27 2021 - 01:53:34 EST


On Fri, Nov 26, 2021 at 10:39 PM Noah Goldstein <goldstein.w.n@xxxxxxxxx> wrote:
>
> Modify the 8x loop so that it uses two independent
> accumulators. Despite adding more instructions, the latency and
> throughput of the loop are improved because the `adc` chains can now
> take advantage of multiple execution units.
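
For anyone following the thread, here is a minimal sketch of the two-accumulator idea in portable C. It is not the patch itself, which uses inline asm `adc` chains. The helper names `add64_with_carry` and `csum_two_acc` are made up for illustration, and the input is assumed to be an array of whole 64-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Ones'-complement add: add two 64-bit values and fold the carry-out
 * back into the low bit. This is the portable analogue of an x86
 * `adc` chain finished with `adc $0`.
 */
static inline uint64_t add64_with_carry(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < a);	/* (s < a) is 1 exactly when a + b wrapped */
}

/*
 * Hypothetical helper: sum n 64-bit words with two independent
 * accumulators. The two dependency chains (a and b) do not feed each
 * other inside the loop, so an out-of-order core may run them in
 * parallel on separate execution units.
 */
static uint64_t csum_two_acc(const uint64_t *p, size_t n)
{
	uint64_t a = 0, b = 0;
	size_t i;

	for (i = 0; i + 1 < n; i += 2) {
		a = add64_with_carry(a, p[i]);
		b = add64_with_carry(b, p[i + 1]);
	}
	if (i < n)			/* odd word count: one word left over */
		a = add64_with_carry(a, p[i]);

	return add64_with_carry(a, b);	/* merge the two chains once at the end */
}
```

Splitting the sum this way is safe because the ones'-complement add is commutative and associative, so merging the two partial sums at the end gives the same checksum as a single chain. Whether it is actually faster depends on the microarchitecture and on what the compiler emits for this C version.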

Oh well, there was really no need to resend this, especially if you do
not add my ack.

Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>