Re: [PATCH v1] x86/lib: Optimize 8x loop and memory clobbers in csum_partial.c

From: Eric Dumazet
Date: Fri Nov 26 2021 - 14:04:44 EST


On Fri, Nov 26, 2021 at 10:17 AM Noah Goldstein <goldstein.w.n@xxxxxxxxx> wrote:
>
> Makes sense. Although if you inline I think you definitely will want a more
> conservative clobber than just "memory". Also I think with 40 you also will
> get some value from two counters.
>
> Did you see the number/question I posted about two accumulators for 32
> byte case?
> Its a judgement call about latency vs throughput that I don't really have an
> answer for.
>

What I do not know is whether using more execution units would slow down
the sibling hyperthread.

Would using ADCX/ADOX be better in this respect?
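
To make the one-chain versus two-chain trade-off for the 32-byte case concrete, here is a rough portable C sketch. It is not the code from the patch: the helper names (add64_carry, csum32_one_chain, csum32_two_chains, csum_fold64) are made up for illustration, and the end-around carry is written in plain C rather than with ADC or ADCX/ADOX. The single chain serializes every add on the previous result, while the two-chain version keeps two independent sums and merges them at the end, which is the shape that ADCX (carry in CF) and ADOX (carry in OF) allow in assembly.

```c
#include <stdint.h>
#include <string.h>

/* Ones' complement add: any carry out of bit 63 is wrapped back in. */
static inline uint64_t add64_carry(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < a);
}

/* One dependency chain: each add has to wait for the previous one. */
static uint64_t csum32_one_chain(const void *buf)
{
	uint64_t w[4];
	uint64_t sum = 0;

	memcpy(w, buf, sizeof(w));
	sum = add64_carry(sum, w[0]);
	sum = add64_carry(sum, w[1]);
	sum = add64_carry(sum, w[2]);
	sum = add64_carry(sum, w[3]);
	return sum;
}

/* Two independent chains (a and b), merged with one final add. */
static uint64_t csum32_two_chains(const void *buf)
{
	uint64_t w[4];
	uint64_t a, b;

	memcpy(w, buf, sizeof(w));
	a = add64_carry(w[0], w[1]);
	b = add64_carry(w[2], w[3]);
	return add64_carry(a, b);
}

/* Fold the 64-bit ones' complement sum down to 16 bits. */
static uint16_t csum_fold64(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

Both variants produce the same sum, so the choice is purely about scheduling: the two-chain form shortens the critical path but issues the adds to more units at once. The sketch says nothing about how that affects the other hyperthread; that would have to be measured.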