Re: [PATCH v1] x86/lib: Optimize 8x loop and memory clobbers in csum_partial.c

From: Noah Goldstein
Date: Fri Nov 26 2021 - 19:42:14 EST


On Fri, Nov 26, 2021 at 6:15 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Fri, Nov 26, 2021 at 12:33 PM Noah Goldstein <goldstein.w.n@xxxxxxxxx> wrote:
> >
> > On Fri, Nov 26, 2021 at 2:07 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Nov 26, 2021 at 11:50 AM Noah Goldstein <goldstein.w.n@xxxxxxxxx> wrote:
> > > >
> > > > Bright :) but it will need a BMI support check.
> > >
> > > Yes, probably not worth the pain.
> >
> > Making a V2 for my patch with your optimization for the loop case. Do you think
> > 1 or 2 accum for the 32-byte case?
> >
>
> I would vote for something simpler, thus one accum, since this 32-byte
> block is only run one time?

If the one-at-a-time performance is what's most important, wouldn't that
argue in favor of 2x accum, because it leads to lower latency? Or are you
saying it's not that important, so simpler code is the priority?
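To make the latency argument concrete, here is a minimal portable-C sketch of the two options for the 32-byte tail. It is an illustration under stated assumptions, not the patch: the real code in csum_partial uses inline-asm adcq chains, whereas this sketch emulates add-with-carry in C, and every name below (csum_add64, load64, csum32_one_accum, csum32_two_accum, csum_fold64) is hypothetical. No benchmark numbers are implied; whether the shorter chain helps in practice depends on the compiler and the CPU.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 64-bit ones'-complement add. Add, then fold the
 * carry-out back in (end-around carry), emulating an adc chain in C.
 * The second add cannot overflow: if a carry occurred, s <= 2^64 - 2. */
static inline uint64_t csum_add64(uint64_t a, uint64_t b)
{
    uint64_t s = a + b;
    return s + (s < a);
}

/* Hypothetical helper: load one native-endian 64-bit word, any alignment. */
static inline uint64_t load64(const unsigned char *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof(w));
    return w;
}

/* One accumulator: four adds, each waiting on the previous one. */
static uint64_t csum32_one_accum(const unsigned char *buf, uint64_t sum)
{
    sum = csum_add64(sum, load64(buf + 0));
    sum = csum_add64(sum, load64(buf + 8));
    sum = csum_add64(sum, load64(buf + 16));
    sum = csum_add64(sum, load64(buf + 24));
    return sum;
}

/* Two accumulators: the pairwise adds are independent of each other and
 * of the incoming sum, so the chain is three adds deep and only the last
 * add depends on 'sum'. */
static uint64_t csum32_two_accum(const unsigned char *buf, uint64_t sum)
{
    uint64_t a0 = csum_add64(load64(buf + 0), load64(buf + 8));
    uint64_t a1 = csum_add64(load64(buf + 16), load64(buf + 24));
    return csum_add64(sum, csum_add64(a0, a1));
}

/* Hypothetical helper: fold a 64-bit ones'-complement sum to 16 bits
 * (no final inversion). */
static uint16_t csum_fold64(uint64_t sum)
{
    uint64_t s = (sum & 0xffffffffu) + (sum >> 32);
    s = (s & 0xffffffffu) + (s >> 32);
    s = (s & 0xffff) + (s >> 16);
    s = (s & 0xffff) + (s >> 16);
    return (uint16_t)s;
}
```

Both variants produce the same value because ones'-complement addition is associative and commutative, so the choice is purely latency versus code simplicity: the single chain is four dependent adds deep, while the two-accumulator version is three deep and puts only one add on the path from the incoming sum.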

>
> Thanks !