Re: checksumming with mmx, comment in arch/i386/lib/mmx.c

Date: Tue Feb 11 2003 - 14:06:42 EST

On Tue, 11 Feb 2003 13:37:07 EST, (nick black) said:

> firstly, to what domain of checksums does this comment apply? secondly,
> why is it true? it seems the PADDW family of instructions could work
> well here; is the slowdown a result of the kernel's need to muck with
> fpu state (from what i can tell, mmx uses the fp registers)?

(Note - second-hand info from somebody else who looked at MMX/SSE to optimize
an inner loop. Double-check with CPU documentation).

There's a big "urp" sound as the processor switches from FP to MMX mode and
back, which apparently takes a large number of cycles. You can to some extent
amortize this if you're switching once for a LONG loop (the analysis I saw was
with a million or so pixels on a screen image) - if you're switching in and out
for a 1500-byte packet (or even worse, a 100-byte packet) the impact may be
more noticeable. You may wish to examine the SSE/SSE2 opcodes, which apparently
don't take this performance hit.
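
To put rough numbers on the amortization argument, here is a minimal sketch in
C. It only models the arithmetic: one fixed switch cost spread over however many
bytes you process per switch. The SWITCH_COST_CYCLES value and the function name
are made up for illustration; I don't have a measured figure, so check the CPU
documentation or time it yourself before trusting any particular number.

```c
#include <assert.h>

/*
 * Hypothetical placeholder for the cycles spent switching into MMX mode
 * and back. NOT a measured value; substitute a figure from the CPU
 * documentation or from your own timing runs.
 */
#define SWITCH_COST_CYCLES 100.0

/*
 * Fixed overhead per byte when one mode switch is paid per buffer.
 * The cost of the checksum loop itself is deliberately left out; this
 * models only how the one-time switch cost is spread across the buffer.
 */
static double overhead_per_byte(double switch_cycles, double nbytes)
{
	assert(nbytes > 0.0);
	return switch_cycles / nbytes;
}
```

With the placeholder cost above, overhead_per_byte(SWITCH_COST_CYCLES, 100.0)
comes to a full cycle per byte, while the same call with 1000000.0 is a
ten-thousandth of that. Whatever the real switch cost turns out to be, the
ratio between the short-packet and long-loop cases stays the same.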

				Valdis Kletnieks
				Computer Systems Senior Engineer
				Virginia Tech

