You mean pmov 8 bytes in, word->long expand the 16 bit chunks into a pair
of 64 bit registers, and then do a pair of adds? That might be fractionally
faster, but remember the if / add 1 stuff involves no jumps, because you can
use the MMX conditional, an and operation and an add.
Also, by expanding it you need more registers. You will need two for the
sum, two for the expanded value, and one for the input. That is five per
loop, and MMX only has eight registers, which means you can't have two
sets of interleaved loops?
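
For comparison, here is a rough sketch of the two approaches in plain C. It
only emulates the four 16 bit lanes of one MMX register with ordinary
integers; it is not real MMX code, and the names (lane_add_carry, csum_carry,
csum_widen) are made up for illustration. Real MMX word compares are signed,
so an actual implementation would need more care with the compare step than
this shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* four 16 bit words fit in one 64 bit MMX register */

/* One lane of "add, and add 1 if it carried", with no jump:
 * compare gives an all-ones or all-zeros mask, and it with 1, add. */
static uint16_t lane_add_carry(uint16_t sum, uint16_t in)
{
    uint16_t t = (uint16_t)(sum + in);    /* wrapping 16 bit add */
    uint16_t mask = (uint16_t)-(t < in);  /* 0xFFFF if it wrapped, else 0 */
    return (uint16_t)(t + (mask & 1));    /* and, then add */
}

/* Approach 1: stay in 16 bit lanes and fix up the carry on every add. */
static uint16_t csum_carry(const uint16_t *p, size_t n)
{
    uint16_t acc[LANES] = {0};
    uint16_t s = 0;

    assert(n % LANES == 0); /* sketch only handles whole registers */
    for (size_t i = 0; i < n; i += LANES)
        for (int l = 0; l < LANES; l++)
            acc[l] = lane_add_carry(acc[l], p[i + l]);
    for (int l = 0; l < LANES; l++)       /* fold the four lanes together */
        s = lane_add_carry(s, acc[l]);
    return s;
}

/* Approach 2: expand each word to 32 bits, do plain adds, fold at the end.
 * The 32 bit lanes can overflow on very long inputs; a real loop would
 * have to fold periodically. */
static uint16_t csum_widen(const uint16_t *p, size_t n)
{
    uint32_t acc[LANES] = {0};
    uint64_t s = 0;

    assert(n % LANES == 0);
    for (size_t i = 0; i < n; i += LANES)
        for (int l = 0; l < LANES; l++)
            acc[l] += p[i + l];           /* no carry handling needed here */
    for (int l = 0; l < LANES; l++)
        s += acc[l];
    while (s >> 16)                       /* end-around carry, done once */
        s = (s & 0xFFFF) + (s >> 16);
    return (uint16_t)s;
}
```

Both functions should give the same 16 bit ones' complement sum for the same
input. The first keeps a single accumulator per lane, the second trades the
per-add fixup for wider accumulators, which is where the extra registers go.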