Make sure you benchmark both the cached and uncached cases. If the 3DNow!
code is definitely a win, then use it. Note that there is some work
pending (hopefully for 2.4.0) that lets you plug in multiple memcpy
routines and choose between them per CPU. That will also let you do
finer tuning for the WinChip. The current draft of that code supports:
  - Integer copies (rep movs etc.)
  - MMX + 3DNow! (MMX with prefetch)
  - MMX without 3DNow! (older MMX CPUs)
  - FPU trick (earlier Pentiums)
More can be added (e.g. the K6-2 seems to be fastest using unrolled
integer operations with prefetching).
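As a rough illustration of "benchmark both cases, and only switch if it is definitely a win", here is a minimal userspace C sketch. The two routines are hypothetical stand-ins (a byte loop for the plain integer copy and an eight-way unrolled version). Real MMX/3DNow! routines would be kernel assembly and are not shown. The helper names (copy_integer, copy_unrolled, time_copy, pick_if_win) are made up for this example. The sketch also assumes that a 4 KiB buffer stays cache-resident and an 8 MiB buffer does not, which depends on the CPU being measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Same shape as memcpy, so routines can be swapped via a pointer. */
typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical baseline: byte-at-a-time integer copy. */
static void *copy_integer(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical candidate: the same copy unrolled eight bytes per
 * iteration, with a byte loop for the tail. */
static void *copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n >= 8) {
        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
        d[4] = s[4]; d[5] = s[5]; d[6] = s[6]; d[7] = s[7];
        d += 8;
        s += 8;
        n -= 8;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

/* CPU time, in seconds, for `reps` copies of `n` bytes using `fn`. */
static double time_copy(memcpy_fn fn, unsigned char *dst,
                        const unsigned char *src, size_t n, int reps)
{
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Keep `candidate` only if it beats `baseline` both when the data is
 * cache-resident (`small` bytes, copied many times) and when it is
 * assumed not to be (`large` bytes). Ties go to the baseline. */
static memcpy_fn pick_if_win(memcpy_fn baseline, memcpy_fn candidate,
                             unsigned char *dst, const unsigned char *src,
                             size_t small, size_t large)
{
    const int large_reps = 4;
    /* Move roughly the same number of bytes in both cases. */
    int small_reps = (int)(large / small) * large_reps;

    double base_cached = time_copy(baseline, dst, src, small, small_reps);
    double cand_cached = time_copy(candidate, dst, src, small, small_reps);
    double base_uncached = time_copy(baseline, dst, src, large, large_reps);
    double cand_uncached = time_copy(candidate, dst, src, large, large_reps);

    if (cand_cached < base_cached && cand_uncached < base_uncached)
        return candidate;
    return baseline;
}
```

The design choice here is deliberately conservative: a candidate that wins only in the cached case (a common result for small benchmarks) is rejected. A per-CPU version, as described for the pending plug-in code, would run the same comparison for each CPU type and store the winner in a function pointer. This sketch does not model that.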
Alan