Re: faster strcpy()

Alexander Kjeldaas (astor@guardian.no)
Fri, 24 Apr 1998 19:12:30 +0200


On Fri, Apr 24, 1998 at 11:49:05AM -0400, Richard B. Johnson wrote:
>
> Directly using the built-in Intel macros such as:
>
> rep movsb
> rep movsw
> rep movsl
> .... etc
>
> is not the most efficient way unless the strings are very short. Using
> cache-aligned long-word instructions, in which register operations can
> occur at the same time as memory accesses, will be most
> efficient.
>
> The new glibc "knows" about this stuff. Also the kernel code "knows"
> about this stuff.
>

But this isn't true on all processors. It holds on the Pentium, but
probably not on the Pentium Pro/II. On the Pentium Pro, rep movsl is
highly optimized microcode. It takes over the whole microarchitecture
and uses all available execution units (that's why you can't run
other instructions in parallel with a 'rep' instruction on a Pentium
Pro). I haven't checked this myself; I'm taking Andy Glew's word for it :-).
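
For anyone who wants to measure this rather than take it on faith, here is
a rough sketch of the two approaches discussed above, written so they can be
swapped under the same strcpy-style wrapper. The names copy_rep_movs,
copy_words, copy_fn and strcpy_with are made up for illustration; this is
not the glibc or kernel code. It assumes GCC-style inline asm on x86 and
falls back to memcpy elsewhere. It uses rep movsb rather than rep movsl to
avoid handling the tail bytes separately, and it makes no claim about which
variant is faster on a given CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common signature so the two copy loops can be swapped. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Copy n bytes with the x86 string-move instruction.
 * Assumes a GCC-compatible compiler; elsewhere it falls back to memcpy. */
void copy_rep_movs(void *dst, const void *src, size_t n)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    /* The ABI guarantees the direction flag is clear on function entry. */
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      :
                      : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* Copy n bytes one 32-bit word at a time, then finish byte by byte.
 * The small memcpy calls keep unaligned accesses well defined in C;
 * compilers typically turn them into plain loads and stores. */
void copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}

/* strcpy built on either loop: find the length, then copy it plus the NUL. */
char *strcpy_with(copy_fn copy, char *dst, const char *src)
{
    assert(copy != NULL && dst != NULL && src != NULL);
    copy(dst, src, strlen(src) + 1);
    return dst;
}
```

Calling strcpy_with(copy_rep_movs, buf, s) and strcpy_with(copy_words, buf, s)
in a timing loop over short and long strings should show whether the
microcoded path wins on a particular processor.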

astor

-- 
 Alexander Kjeldaas, Guardian Networks AS, Trondheim, Norway
 http://www.guardian.no/
