Re: faster strcpy()

Richard B. Johnson (root@chaos.analogic.com)
Fri, 24 Apr 1998 15:18:05 -0400 (EDT)


On Fri, 24 Apr 1998, Alexander Kjeldaas wrote:

> On Fri, Apr 24, 1998 at 11:49:05AM -0400, Richard B. Johnson wrote:
> >
> > Directly using the built-in Intel string instructions such as:
> >
> > rep movsb
> > rep movsw
> > rep movsl
> > .... etc
> >
> > is not the most efficient way unless the strings are very short. Using
> > cache-aligned long-word instructions, in which register operations can
> > occur while memory accesses are in flight, will be most
> > efficient.
> >
> > The new glibc "knows" about this stuff. Also the kernel code "knows"
> > about this stuff.
> >
>
> But this isn't true on all processors. It is true on the pentium, but
> probably not on the pentium pro/II. On the pentium pro, rep movsl is
> highly optimized microcode. It takes over the whole microarchitecture
> and utilizes all possible instruction units (that's why you can't run
> other instructions in parallel with a 'rep' instruction on a pentium
> pro). I haven't checked this, but take Andy Glew's word for it :-).
>
> astor
>

Other instructions in 'parallel'? Any parallel instructions are
transparent to the instruction stream. The fact that auto increment,
auto decrement, auto count, and branch prediction may be occurring
makes no difference to the code written for it. It just 'sometimes'
makes it faster. The Intel processors are not like DSPs where you
can partition instruction streams to run in parallel. You get whatever
the CPU gives you.
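
For anyone who wants something concrete to time against 'rep movsb',
here is a rough sketch of the aligned long-word approach mentioned
earlier in the thread. It is NOT the glibc or kernel routine. The name
word_strcpy and the 32-bit word size are assumptions made for
illustration only, and whether it actually beats the string
instructions depends on the processor, so measure before believing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; an illustrative sketch, not glibc's strcpy. */
char *word_strcpy(char *dst, const char *src)
{
    char *d = dst;

    assert(dst != NULL && src != NULL);

    /* Copy single bytes until src sits on a long-word boundary. */
    while ((uintptr_t)src % sizeof(uint32_t) != 0) {
        if ((*d++ = *src++) == '\0')
            return dst;
    }

    /*
     * Copy one aligned 32-bit word per iteration. The load may read a
     * few bytes past the terminator. An aligned word never crosses a
     * page, but ISO C does not bless the over-read, so treat this as
     * an assumption about the platform.
     */
    for (;;) {
        uint32_t w;

        memcpy(&w, src, sizeof w);      /* aligned load from src */
        if ((w - 0x01010101u) & ~w & 0x80808080u)
            break;                      /* some byte of w is zero */
        memcpy(d, &w, sizeof w);        /* store the whole word */
        src += sizeof w;
        d += sizeof w;
    }

    /* Copy the last few bytes, including the terminating zero. */
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}
```

The zero-byte test is the usual bit trick: subtracting 1 from each byte
sets that byte's top bit by borrow only if the byte was zero (or already
had its top bit set, which the '& ~w' term masks out). Only the source
is aligned here; the destination stores may still be misaligned.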

Cheers,
Dick Johnson
***** FILE SYSTEM MODIFIED *****
Penguin : Linux version 2.1.92 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
