Re: faster strcpy()

Richard B. Johnson (root@chaos.analogic.com)
Fri, 24 Apr 1998 11:49:05 -0400 (EDT)


On Fri, 24 Apr 1998, Adam Heath wrote:

> On Fri, 24 Apr 1998, Meelis Roos wrote:
>
> > > while(*a++=*b++); perhaps?
> >
> > No, that's how it was before. This copies a byte at a time and is slow.
> > memcpy is fast. If we could use the same technique as memcpy uses
> > to copy strings and at the same time check the terminating 0...
> > I don't see the answer myself, does anybody see?
>
> I don't know c or i386 assembler, but there are string commands. Would one of
> them do it? I don't have my assembler book handy. Is there a store and
> continue while zero?
>

No. x86 has no single string instruction that copies and stops at a zero
byte. Typically, the length of the string is found first. Then the copy
is made __including__ the terminating null.

Note this is all Intel, not gcc (AT&T), syntax; dest <- source. The
direction flag is assumed clear (cld).

les edi,[bp.source] ; es:edi points to source
mov ecx,-1
xor al,al ; AL = 0
repnz scasb ; Scan until a null
not ecx ; String length + 1

;
les edi,[bp.destination] ; es:edi points to destination
lds esi,[bp.source] ; ds:esi points to source
shr ecx,1 ; Make WORD count; CF = 1 if count was odd
rep movsw ; Copy words
adc ecx,ecx ; ecx = 1 if odd byte count
rep movsb ; Last byte if any
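A rough C rendering of the two passes above may help readers who do not read x86. It is a sketch, not the glibc or kernel implementation. The names scan_len_plus_one() and copy_len_first() are hypothetical. A 2-byte memcpy() stands in for movsw so that unaligned addresses stay well defined. The destination is assumed to be large enough for the string plus its null.

```c
#include <stdint.h>
#include <string.h>

/* Mimic "mov ecx,-1 / repnz scasb / not ecx": the counter starts at
 * all-ones and is decremented once per byte scanned, including the
 * terminating null, so its complement is strlen(s) + 1. */
static uint32_t scan_len_plus_one(const char *s)
{
    uint32_t ecx = 0xFFFFFFFFu;      /* mov ecx,-1 */
    char c;
    do {
        c = *s++;                    /* scasb compares one byte ...   */
        ecx--;                       /* ... and rep decrements ecx    */
    } while (c != '\0');             /* repnz: stop after the null    */
    return ~ecx;                     /* not ecx                       */
}

/* Copy the bytes as 2-byte words, then the odd byte if any, mirroring
 * "shr ecx,1 / rep movsw / adc ecx,ecx / rep movsb". */
static char *copy_len_first(char *dst, const char *src)
{
    uint32_t count = scan_len_plus_one(src);  /* includes the null    */
    uint32_t words = count >> 1;              /* shr ecx,1            */
    uint32_t odd = count & 1u;                /* bit shr put into CF  */
    char *d = dst;

    while (words--) {                         /* rep movsw            */
        memcpy(d, src, 2);
        d += 2;
        src += 2;
    }
    if (odd)                                  /* adc ecx,ecx / movsb  */
        *d = *src;
    return dst;
}
```

For "abc" the counter ends at 0xFFFFFFFB. Its complement is 4, which is three characters plus the null.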

In general, checking for a terminating character while copying is
not efficient:

les edi,[bp.destination]
lds esi,[bp.source]
cpy: lodsb ; Get byte
stosb ; Store byte
or al,al ; Check for null
jnz cpy ; Continue.

The code is short, but it executes a conditional jump for every byte
copied, and that per-byte loop overhead slows things down.
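In C the same one-pass loop looks roughly like this. copy_bytewise() is a hypothetical name. The null is stored before it is tested, so it gets copied, and the loop condition is evaluated once per byte.

```c
/* Byte-at-a-time copy with a test on every byte: the C equivalent of
 * the lodsb / stosb / or al,al / jnz loop. */
static char *copy_bytewise(char *dst, const char *src)
{
    char *d = dst;
    char c;
    do {
        c = *src++;           /* lodsb: load one byte            */
        *d++ = c;             /* stosb: store it, null included  */
    } while (c != '\0');      /* or al,al / jnz: branch per byte */
    return dst;
}
```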

If you can copy most of the string as longwords, you gain a lot.
However, the logic necessary to do this takes its toll.

<complicated logic to show where to start>

cpy: mov eax, [ebx] ; Get memory long-word
lea ebx, [ebx+4] ; Ready next source location
mov [edi], eax ; Store longword in destination
lea edi, [edi+4] ; Ready next destination
sub ecx,4 ; Adjust byte count
jnz cpy ; Loop until count is zero (branch usually taken)

<additional logic to finish the string>
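The longword loop can be sketched in C as below. The sketch assumes that the total byte count (string length plus one) has already been found, for example with the repnz scasb scan shown earlier. copy_longwords() is a hypothetical name. The elided "where to start" logic is not reproduced, so there is no alignment handling at all. The 4-byte memcpy() calls are an assumption that keeps unaligned and type-punned accesses well defined, and compilers typically turn them into single 32-bit loads and stores.

```c
#include <stdint.h>
#include <string.h>

/* Copy count bytes (string plus null, count known in advance), four
 * bytes per iteration like the mov eax,[ebx] / mov [edi],eax loop,
 * then finish the 0..3 leftover bytes one at a time. */
static char *copy_longwords(char *dst, const char *src, size_t count)
{
    char *d = dst;
    size_t words = count / 4;

    while (words--) {
        uint32_t w;
        memcpy(&w, src, 4);   /* mov eax,[ebx]   */
        memcpy(d, &w, 4);     /* mov [edi],eax   */
        src += 4;             /* lea ebx,[ebx+4] */
        d += 4;               /* lea edi,[edi+4] */
    }
    for (size_t i = 0; i < count % 4; i++)   /* finish the string */
        d[i] = src[i];
    return dst;
}
```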

If additional logic shows that loop unrolling will be efficient, one
could do something like this:

cpy: mov eax, [ebx]
mov [edi], eax
mov eax, [ebx+4]
mov [edi+4], eax
mov eax, [ebx+8]
mov [edi+8],eax
mov eax, [ebx+12]
mov [edi+12], eax
lea edi, [edi+16]
lea ebx, [ebx+16]
sub ecx, 16 ; 16 bytes copied per iteration
jnz cpy
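The unrolled form can be sketched the same way. copy_unrolled16() is a hypothetical name. The sketch assumes the byte count is known in advance and copies any bytes beyond the last full 16-byte block singly.

```c
#include <stdint.h>
#include <string.h>

/* Unrolled variant: four 4-byte moves per iteration, so 16 bytes are
 * moved and the remaining count drops by 16 each time. The count % 16
 * leftover bytes are copied one at a time afterwards. */
static char *copy_unrolled16(char *dst, const char *src, size_t count)
{
    char *d = dst;
    size_t blocks = count / 16;
    uint32_t w;

    while (blocks--) {
        memcpy(&w, src,      4); memcpy(d,      &w, 4);
        memcpy(&w, src + 4,  4); memcpy(d + 4,  &w, 4);
        memcpy(&w, src + 8,  4); memcpy(d + 8,  &w, 4);
        memcpy(&w, src + 12, 4); memcpy(d + 12, &w, 4);
        src += 16;
        d += 16;
    }
    for (size_t i = 0; i < count % 16; i++)
        d[i] = src[i];
    return dst;
}
```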

Directly using the built-in Intel string instructions such as:

rep movsb
rep movsw
rep movsd
... etc.

is not the most efficient way unless the strings are very short. Using
cache-aligned long-word instructions, in which register operations can
occur at the same time as memory accesses, will be most efficient.
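One common way to get aligned longword accesses is to copy single bytes until the destination is 4-byte aligned and then switch to 32-bit moves. This sketch assumes that 4-byte destination alignment is what matters and that the byte count is known. copy_aligned() is a hypothetical name, and this is not the glibc code.

```c
#include <stdint.h>
#include <string.h>

/* Copy count bytes: single bytes until dst is 4-byte aligned, then
 * aligned 4-byte stores, then the 0..3 trailing bytes. */
static char *copy_aligned(char *dst, const char *src, size_t count)
{
    char *d = dst;

    while (count > 0 && ((uintptr_t)d & 3u) != 0) {  /* align dst */
        *d++ = *src++;
        count--;
    }
    while (count >= 4) {              /* aligned longword stores     */
        uint32_t w;
        memcpy(&w, src, 4);           /* src may still be unaligned  */
        memcpy(d, &w, 4);
        d += 4;
        src += 4;
        count -= 4;
    }
    while (count > 0) {               /* trailing bytes              */
        *d++ = *src++;
        count--;
    }
    return dst;
}
```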

The new glibc "knows" about this stuff, and so does the kernel code.

There is no way at all that the macro first presented in this thread
could ever be faster than the glibc code. Further, the macro fails
to copy the terminating null. In addition, because it lacks
parentheses, the macro will not expand properly in some of the code
where strcpy() is used. Also, memcpy() returns a void pointer while
strcpy() returns a char pointer, so string operations that attempt to
index the return value will fail.
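The macro in question is not quoted in this message, so it is not reproduced here. As a hedged illustration of the three points above, a replacement written as an inline function avoids all of them. strcpy_via_memcpy() is a hypothetical name. The function copies strlen(src) + 1 bytes, returns char * like strcpy(), and evaluates each argument exactly once. It still scans the string twice, so it is not claimed to beat glibc.

```c
#include <string.h>

/* strcpy() stand-in written as a function rather than a macro:
 * copies the terminating null, returns char *, and has no
 * macro-expansion pitfalls. */
static inline char *strcpy_via_memcpy(char *dst, const char *src)
{
    return memcpy(dst, src, strlen(src) + 1);  /* void * -> char * */
}
```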

Cheers,
Dick Johnson
***** FILE SYSTEM MODIFIED *****
Penguin : Linux version 2.1.92 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu