Re: [CFT] faster athlon/duron memory copy implementation

From: Dieter Nützel (Dieter.Nuetzel@hamburg.de)
Date: Thu Oct 24 2002 - 15:51:26 EST


Robert Love wrote:
> The majority of the program is inline assembly so I do not think the
> compiler is playing a huge role here.

I think it is...

> Regardless, the numbers are all pretty uniform in saying the new
> no-prefetch method is superior so it's a moot point.

But all "your" numbers are slow.
Look at mine with the "right" (TM) flags ;-)

processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) MP 1900+
stepping : 2
cpu MHz : 1600.377
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow
bogomips : 3145.72

processor : 1
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) MP
stepping : 2
cpu MHz : 1600.377
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow
bogomips : 3194.88

SuSE Linux 7.3

glibc-2.2.4
Addons: db db2 linuxthreads noversion
Build CFLAGS: -O -mcpu=k6 -mpreferred-stack-boundary=2 -malign-functions=4
-fschedule-insns2 -fexpensive-optimizations -g
Build CC: gcc
Compiler version: 2.95.3 20010315 (SuSE)

Linux 2.5.43-mm2
Kernel compiler FLAGS
HOSTCC = gcc
HOSTCFLAGS = -Wall -Wstrict-prototypes -O -fomit-frame-pointer -mcpu=k6
-mpreferred-stack-boundary=2 -malign-functions=4 -fschedule-insns2
-fexpensive-optimizations

YES, I have used only "-mcpu=k6" and "-O" for ages (since 26 August 1999 ;-) on
my Athlons.

nuetzel/Entwicklung> ./athlon ; ./athlon ; ./athlon
Athlon test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
clear_page() tests
clear_page function 'warm up run' took 17409 cycles per page
clear_page function '2.4 non MMX' took 12340 cycles per page
clear_page function '2.4 MMX fallback' took 12429 cycles per page
clear_page function '2.4 MMX version' took 9794 cycles per page
clear_page function 'faster_clear_page' took 4639 cycles per page
clear_page function 'even_faster_clear' took 4914 cycles per page

copy_page() tests
copy_page function 'warm up run' took 16506 cycles per page
copy_page function '2.4 non MMX' took 18412 cycles per page
copy_page function '2.4 MMX fallback' took 18468 cycles per page
copy_page function '2.4 MMX version' took 16550 cycles per page
copy_page function 'faster_copy' took 10239 cycles per page
copy_page function 'even_faster' took 10816 cycles per page

Athlon test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
clear_page() tests
clear_page function 'warm up run' took 17148 cycles per page
clear_page function '2.4 non MMX' took 12426 cycles per page
clear_page function '2.4 MMX fallback' took 12330 cycles per page
clear_page function '2.4 MMX version' took 9776 cycles per page
clear_page function 'faster_clear_page' took 4619 cycles per page
clear_page function 'even_faster_clear' took 4938 cycles per page

copy_page() tests
copy_page function 'warm up run' took 16640 cycles per page
copy_page function '2.4 non MMX' took 18434 cycles per page
copy_page function '2.4 MMX fallback' took 18454 cycles per page
copy_page function '2.4 MMX version' took 16533 cycles per page
copy_page function 'faster_copy' took 10418 cycles per page
copy_page function 'even_faster' took 10707 cycles per page

Athlon test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
clear_page() tests
clear_page function 'warm up run' took 17475 cycles per page
clear_page function '2.4 non MMX' took 12435 cycles per page
clear_page function '2.4 MMX fallback' took 12379 cycles per page
clear_page function '2.4 MMX version' took 9902 cycles per page
clear_page function 'faster_clear_page' took 4665 cycles per page
clear_page function 'even_faster_clear' took 4947 cycles per page

copy_page() tests
copy_page function 'warm up run' took 16606 cycles per page
copy_page function '2.4 non MMX' took 18439 cycles per page
copy_page function '2.4 MMX fallback' took 18676 cycles per page
copy_page function '2.4 MMX version' took 16560 cycles per page
copy_page function 'faster_copy' took 10239 cycles per page
copy_page function 'even_faster' took 10728 cycles per page
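
To put the three ./athlon runs above side by side, here is a small sketch (Python, not part of the original fast.c test program) that averages the cycles per page and expresses each new routine as a speedup over the '2.4 MMX version'. The numbers are copied from the output above; the dictionary layout and the speedups() helper are my own illustration.

```python
from statistics import mean

# Cycles per page, copied from the three ./athlon runs quoted above.
CLEAR = {
    "2.4 MMX version": [9794, 9776, 9902],
    "faster_clear_page": [4639, 4619, 4665],
    "even_faster_clear": [4914, 4938, 4947],
}
COPY = {
    "2.4 MMX version": [16550, 16533, 16560],
    "faster_copy": [10239, 10418, 10239],
    "even_faster": [10816, 10707, 10728],
}


def speedups(results, baseline="2.4 MMX version"):
    """Return {name: baseline_mean / routine_mean} for every non-baseline routine."""
    base = mean(results[baseline])
    return {
        name: base / mean(runs)
        for name, runs in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for label, table in (("clear_page", CLEAR), ("copy_page", COPY)):
        for name, ratio in speedups(table).items():
            print(f"{label} {name}: {ratio:.2f}x vs 2.4 MMX version")
```

On these figures 'faster_clear_page' comes out at roughly 2.1x and 'faster_copy' at roughly 1.6x the speed of the 2.4 MMX routines. Three runs is a small sample, so treat the ratios as indicative only.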

nuetzel/Entwicklung> ./athlon2 ; ./athlon2 ; ./athlon2
1600.061 MHz
clear_page by 'normal_clear_page' took 12463 cycles (501.5 MB/s)
clear_page by 'slow_zero_page' took 12461 cycles (501.6 MB/s)
clear_page by 'fast_clear_page' took 9555 cycles (654.1 MB/s)
clear_page by 'faster_clear_page' took 4436 cycles (1408.7 MB/s)

copy_page by 'normal_copy_page' took 8992 cycles (695.0 MB/s)
copy_page by 'slow_copy_page' took 9010 cycles (693.7 MB/s)
copy_page by 'fast_copy_page' took 8134 cycles (768.3 MB/s)
copy_page by 'faster_copy' took 5546 cycles (1126.8 MB/s)
copy_page by 'even_faster' took 5616 cycles (1112.9 MB/s)

1600.057 MHz
clear_page by 'normal_clear_page' took 12555 cycles (497.8 MB/s)
clear_page by 'slow_zero_page' took 12740 cycles (490.6 MB/s)
clear_page by 'fast_clear_page' took 9783 cycles (638.8 MB/s)
clear_page by 'faster_clear_page' took 4459 cycles (1401.4 MB/s)

copy_page by 'normal_copy_page' took 9123 cycles (685.0 MB/s)
copy_page by 'slow_copy_page' took 9080 cycles (688.3 MB/s)
copy_page by 'fast_copy_page' took 8232 cycles (759.3 MB/s)
copy_page by 'faster_copy' took 5535 cycles (1129.1 MB/s)
copy_page by 'even_faster' took 5565 cycles (1123.1 MB/s)

1600.060 MHz
clear_page by 'normal_clear_page' took 12625 cycles (495.1 MB/s)
clear_page by 'slow_zero_page' took 12541 cycles (498.3 MB/s)
clear_page by 'fast_clear_page' took 9648 cycles (647.8 MB/s)
clear_page by 'faster_clear_page' took 4463 cycles (1400.2 MB/s)

copy_page by 'normal_copy_page' took 9178 cycles (680.9 MB/s)
copy_page by 'slow_copy_page' took 9011 cycles (693.6 MB/s)
copy_page by 'fast_copy_page' took 8138 cycles (768.0 MB/s)
copy_page by 'faster_copy' took 5508 cycles (1134.7 MB/s)
copy_page by 'even_faster' took 5552 cycles (1125.6 MB/s)
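
The MB/s column of ./athlon2 can be cross-checked from the cycle counts and the reported clock. This is a minimal sketch under two assumptions that the output itself does not state: a page is 4096 bytes, and "MB" means 2^20 bytes. The function name is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: i386 page size in bytes, not printed by the tool


def throughput_mb_s(cycles_per_page, cpu_mhz, page_size=PAGE_SIZE):
    """Convert cycles per page into MB/s (MB taken as 2**20 bytes, an assumption)."""
    seconds_per_page = cycles_per_page / (cpu_mhz * 1e6)
    return page_size / seconds_per_page / 2**20


if __name__ == "__main__":
    # First ./athlon2 run above: 1600.061 MHz
    for name, cycles in (("normal_clear_page", 12463),
                         ("faster_clear_page", 4436),
                         ("faster_copy", 5546)):
        print(f"{name}: {throughput_mb_s(cycles, 1600.061):.1f} MB/s")
```

With these assumptions the results land within a fraction of a MB/s of the reported values (for example 501.5 MB/s for 'normal_clear_page'); the small remaining gap is presumably rounding inside the test program.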

Regards,
        Dieter

-- 
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: Dieter.Nuetzel at hamburg.de (replace at with @)

