a.out binaries that are 66% faster than ELF

Paul Gortmaker (paul@rasty.anu.edu.au)
Wed, 26 Feb 1997 13:18:32 +1000 (EST)

As some of you are probably aware, there are client programs
that can be used in a cooperative effort to work on the RC5
challenge. These are 100% CPU intensive, and I've found that
an a.out binary compiled from the exact same source, with the same
optimization flags, etc., is nearly 50% faster than the equivalent ELF
version! I even tried linking the ELF binary statically, but that didn't help.

Here is a sample from an amd486dx-160:
ratbag:~> rc5 -m
rc5-56-client: Performance testing with 1000000 crypts
rc5-56-client: Complete in 12.128 seconds. (82453.90 keys/sec)
ratbag:~> rc5-aout -m
rc5-56-client: Performance testing with 1000000 crypts
rc5-56-client: Complete in 8.290 seconds. (120623.65 keys/sec)

(120.6-82.5)/82.5 = 46% faster.

These numbers are reproducible, and the aout binary is also faster
than ELF on the following machines that I was able to test them on:

i486dx-100 (75300 vs 64900 = 16%)
i486dx-66 (50200 vs 41200 = 22%)
i486dx-33 (25000 vs 21300 = 17%)
i386-40, 128kB cache at 2-1-1-1 (15900 vs 9550 = 66%) <-- Wow!
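
For anyone who wants to check the percentages, here is a minimal sketch of
the arithmetic in Python. The keys/sec figures are copied from this post;
the names speedup_percent, RESULTS and report are just illustrative choices
and have nothing to do with the rc5 client itself.

```python
def speedup_percent(aout_keys_per_sec, elf_keys_per_sec):
    """Return how much faster the a.out rate is than the ELF rate, in percent."""
    return (aout_keys_per_sec - elf_keys_per_sec) / elf_keys_per_sec * 100.0


# (machine, a.out keys/sec, ELF keys/sec), figures copied from this post
RESULTS = [
    ("amd486dx-160", 120623.65, 82453.90),
    ("i486dx-100", 75300, 64900),
    ("i486dx-66", 50200, 41200),
    ("i486dx-33", 25000, 21300),
    ("i386-40", 15900, 9550),
]


def report(results=RESULTS):
    """Return (machine, rounded percent speedup) pairs for each entry."""
    return [(name, round(speedup_percent(aout, elf)))
            for name, aout, elf in results]


if __name__ == "__main__":
    for name, pct in report():
        print(f"{name:14s} {pct:3d}% faster")
```

The speedup is measured relative to the ELF rate, the same way as the
(120.6-82.5)/82.5 figure above.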

I do not know if the same holds true for Pentium, PPro, or Cyrix CPUs.
Nor can I explain why the amd486-160 and the i386 seem to be harder
hit by it than the other 486s.

The program is small enough that I expect the core calculation to fit
within an 8k internal cache, so L2 cache or DRAM performance should
not affect the performance on a 486 or better, and hence CPUs of the same
model should give very similar results regardless of machine configuration.

Info about the program and the source (only 7kB) can be obtained from


and if you are outside the US (ITAR nonsense) you can get it from:


and possibly also at:


I'd say that the performance difference between the ELF and a.out versions
warrants an investigation of some sort. Comparing the "gcc -S" output of
the two builds would be a good start, I guess...
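
As a rough sketch of that comparison, something like the following Python
script could print a unified diff of the two assembly listings. It assumes
the "gcc -S" output of each build has already been saved to a file; the
names rc5-elf.s and rc5-aout.s are hypothetical labels, and diff_listings
and load are helper names made up for this example. It uses only the
standard difflib module.

```python
import difflib
import sys


def diff_listings(elf_lines, aout_lines):
    """Return a unified diff of two assembly listings given as lists of lines."""
    return list(difflib.unified_diff(
        elf_lines, aout_lines,
        fromfile="rc5-elf.s",    # hypothetical label for the ELF listing
        tofile="rc5-aout.s",     # hypothetical label for the a.out listing
        lineterm=""))


def load(path):
    """Read a listing into a list of lines without trailing newlines."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


# Usage (assumed): python asmdiff.py <elf listing> <a.out listing>
if __name__ == "__main__" and len(sys.argv) == 3:
    for line in diff_listings(load(sys.argv[1]), load(sys.argv[2])):
        print(line)
```

An empty diff would suggest the compiler emits the same code for both
formats, which would point at the assembler or linker stage instead.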