Re: a.out binaries that are 66% faster than ELF

Keith Rohrer (kwrohrer@uiuc.edu)
Wed, 26 Feb 1997 01:31:40 -0600 (CST)


> As some of you are probably aware, there are client programs
> that can be used in a cooperative effort to work on the RC5
> challenge. These are 100% CPU intensive, and I've found that
> an a.out binary compiled from the exact same source, same opt.
> flags, etc. is nearly 50% faster than the identical ELF version!
> I even tried linking the ELF binary static, but that didn't help.
>
> Here is a sample from an amd486dx-160:
[snip]
> (120.6-82.5)/82.5 = 46% faster.
>
> These numbers are reproducible, and the a.out binary is also faster
> than ELF on the following machines that I was able to test them on:
They may be "reproducible", but my own runs vary anywhere from 55 to 60
kilokeys/sec.

> i486dx-100 (75300 vs 64900 = 16%)
My AMD 486DX4-100 (8k write-through L1 cache) doesn't even reach 60,000
keys/sec (though admittedly it's not completely idle otherwise). I
think this might point to poor L1 cache usage...

> I do not know if the same holds true for Pentium, PPro, or Cyrix CPUs.
> Nor can I explain why the amd486-160 and the i386 seem to be hit
> harder by it than the other 486es.
The 160 is presumably an enhanced (16k write-back cache) model, unlike
mine (which was one of the last DX2-80-badged chips to come off the
lines with a working clock-triple pin). The 386 has only the
motherboard cache, which I'd expect is direct-mapped.

> The program is small enough that I expect the core calculation to fit
> within an 8k internal cache, so L2 cache or DRAM performance should
> not affect the performance on 486 or better, and hence the same model
> CPUs should give very similar results regardless of machine configuration.
The way the 386, with its clearly adequate-sized cache, takes such a
hit implies that we're getting scads and scads of conflict misses
from the ELF executable that we're not getting with the a.out executable.

> I'd say that the performance difference between the ELF and a.out versions
> warrants an investigation of some sort. Comparing the "gcc -S" would be
> a good start I guess...
Agreed; I don't have the a.out crt0.o any more, but I can still look at
the assembly when I have time... My expectation, given the above, is
that the .s files will look nearly identical, though...

Keith