> I do not know if the same holds true for Pentium, PPro, or Cyrix CPUs.
> Nor can I explain why the amd486-160 and the i386 seem to be harder
> hit by it than the other 486es.
The 160 is presumably an enhanced model with a 16k write-back cache, unlike
mine (which was one of the last DX2-80-badged chips to come off the line
with a working clock-triple pin). The 386 has no on-chip cache, only the
cache on the motherboard, which I'd expect to be direct-mapped.
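For reference, the usual address-to-set mapping, assuming a cache of C bytes with L-byte lines and W ways (W = 1 for a direct-mapped cache), is:

```latex
S = \frac{C}{L\,W}, \qquad
\mathrm{set}(A) = \left\lfloor \frac{A}{L} \right\rfloor \bmod S, \qquad
\mathrm{tag}(A) = \left\lfloor \frac{A}{L\,S} \right\rfloor
```

Addresses that land in the same set with different tags compete for that set's W lines. With W = 1, any two addresses that differ by a multiple of C map to the same set, so they evict each other whenever they are used alternately. As a worked example, an 8 KB, 4-way cache with 16-byte lines has S = 8192 / (16 * 4) = 128 sets.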
> The program is small enough that I expect the core calculation to fit
> within an 8k internal cache, so L2 cache or DRAM performance should
> not affect the performance on 486 or better, and hence the same model
> CPUs should give very similar results regardless of machine configuration.
That the 386, with its clearly adequate-sized cache, takes such a
hit implies that we're getting scads and scads of conflict misses
with the ELF executable that we don't get with the a.out one.
> I'd say that the performance difference between the ELF and a.out versions
> warrants an investigation of some sort. Comparing the "gcc -S" output would
> be a good start I guess...
Agreed; I don't have the a.out crt0.o any more, but I can still look at
the assembly when I have time. Given the above, though, I expect the .s
files to look nearly identical.
Keith