Re: Off Topic (MMX Overdrive performance)

Linus Torvalds (torvalds@transmeta.com)
23 Apr 1997 15:49:10 GMT


In article <Pine.LNX.3.95.970423072107.1678A-100000@mikeg.weiden.de>,
Michael L. Galbraith <mikeg@weiden.de> wrote:
>
>Anyone know what the heck Intel did to the MMX-Overdrive(150) to account
>for this?

Impressive. They seem to have improved memory read performance
noticeably, possibly due to the bigger cache (but just "bigger" doesn't
explain it: I suspect they made the cache bigger by increasing the
associativity, which would certainly help).

> L M B E N C H 1 . 0 S U M M A R Y
> ------------------------------------
>
> *Local* Communication latencies in microseconds
> -----------------------------------------------
>Host                 OS    Pipe     UDP    RPC/     TCP    RPC/
>                                            UDP             TCP
>--------- ------------- ------- ------- ------- ------- -------
>mikeg.49   Linux 2.0.30      28     256     527     396     694
>MMX-150    Linux 2.0.30      25     154     357     243     486

For "pipe latency", the most important number tends to be context switch
speed. Did that change (you didn't include the process numbers)?

The other improvements are certainly quite impressive. I'd definitely
suspect the larger L1 cache.

> *Local* Communication bandwidths in megabytes/second
> ----------------------------------------------------
>Host                 OS Pipe  TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
>                                  reread reread (libc) (hand) read write
>--------- ------------- ---- ---- ------ ------ ------ ------ ---- -----
>mikeg.49   Linux 2.0.30   27   12     25     55     26     24   71    37
>MMX-150    Linux 2.0.30   37   16     40     82     26     26   94    37

The above just indicates you have a _lot_ better memory bandwidth for
some reason (but only for reading). That can't be explained by just a
larger cache, as lmbench certainly isn't stupid enough not to know about
caches. Intel must have improved their cache miss penalty too.
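
To put rough numbers on the read-versus-write asymmetry, here is a minimal
sketch that just divides the two rows of the bandwidth table quoted above.
The figures are copied from that table; the names COLUMNS, OLD, NEW and
speedups() are made up for illustration and are not part of lmbench.

```python
# Bandwidth figures (MB/s) copied from the quoted lmbench table.
COLUMNS = ["Pipe", "TCP", "File reread", "Mmap reread",
           "Bcopy (libc)", "Bcopy (hand)", "Mem read", "Mem write"]
OLD = [27, 12, 25, 55, 26, 24, 71, 37]   # mikeg.49 row
NEW = [37, 16, 40, 82, 26, 26, 94, 37]   # MMX-150 row


def speedups(old, new):
    """Return new/old ratios, rounded to two decimals (hypothetical helper)."""
    return [round(n / o, 2) for o, n in zip(old, new)]


ratios = dict(zip(COLUMNS, speedups(OLD, NEW)))
for name, ratio in ratios.items():
    print(f"{name:<13} {ratio:.2f}x")
```

Memory read comes out roughly a third faster while memory write is
unchanged, which is the pattern the paragraph above is pointing at.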

>bogomips : 299.83 <== eek! bogus but pretty

Very pretty. They are doing the whole loop of two instructions
consistently in one clock cycle, which is pretty impressive. They
obviously fixed the misfeature with the original Pentium branch
predictor that made it give bad values for the bogomips loop.
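
The "one clock cycle" claim can be checked with a bit of arithmetic. The
sketch below assumes two things that the post does not state outright: that
the "150" in MMX-Overdrive(150) is the clock in MHz, and that the kernel
counts two instructions per delay-loop iteration when it reports bogomips.
The function name is hypothetical.

```python
CLOCK_HZ = 150e6       # assumption: "150" in MMX-Overdrive(150) means 150 MHz
BOGOMIPS = 299.83      # value quoted above
INSNS_PER_LOOP = 2     # the delay loop is two instructions, as described above


def cycles_per_loop(bogomips, clock_hz, insns_per_loop=INSNS_PER_LOOP):
    """Clock cycles spent per delay-loop iteration (illustrative helper)."""
    # Assumption: bogomips = instructions per second / 1e6,
    # with insns_per_loop instructions counted per iteration.
    loops_per_sec = bogomips * 1e6 / insns_per_loop
    return clock_hz / loops_per_sec


cpl = cycles_per_loop(BOGOMIPS, CLOCK_HZ)
print(f"{cpl:.3f} cycles per loop iteration")
```

Under those assumptions the result lands very close to one cycle per
iteration, i.e. both instructions of the loop retire in a single clock.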

Linus