SMP in 2.1.14 works well on P6DOF, giving better performance than 2.0.26 SMP
except for process creation and concurrent shell scripts.
In both the 2.0 and 2.1 series, SMP is about 70% of single-CPU performance
overall (final scores of 50.9 vs 70.5 on 2.1, and 46.9 vs 62.0 on 2.0).
The major exception is the 8 concurrent shell scripts benchmark, where SMP
truly shines at roughly 140% of single-CPU performance.
Summary of Unixbench 4.0 on 2.0.26 and 2.1.14 Megapatch#6
Notes:
The baseline index is 10.0 on an SS20-60.
Higher numbers are faster.
Unixbench 4.0 is available at
ftp://linux.wauug.org/ftp/pub/bench/unixbench-4.0-DELTA.tgz
rh-128M-2CPU-LX2.1.14.961209
TEST                                       BASELINE     RESULT      INDEX
Arithmetic Test (type = double)             29820.0    52816.5       17.7
Dhrystone 2 using register variables       116700.0   320194.7       27.4
Execl Throughput                               43.0      228.9       53.2
File Copy 1024 bufsize 2000 maxblocks        3960.0    26404.0       66.7
File Copy 256 bufsize 500 maxblocks          1655.0    12077.0       73.0
File Copy 4096 bufsize 8000 maxblocks        5800.0    30985.0       53.4
Pipe Throughput                             12440.0    78503.4       63.1
Pipe-based Context Switching                 4000.0    16497.7       41.2
Process Creation                              126.0      906.2       71.9
Shell Scripts (8 concurrent)                    6.0       83.0      138.3
System Call Overhead                        15000.0    51661.8       34.4
                                                                =========
FINAL SCORE                                                          50.9
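The figures in the table above appear consistent with INDEX = 10 * RESULT / BASELINE and with FINAL SCORE being the geometric mean of the eleven indices. That relationship is inferred from the numbers themselves, not taken from the Unixbench sources, so treat the following as a sketch for re-checking the table rather than a description of how Unixbench computes its score. The function names `index` and `final_score` are made up for this illustration.

```python
import math

# (test name, BASELINE, RESULT) copied from the 2-CPU 2.1.14 table above.
ROWS = [
    ("Arithmetic Test (type = double)", 29820.0, 52816.5),
    ("Dhrystone 2 using register variables", 116700.0, 320194.7),
    ("Execl Throughput", 43.0, 228.9),
    ("File Copy 1024 bufsize 2000 maxblocks", 3960.0, 26404.0),
    ("File Copy 256 bufsize 500 maxblocks", 1655.0, 12077.0),
    ("File Copy 4096 bufsize 8000 maxblocks", 5800.0, 30985.0),
    ("Pipe Throughput", 12440.0, 78503.4),
    ("Pipe-based Context Switching", 4000.0, 16497.7),
    ("Process Creation", 126.0, 906.2),
    ("Shell Scripts (8 concurrent)", 6.0, 83.0),
    ("System Call Overhead", 15000.0, 51661.8),
]


def index(baseline, result):
    """Assumed rule: the baseline machine scores 10.0 on every test."""
    return 10.0 * result / baseline


def final_score(indices):
    """Assumed rule: geometric mean of the per-test indices."""
    return math.exp(sum(math.log(i) for i in indices) / len(indices))


indices = [index(baseline, result) for _, baseline, result in ROWS]
score = final_score(indices)

for (name, _, _), idx in zip(ROWS, indices):
    print(f"{name:<40}{idx:8.1f}")
print(f"{'FINAL SCORE':<40}{score:8.1f}")
```

Under these assumptions the recomputed score lands at about 50.9, matching the reported value. The plain arithmetic mean of the same indices would come out noticeably higher, which is why the geometric mean is the assumed rule here.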
rh-128M-2CPU-LX2.0.26.961126
TEST                                       BASELINE     RESULT      INDEX
Arithmetic Test (type = double)             29820.0    52828.0       17.7
Dhrystone 2 using register variables       116700.0   321748.4       27.6
Execl Throughput                               43.0      236.8       55.1
File Copy 1024 bufsize 2000 maxblocks        3960.0    23807.0       60.1
File Copy 256 bufsize 500 maxblocks          1655.0    10030.0       60.6
File Copy 4096 bufsize 8000 maxblocks        5800.0    30463.0       52.5
Pipe Throughput                             12440.0    57406.8       46.1
Pipe-based Context Switching                 4000.0    13050.2       32.6
Process Creation                              126.0     1107.8       87.9
Shell Scripts (8 concurrent)                    6.0       86.3      143.8
System Call Overhead                        15000.0    36945.6       24.6
                                                                =========
FINAL SCORE                                                          46.9
For comparison, the single-processor configuration on the same hardware:
rh3-128M-1CPU-LX2.1.10.961115
TEST                                       BASELINE     RESULT      INDEX
Arithmetic Test (type = double)             29820.0    52429.5       17.6
Dhrystone 2 using register variables       116700.0   319332.9       27.4
Execl Throughput                               43.0      260.1       60.5
File Copy 1024 bufsize 2000 maxblocks        3960.0    31529.0       79.6
File Copy 256 bufsize 500 maxblocks          1655.0    17348.0      104.8
File Copy 4096 bufsize 8000 maxblocks        5800.0    34234.0       59.0
Pipe Throughput                             12440.0   114065.9       91.7
Pipe-based Context Switching                 4000.0    50020.3      125.1
Process Creation                              126.0     2579.7      204.7
Shell Scripts (8 concurrent)                    6.0       59.0       98.3
System Call Overhead                        15000.0    96996.5       64.7
                                                                =========
FINAL SCORE                                                          70.5
rh-128M-1CPU-LX2.0.24.961101
TEST                                       BASELINE     RESULT      INDEX
Arithmetic Test (type = double)             29820.0    52757.8       17.7
Dhrystone 2 using register variables       116700.0   321394.1       27.5
Execl Throughput                               43.0      259.0       60.2
File Copy 1024 bufsize 2000 maxblocks        3960.0    28903.0       73.0
File Copy 256 bufsize 500 maxblocks          1655.0    14033.0       84.8
File Copy 4096 bufsize 8000 maxblocks        5800.0    33552.0       57.8
Pipe Throughput                             12440.0    81685.9       65.7
Pipe-based Context Switching                 4000.0    37619.2       94.0
Process Creation                              126.0     2500.8      198.5
Shell Scripts (8 concurrent)                    6.0       58.7       97.8
System Call Overhead                        15000.0    62173.5       41.4
                                                                =========
FINAL SCORE                                                          62.0
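The "about 70%" and "140%" figures can be re-derived from the FINAL SCORE and Shell Scripts (8 concurrent) lines of the four tables above. The sketch below does only that arithmetic. Note that the SMP and single-CPU runs were made on different kernel versions within each series (2.1.14 vs 2.1.10, and 2.0.26 vs 2.0.24), so the ratios are approximate. The dictionary layout and the name `smp_vs_up_percent` are made up for this illustration.

```python
# (FINAL SCORE, Shell Scripts (8 concurrent) INDEX) copied from the tables above.
RUNS = {
    "2.1": {"smp": (50.9, 138.3), "up": (70.5, 98.3)},  # 2.1.14 SMP vs 2.1.10 UP
    "2.0": {"smp": (46.9, 143.8), "up": (62.0, 97.8)},  # 2.0.26 SMP vs 2.0.24 UP
}


def smp_vs_up_percent(smp_value, up_value):
    """SMP result expressed as a percentage of the single-CPU result."""
    return 100.0 * smp_value / up_value


ratios = {}
for series, run in RUNS.items():
    overall = smp_vs_up_percent(run["smp"][0], run["up"][0])
    shell = smp_vs_up_percent(run["smp"][1], run["up"][1])
    ratios[series] = (overall, shell)
    print(f"{series}: overall {overall:.0f}% of single CPU, "
          f"8 concurrent shell scripts {shell:.0f}%")
```

With the numbers as posted, the overall ratio works out to the low-to-mid 70s in both series, and the shell-script ratio to a little over 140%.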
While there is clearly room in the SMP architecture to improve
performance, as the single-CPU benchmarks leave lots of headroom, I am
still genuinely awestruck at the increases in 2.1. Where does Linus find
the cycles?
As another point of reference, here is the Ultra on my desk (gcc 2.7.2):
SunOS rosie 5.5.1 Generic sun4u sparc SUNW,Ultra-1 clock 143 MHz
INDEX VALUES
TEST                                       BASELINE     RESULT      INDEX
Arithmetic Test (type = double)             29820.0    34514.4       11.6
Dhrystone 2 using register variables       116700.0   258164.4       22.1
Execl Throughput                               43.0       75.3       17.5
File Copy 1024 bufsize 2000 maxblocks        3960.0     1317.0        3.3
File Copy 256 bufsize 500 maxblocks          1655.0     2143.0       12.9
File Copy 4096 bufsize 8000 maxblocks        5800.0     1260.0        2.2
Pipe Throughput                             12440.0    37144.8       29.9
Pipe-based Context Switching                 4000.0    11952.6       29.9
Process Creation                              126.0      331.2       26.3
Shell Scripts (8 concurrent)                    6.0       16.0       26.7
System Call Overhead                        15000.0    41970.2       28.0
                                                                =========
FINAL SCORE                                                          14.8
Sam Chessman (SSC3) chessman@wauug.erols.com