Re: Egg on my face -- disk speed

Harald Koenig (koenig@tat.physik.uni-tuebingen.de)
Wed, 4 Nov 1998 11:13:47 +0100


--6c2NcOVqGQ03X4Wi
Content-Type: text/plain; charset=us-ascii

On Nov 03, Alan Cox wrote:

> > The numbers for the test in case are currently about 34% cpu for the
> > disk to /dev/null and about 64% for disk to disk.
>
> You still havent said how you are measuring the CPU utilisation.

suggestion: please use the attached programs lops[123].c to measure
how much CPU/system power is left. those three versions use different
loop methods and so put different additional load on the system.
lops3.c is mostly cpu/register load, while the other two add
cache/memory load too.

to get an idea you have to watch and compare the results of all three
programs _and_ check their influence on your `real' operation
(here: disk transfer rate)!

those programs print a report about once a second showing

1) time in seconds
2) MegaLoops run so far (Mega == 10^6 here!)
3) MegaLoops per second, averaged over the whole run
4) MegaLoops per second for the last second.

usually, only the last number is interesting to watch and shows
the dynamics of some sort of `cpu bound system load'.
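
as a rough sketch of how columns 3 and 4 follow from columns 1 and 2:
the struct and function names below are made up for illustration and
are not part of the attached programs. since column 2 is printed
rounded, values recomputed from a report only match the printed ones
to a few digits.

```c
#include <assert.h>

/* One report line: elapsed seconds and total MegaLoops so far
 * (columns 1 and 2). Hypothetical type, for illustration only. */
struct lops_sample {
    double seconds;
    double megaloops;
};

/* Column 3: average MegaLoops per second over the whole run. */
double lops_average_rate(struct lops_sample now)
{
    assert(now.seconds > 0.0);
    return now.megaloops / now.seconds;
}

/* Column 4: MegaLoops per second since the previous report line. */
double lops_interval_rate(struct lops_sample prev, struct lops_sample now)
{
    assert(now.seconds > prev.seconds);
    return (now.megaloops - prev.megaloops) / (now.seconds - prev.seconds);
}
```

column 4 reacts within one report interval, while column 3 smooths
over the whole run, which is why the last number is the one to watch.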

this is a sample output for my PODP83:

# lops1
0.997479 9.696 9.720088 9.720088
1.997621 14.922 7.469857 5.225618
2.998722 24.627 8.212566 9.694584
3.997552 34.378 8.599877 9.762675
5.001929 44.176 8.831864 9.755201
6.023015 54.093 8.981062 9.711928
6.997616 63.717 9.105523 9.874689
7.997728 73.389 9.176198 9.670704
8.998768 83.122 9.237010 9.722863
9.997672 92.812 9.283338 9.700690
10.997700 102.502 9.320280 9.689605

# lops2
0.992931 24.783 24.959134 24.959134
1.992957 49.627 24.901111 24.843500
2.994216 75.477 25.207449 25.817198
3.992979 100.466 25.160677 25.020458
4.997119 125.580 25.130497 25.010489
5.993019 150.558 25.122202 25.080579
6.993060 175.464 25.091172 24.905214
7.993096 188.870 23.629148 13.405492
8.994362 213.859 23.776974 24.957066
9.993165 238.723 23.888657 24.894381

# lops3
1.014456 38 37.458500 37.458500
2.007864 69 34.364877 31.205708
2.994473 103 34.396704 34.461474
4.078280 140 34.328197 34.138920
5.004352 161 32.171997 22.676423
6.014452 200 33.253237 38.610039
7.000730 237 33.853612 37.514778
8.008224 274 34.214827 36.724784
9.008573 311 34.522671 36.987092
10.011252 348 34.760887 36.901142

now, this was for an almost idle system (but with X11, xload,
xclock -update 1, xosview, etc. running).

now here is an example where the first 10 seconds are again
`almost idle', then at 10 secs I've started disk transfer using

# (sleep 10 ; dd if=/dev/sda bs=32k of=/dev/null count=5000 ) &
# lops3

1.012608 38 37.526861 37.526861
1.996960 57 28.543386 19.302038
3.004841 93 30.950057 35.718502
4.004192 127 31.716761 34.022080
5.002343 165 32.984543 38.070392
6.002489 202 33.652706 36.994599
7.004903 239 34.118959 36.910897
8.004594 276 34.480200 37.011437
9.009096 313 34.742665 36.834173
10.007982 350 34.972085 37.041264

11.025245 379 34.375653 28.507869
12.026308 395 32.844660 15.983010
13.010577 408 31.359101 13.207771
14.030449 420 29.934894 11.766182
15.072918 434 28.793363 13.429656
16.044392 451 28.109510 17.499182
17.015571 468 27.504220 17.504497
18.009719 485 26.929904 17.100070
19.028765 500 26.276009 14.719650
20.067456 518 25.812938 17.329504
21.009815 534 25.416692 16.978667
22.022972 552 25.064737 17.766249
23.016176 570 24.765191 18.123165
24.054317 588 24.444677 17.338685
25.016776 604 24.143799 16.624085
26.096194 614 23.528335 9.264252
27.139062 626 23.066383 11.506730
28.014653 641 22.880883 17.131286
29.055275 657 22.612073 15.375420
30.039037 676 22.504050 19.313614

so for my PODP83 with NCR810 and Quantum Atlas1 4GB,
I'd say this test gives about

  ( 1 - 17.5 / 37.5 ) * 100% == 53.3%
        ^^^^   ^^^^
        max under load / idle

system load for raw disk transfer.
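
the same estimate as a small C helper, as a sketch only: the function
name is hypothetical, and it assumes the idle and loaded rates were
taken from the same lops program on the same machine. with 37.5 idle
and 17.5 under load it gives roughly 53%.

```c
#include <assert.h>

/* Estimated system load in percent: the share of the idle loop rate
 * that is lost while the other operation is running.
 * Hypothetical helper, not part of lops[123].c. */
double load_percent(double idle_rate, double loaded_rate)
{
    assert(idle_rate > 0.0);
    return (1.0 - loaded_rate / idle_rate) * 100.0;
}
```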

reality check! testing the influence of lops3
on the sustained transfer rate shows no impact:

numbers are seconds, overall bytes transferred in K, overall bytes/sec
and bytes/sec for the last 10M transferred (report quantum every 10M).

the first 5 lines (50M transferred) are for an idle system, then lops3 was started.
peak transfer rate with lops3 running is still 3.6M/sec, distortions are
due to other processes.

2.9869 10272K 3521604 3510633
5.7740 20512K 3637720 3762155
8.8082 30752K 3575090 3455906
11.6163 40992K 3613518 3734053
14.6907 51232K 3571079 3410724

18.7231 61472K 3362020 2600384
22.4261 71712K 3274447 2831663
25.4004 81952K 3303841 3525478
28.4718 92192K 3315722 3413976
31.3613 102432K 3344581 3628939
35.2236 112672K 3275534 2714890
38.0980 122912K 3303633 3647958
41.0048 133152K 3325160 3607310
43.9104 143392K 3343930 3608811
47.5749 153632K 3306765 2861429
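
a sketch of how the last column can be recomputed from two consecutive
report lines. the names are hypothetical, and it assumes K means 1024
bytes, which is consistent with the printed numbers to within rounding
of the timestamps.

```c
#include <assert.h>

/* One report line of the transfer test: elapsed seconds and total
 * amount transferred in K. Hypothetical type, for illustration only. */
struct xfer_sample {
    double seconds;
    double kbytes;
};

/* Bytes per second between two consecutive report lines,
 * assuming 1K == 1024 bytes. */
double xfer_interval_rate(struct xfer_sample prev, struct xfer_sample now)
{
    assert(now.seconds > prev.seconds);
    return (now.kbytes - prev.kbytes) * 1024.0 / (now.seconds - prev.seconds);
}
```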

Harald

PS: I'd be happy to get some comments and discussion about such methods
to measure the `remaining system load' in general and about those
small programs. to me this is a good tool to get an idea (and some number)
about system impact of some operations (also e.g. fast network transfer
or IDE disks without DMA, etc.)

PPS: sorry for the long mail :-(

--
All SCSI disks will from now on                     ___       _____
be required to send an email notice                0--,|    /OOOOOOO\
24 hours prior to complete hardware failure!      <_/  /  /OOOOOOOOOOO\
                                                    \  \/OOOOOOOOOOOOOOO\
                                                      \ OOOOOOOOOOOOOOOOO|//
Harald Koenig,                                         \/\/\/\/\/\/\/\/\/
Inst.f.Theoret.Astrophysik                              //  /     \\  \
koenig@tat.physik.uni-tuebingen.de                     ^^^^^       ^^^^^

--6c2NcOVqGQ03X4Wi
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="lops1.c"

/*
 * lops1.c -- loop and calculate/output LoopOPs as performance measure
 * Copyright Harald Koenig <koenig@tat.physik.uni-tuebingen.de> 1996
 *
 * Permission is hereby granted to copy, modify and redistribute this code
 * in terms of the GNU Library General Public License, Version 2 or later,
 * at your option.
 *
 * to compile use   cc -O2 -Wall -o lops1 lops1.c
 */

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

struct timeval t0,t1,t2;
unsigned int count0=0, last_count0=0;
unsigned int count1=0, last_count1=0;

/* called once per interval: print the report line from inside the handler */
void alarm_handler(int i)
{
  double c, d1, d0;
  static double lc=0;

  signal (SIGALRM, alarm_handler);

  gettimeofday(&t2, NULL);
  /* count1 counts overflows of the 32-bit count0 */
  c  = count1*65536.0*65536.0+count0;
  d1 = t2.tv_sec-t1.tv_sec+(t2.tv_usec-t1.tv_usec)*1e-6;  /* since last report */
  d0 = t2.tv_sec-t0.tv_sec+(t2.tv_usec-t0.tv_usec)*1e-6;  /* since start */

  printf("%10.6f %12.3f %11.6f %11.6f\n"
	 ,d0
	 ,c*1e-6
	 ,(c*1e-6)/d0
	 ,((c-lc)*1e-6)/d1
	 );
  t1 = t2;
  last_count0 = count0;
  last_count1 = count1;
  lc = c;
}

int main(int argc, char *argv[])
{
  int t=1;
  struct itimerval itimer;

  if (argc == 2)
    t = atoi(argv[1]);
  else if (argc > 2) {
    printf("usage: %s [ time_in_secs ]\n",argv[0]);
    exit(1);
  }

  itimer.it_value.tv_sec  = t;
  itimer.it_value.tv_usec = 0;
  itimer.it_interval = itimer.it_value;

  signal (SIGALRM, alarm_handler);
  gettimeofday(&t0, NULL);
  t1 = t0;
  setitimer(ITIMER_REAL, &itimer, NULL);

  /* busy loop on global counters; runs until the process is killed */
  for(;;count1++)
    for (;++count0;)
      ;

  exit(-1);
}

--6c2NcOVqGQ03X4Wi
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="lops2.c"

/*
 * lops2.c -- loop and calculate/output LoopOPs as performance measure
 * Copyright Harald Koenig <koenig@tat.physik.uni-tuebingen.de> 1996
 *
 * Permission is hereby granted to copy, modify and redistribute this code
 * in terms of the GNU Library General Public License, Version 2 or later,
 * at your option.
 *
 * to compile use   cc -O2 -Wall -s -o lops2 lops2.c
 */

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

/* you have to test which data type is fastest ... */
/* for both ALPHA 21064 and Intel Pentium OverDrive "int" is optimal */

volatile union {
  char c;
  short s;
  int i;
  long l;
} not_timeout={1};
#define NOT_TIMEOUT not_timeout.i

/* the handler only clears the flag; the report is printed in main() */
void alarm_handler(int i)
{
  signal (SIGALRM, alarm_handler);
  NOT_TIMEOUT=0;
}

int main(int argc, char *argv[])
{
  register unsigned int count0=0;
  unsigned int count1=0;
  int t=1;
  struct itimerval itimer;
  struct timeval t0,t1,t2;
  double c, d1, d0;
  static double lc=0;

  if (argc == 2)
    t = atoi(argv[1]);
  else if (argc > 2) {
    printf("usage: %s [ time_in_secs ]\n",argv[0]);
    exit(1);
  }

  itimer.it_value.tv_sec  = t;
  itimer.it_value.tv_usec = 0;
  itimer.it_interval = itimer.it_value;

  signal (SIGALRM, alarm_handler);
  gettimeofday(&t0, NULL);
  t1 = t0;
  setitimer(ITIMER_REAL, &itimer, NULL);

  while (1) {
    while (NOT_TIMEOUT) {
#ifndef i386
      /* i386-gcc doesn't optimize this well */
      count0++;
      if (!count0) count1++;
    }
#else
      /* for i386 a hack is needed to get count0 into a register :-( */
      for (;++count0;)
	if (!NOT_TIMEOUT) goto l;
      count1++;
    }
  l:
#endif
    gettimeofday(&t2, NULL);
    NOT_TIMEOUT = 1;
    /* count1 counts overflows of the 32-bit count0 */
    c  = count1*65536.0*65536.0+count0;
    d1 = t2.tv_sec-t1.tv_sec+(t2.tv_usec-t1.tv_usec)*1e-6;  /* since last report */
    d0 = t2.tv_sec-t0.tv_sec+(t2.tv_usec-t0.tv_usec)*1e-6;  /* since start */

    printf("%10.6f %12.3f %11.6f %11.6f\n"
	   ,d0
	   ,c*1e-6
	   ,(c*1e-6)/d0
	   ,((c-lc)*1e-6)/d1
	   );
    t1 = t2;
    lc = c;
  }
  exit(-1);
}

--6c2NcOVqGQ03X4Wi
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="lops3.c"

/*
 * lops3.c -- loop and calculate/output LoopOPs as performance measure
 * Copyright Harald Koenig <koenig@tat.physik.uni-tuebingen.de> 1996
 *
 * Permission is hereby granted to copy, modify and redistribute this code
 * in terms of the GNU Library General Public License, Version 2 or later,
 * at your option.
 *
 * to compile use   cc -O2 -Wall -s -o lops3 lops3.c
 */

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

volatile int not_timeout=1;

/* the handler only clears the flag; the report is printed in main() */
void alarm_handler(int i)
{
  signal (SIGALRM, alarm_handler);
  not_timeout=0;
}

int main(int argc, char *argv[])
{
  unsigned int count=0, last_count=0;
  int t=1;
  struct itimerval itimer;
  struct timeval t0,t1,t2;
  double d1, d0;

  if (argc == 2)
    t = atoi(argv[1]);
  else if (argc > 2) {
    printf("usage: %s [ time_in_secs ]\n",argv[0]);
    exit(1);
  }

  itimer.it_value.tv_sec  = t;
  itimer.it_value.tv_usec = 0;
  itimer.it_interval = itimer.it_value;

  signal (SIGALRM, alarm_handler);
  gettimeofday(&t0, NULL);
  t1 = t0;
  setitimer(ITIMER_REAL, &itimer, NULL);

  while (1) {
    while (not_timeout) {
      {
	/* one "MegaLoop": the flag is only checked once per 10^6 iterations */
	register int i;
#ifdef __alpha__
	asm("nop");
#endif
	for (i=1000000; --i; )
#ifdef __i386__
	  asm("nop")	/* gives faster `bogus' loops for Pentium */
#endif
	    ;
      }
      count++;
    }

    gettimeofday(&t2, NULL);
    not_timeout = 1;
    d1 = t2.tv_sec-t1.tv_sec+(t2.tv_usec-t1.tv_usec)*1e-6;  /* since last report */
    d0 = t2.tv_sec-t0.tv_sec+(t2.tv_usec-t0.tv_usec)*1e-6;  /* since start */

    printf("%10.6f %10u %11.6f %11.6f\n"
	   ,d0
	   ,count
	   ,count/d0
	   ,(count-last_count)/d1
	   );
    t1 = t2;
    last_count = count;
  }
  exit(-1);
}

--6c2NcOVqGQ03X4Wi--
