a strange forking issue

Valentin Pavlov (vpavlov@moon.rila.bg)
Tue, 28 Sep 1999 13:45:01 +0300


Hi guys !

I've searched the archive for something like this, but without any success. So,
can someone explain to me why is this happening:

I have a program that takes as arguments a fork count (numforks) and a boolean
(dopause). Then it does a for() loop and spawns numforks children. Each
child makes a timestamping before and after a sleep(5) call and prints its
pid, starting time (relative to the start of the loop) and elapsed time for the
sleep call. Depending on the second argument (dopause) the parent does (or does
not) a little pause between the forks. Then, after the loop is over the parent
prints out its pid, starting time and elapsed time for the loop and then waits for all its children to exit. Now, here are the results for 50 children with and without pausing between forks (with some comments added from me):

----------------------------------------------------
vpavlov@moon:~/code > ./forker 50 1 << This is *with* pausing between forks

parent:
10374 2410019446 1.998980 pid start time elapsed time(sec)

children:
10375 0.009208 5.009925 pid start time(r) elapsed time(sec)
10376 0.059132 5.010000
10377 0.081607 5.007557
10378 0.129146 5.009987
10379 0.169179 5.009953
10380 0.209181 5.009950
10381 0.269211 5.009921
10382 0.309139 5.009992
10384 0.369255 5.009877 ! 10384 starts before 10383 !
10383 0.369179 5.015631 ! 10383 takes 'huge' time to execute
10385 0.419179 5.009953
10386 0.469179 5.009953
10387 0.489141 5.010016
10388 0.549132 5.009998
10389 0.569183 5.009976
10390 0.619179 5.009953
10391 0.669344 5.009788
10392 0.749135 5.009996
10393 0.769191 5.009968
10394 0.819184 5.009948
10395 0.849143 5.010000
10396 0.899132 5.009999
10397 0.919185 5.009976
10398 0.969181 5.009951
10399 1.019190 5.009942
10400 1.029145 5.009991
10401 1.069187 5.009945
10402 1.109729 5.009403
10403 1.169180 5.009954
10404 1.209176 5.009955
10405 1.219186 5.009949
10406 1.269186 5.009946
10407 1.319181 5.009951
10408 1.369182 5.009950
10409 1.389141 5.010016
10410 1.419191 5.009943
10411 1.469188 5.009944
10412 1.519187 5.009945
10413 1.569195 5.009937
10414 1.599131 5.010002
10415 1.619185 5.009974
10416 1.669185 5.009947
10417 1.709308 5.009891
10418 1.749142 5.010078
10419 1.769188 5.009993
10420 1.819183 5.009949
10421 1.859211 5.009988
10422 1.909132 5.010066
10423 1.929175 5.010005
10424 1.969184 5.009948

--------------------------------
As you can see, all times are reasonable - they have some small fluctuations
around 5.0099. Also, all children are ran 'in order' by their pid, exactly as
they are forked. Then, there is one exception 10384 is ran *before* 10383 and
10383 has somewhat 'enormous' execution time, not fitting in the fluctuations.

Now, here are the results *without* pausing between forks:

vpavlov@moon:~/code > ./forker 50 0
10323 2400448150 0.006484 <<parent (same as above)

10324 0.000445 5.009990 <<children (same as above)
10325 0.000537 5.013428
10328 0.001026 5.015934 !! where are 26 and 27 !!
10329 0.000951 5.019021
10330 0.000873 5.022255
10331 0.000796 5.025443
10327 0.000698 5.028672 !! here are 26 and 27 !!
10326 0.000616 5.032070
10332 0.012000 5.023868
10333 0.012121 5.029557
10334 0.012216 5.035209
10336 0.015735 5.037423 !! oops. 35 is missing !!
10337 0.015643 5.043211
10338 0.015552 5.049045
10339 0.015462 5.054892
10340 0.015378 5.060452
10341 0.015294 5.066061
10342 0.015202 5.071848
10343 0.015118 5.077597
10344 0.015026 5.083351
10345 0.014935 5.089076
10346 0.014846 5.094809
10347 0.014756 5.100576
10348 0.014662 5.106278
10349 0.014577 5.111970
10350 0.014483 5.117372
10351 0.014399 5.123052
10352 0.014313 5.128563
10353 0.014223 5.134279
10354 0.014135 5.140017
10355 0.014047 5.145714
10356 0.013959 5.151457
10357 0.013871 5.157151
10358 0.013782 5.162909
10359 0.013695 5.168592
10360 0.013603 5.174312
10361 0.013513 5.180150
10362 0.013425 5.185858
10363 0.013335 5.191577
10364 0.013246 5.197431
10365 0.013152 5.203159
10366 0.013063 5.208840
10367 0.012974 5.214534
10368 0.012886 5.220259
10369 0.012796 5.225643
10370 0.012702 5.231477
10371 0.012612 5.236844
10372 0.012521 5.242559
10373 0.012430 5.248250
10335 0.012306 5.253947 !! here is 35 !!
----------------------------------------------------------

As we can see, there are no more little fluctuations - execution time is
*CONSTANTLY GROWING* ?!? Also, if you change sleep(5) to sleep(20)
you'll see that more and more children are not executed in the order that
they are forked (like 26, 27 and 35) above. Also, if you timestamp the fork()
execution time, you'll see that it behaves the same way. Please pay attention
to the fact that the execution time of a child (about 5 secs.)
is enough, so all the children are running in the same instant in both cases.
Also, I've tried various variants for the waiting part, for example a
SIGCHLD handler, but it seems that this doesn't have any impact.

So, how and why this 'fast forking' affects the execution time of a child.
IMHO this is very strange and possibly there is something wrong with forking
and scheduling.

Here's the code:

--------- 8< -------- 8< -------- 8<---------------- forker.cc
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

main(int argc, char **argv)
{
struct timeval t0, t1, t2, t3; // time stamps
long s0, s1, s2, s3; // time stamps - microseconds

if(argc!=3) {
printf("usage: %s <numforks> <0|1>\n", argv[0]);
exit(1);
}
int numforks = atoi(argv[1]);
int dopause = atoi(argv[2]);
int pid;
int children = 0;

gettimeofday(&t0, NULL);
s0 = t0.tv_sec*1000000 + t0.tv_usec;
for(int i=0;i<numforks;i++) {
int pid = fork();
if(pid!=0) {
children++;
if(dopause) {
for(int j=0;j<4500000;j++)
;
}
}else {
gettimeofday(&t2, NULL);
sleep(5);
gettimeofday(&t3, NULL);
s2 = t2.tv_sec*1000000 + t2.tv_usec;
s3 = t3.tv_sec*1000000 + t3.tv_usec;
printf("%d\t%f\t%f\n", getpid(), (s2 - s0)/1000000.0, (s3-s2)/1000000.0);
exit(0);
}
}
gettimeofday(&t1, NULL);
s1 = t1.tv_sec*1000000 + t1.tv_usec;
printf("%d\t%lu\t%f\n\n", getpid(), s0, (s1-s0)/1000000.0);

while(children > 0) {
wait(NULL);
children--;
}
}
--------- 8< -------- 8< -------- 8<----------------

Best,

Valentin Pavlov [vpavlov@moon.rila.bg]

RILA Software, Inc.
'Acad G. Bonchev' Str., 27 bl.
Sofia 1113, Bulgaria
Phone: (+359 2) 971 31 00

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/