>I tested 2.0.36, 2.2.5 & 2.2.5arca12 with the program appended to the
>end of this message. The program builds a large gdbm file. It's a
In your program you are killing the caching behaviour that my buffer code
does on dirty buffers (the thing that doesn't work on the stock kernel due
flushtime mess). The fact is that you are doing many gdbm_sync in the
middle of your proggy.
I'll show you the results of your nice testcase on my double-PII with only
one IDE disk that goes at:
root@laser:/usr/src/linux# hdparm -t /dev/hda
/dev/hda:
Timing buffered disk reads: 32 MB in 4.71 seconds = 6.79 MB/sec
Here it is a run of your proggy (gdbm_sync included) with 120000 lines of
input:
root@laser:/home/andrea/devel/dbase# readprofile -r
root@laser:/home/andrea/devel/dbase# exit
andrea@laser:~/devel/dbase$ ./generate_input.awk -v nr=120000 > input.txt
andrea@laser:~/devel/dbase$ /usr/bin/time ./dbase <input.txt >/dev/null
25.65user 8.71system 2:17.55elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1116major+1425minor)pagefaults 0swaps
andrea@laser:~/devel/dbase$ readprofile -m /System.map | sort -r | head
918 total 0.0016
247 __strncpy_from_user 6.8611
136 find_buffer 1.7895
^^^^^^^^^^^ see how much your app is rb-tree intensive (or
hashtable intensive ;)
53 getblk 0.0683
51 update_vm_cache 0.2125
47 get_hash_table 2.3500
46 ext2_file_write 0.0372
34 run_bottom_halves 0.6538
27 ext2_getblk 0.0515
22 make_request 0.0177
You see that the time is quite I/O dominated though...
If I comment out the gdbm_sync in your dbase app in this way:
--- dbase.c Sat Apr 24 16:27:08 1999
+++ dbase.c.nosync Sat Apr 24 16:27:10 1999
@@ -187,8 +187,10 @@
report_time(&group_start_time, &end_time);
group_start_time = end_time;
}
+#if 0
if (linenum % 10000 == 0)
gdbm_sync(dbf);
+#endif
if (err != 3) {
fprintf(stderr, "Read error at line %ld. groupnum = %d subgroupnum = %d name = %s.\n",
linenum, groupnum, subgroupnum, name);
then I get:
andrea@laser:~/devel/dbase$ rm dbase.[co]
andrea@laser:~/devel/dbase$ ln -s
dbase.c.nosync dbase.c
andrea@laser:~/devel/dbase$ make
cc -g -O2 -c dbase.c -o dbase.o
cc dbase.o -lgdbm -o dbase
andrea@laser:~/devel/dbase$ rm dbase.gdbm
andrea@laser:~/devel/dbase$ /usr/bin/time ./dbase <input.txt >/dev/null
23.49user 7.48system 0:34.19elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (679major+1425minor)pagefaults 0swaps
34 sec to process 120000 lines against the previous 2:17 minutes with the
gdbm_sync (well doing a gdbm_sync is still desiderable but better to do
it at the end of the computation ;).
> user sys elapsed worst 2nd user sys elapsed worst 2nd comments
>2.0.36 27.69 7.70 2:37.27 9.73 6.88 28.97 10.21 17:30.68 76.37 75.11 poor
> 28.32 7.33 0:43.87 3.36 2.32
> 27.52 7.40 0:43.08 3.42 2.22
>
>2.2.5 28.13 6.80 2:50.52 8.00 7.19 29.44 8.70 13:21.34 61.29 64.95 fine
> 28.64 6.61 1:47.39 7.91 6.77
> 28.55 6.40 1:48.49 7.88 7.11
>
>2.2.5arca12 27.99 6.77 1:24.68 14.93 9.38 30.13 6.91 4:13.60 14.08 13.09 very poor
> 28.98 6.69 1:19.67 14.12 9.56
> 28.18 6.77 1:18.99 13.89 9.39
Here I'll run the dbase app with 50000 lines of input as you did to get
the numbers above:
andrea@laser:~/devel/dbase$ ./generate_input.awk -v nr=50000 > input-50.txt
andrea@laser:~/devel/dbase$ /usr/bin/time ./dbase <input-50.txt >/dev/null
6.58user 1.58system 0:09.11elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (478major+1392minor)pagefaults 0swaps
Here it takes 9 sec compared to the 2:50 of 2.2.5 and 1:24 of
2.2.5_arca12 ;). But I also changed your app and not only the kenrel (and
you must also consider that it's running on a PII 450 and not on a P5).
So now I'll reput the gdbm_sync back:
andrea@laser:~/devel/dbase$ rm dbase.[co]
andrea@laser:~/devel/dbase$ ln -s dbase.c.orig dbase.c
andrea@laser:~/devel/dbase$ make
cc -g -O2 -c dbase.c -o dbase.o
cc dbase.o -lgdbm -o dbase
andrea@laser:~/devel/dbase$ rm dbase.gdbm
andrea@laser:~/devel/dbase$ /usr/bin/time ./dbase <input-50.txt >/dev/null
6.15user 1.85system 0:16.16elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (107major+1392minor)pagefaults 0swaps
With the same dbase binary it now takes 16 sec compared to the 2:50
minutes of 2.2.5 (well really I never tried with a clean 2.2.5 here
because I want to stress my code all the time...). Anyway it looks like
good enough to me ;). Since the CPU you was using was slower you probably
will get around 20/25 sec or a bit more, but sure not 2 minutes and 50 sec
of 2.2.5.
Iteractive performance was perfect here (well except during the heavy I/O
forced by the gdbm_sync, during that time you have tons of I/O request
pending and if one of your app does a fsync too you get your other app
blocked too because it has to wait that the I/O subsystem will accept its
I/O request (and the I/O subsystem as just said was saturated by the
gdbm_sync)). The only way to get optimal performances is to avoid the
fsync in the middle of the computation and leave the kernel to flush
buffers in a finegrined manner without saturating the I/O subsystem for
long time (this doesn't apply to a gdbm_sync at the end of the
computation, that is still a good idea).
I thank you very much for your really nice testcase that allowed me to
optimize the kernel for database activities (I bet all db (also DBMS
servers) will get a _big_ boost with my new code).
Ah and I am using rb-trees in both page cache and buffer cache (no
hashtable anymore), and your proggy (as every db software I think) stress
very _much_ find_buffer looking the profiling results I showed you above.
All tests above are been run here over 2.2.6_andrea2.bz2, so I would be
glad if you would try my new code and feedback on your machines too ;).
ftp://e-mind.com/pub/andrea/kernel/2.2.6_andrea2.bz2
Thanks.
Andrea Arcangeli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/