Re: [big performances boost for DataBases] Re: cache killer memory death test - 2.0 vs 2.2 vs arca

Harvey J. Stein (hjstein@bfr.co.il)
25 Apr 1999 22:46:47 +0300


Andrea Arcangeli <andrea@e-mind.com> writes:

<snip>

> Here is a run of your proggy (gdbm_sync included) with 120,000 lines
> of input:
>
> root@laser:/home/andrea/devel/dbase# readprofile -r
> root@laser:/home/andrea/devel/dbase# exit
> andrea@laser:~/devel/dbase$ ./generate_input.awk -v nr=120000 > input.txt
> andrea@laser:~/devel/dbase$ /usr/bin/time ./dbase <input.txt >/dev/null
> 25.65user 8.71system 2:17.55elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (1116major+1425minor)pagefaults 0swaps

BTW, as another data point, on a 533MHz Alpha with a 2.0.36 kernel,
128MB of RAM & a fast-wide SCSI disk, it takes ~2-3 minutes.

<snip>

> If I comment out the gdbm_sync in your dbase app in this way:

<snip>

> then I get:

<snip>

> andrea@laser:~/devel/dbase$ /usr/bin/time ./dbase <input.txt >/dev/null
> 23.49user 7.48system 0:34.19elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (679major+1425minor)pagefaults 0swaps
>
> 34 sec to process 120,000 lines, against the previous 2:17 with the
> gdbm_sync (well, doing a gdbm_sync is still desirable, but it's
> better to do it at the end of the computation ;).

Interesting. I'm surprised it makes such a difference. First of all,
it's not that many calls to gdbm_sync - one every 10,000 records = 5
calls on the 50,000 record test I was doing, 12 calls in your test.
Secondly, I originally put in the gdbm_sync calls in an effort to
force things to disk so that the kernel *would* be able to discard
pages and not get into a memory bind, so I'm surprised taking them
out improves things. But this was a long time ago, and what you say
below about fsync saturating the I/O subsystem for a long time rings
a bell. Now that I think about it, I put in the gdbm_sync calls
because the system was freezing up for long stretches while the
program ran. Maybe it was freezing in gdbm_close, which does an
fsync, so the repeated gdbm_sync calls were probably trading one big
freeze for lots of little ones.
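
For concreteness, the pattern we're talking about looks roughly like
this (just a sketch - the actual dbase.c is snipped above, so the
names and details here are illustrative, not the real test program):

/* Illustrative reconstruction, not the actual dbase.c. */
#include <gdbm.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    GDBM_FILE dbf;
    datum key, val;
    char line[1024];
    long n = 0;

    dbf = gdbm_open("test.gdbm", 0, GDBM_NEWDB, 0644, NULL);
    if (!dbf)
        return 1;

    while (fgets(line, sizeof line, stdin)) {
        key.dptr  = line;
        key.dsize = strcspn(line, " \t\n");   /* first field as key */
        val.dptr  = line;
        val.dsize = strlen(line);             /* whole line as data */
        gdbm_store(dbf, key, val, GDBM_REPLACE);

        ++n;
        /* The periodic sync that dominated the elapsed time: */
        /* if (n % 10000 == 0) gdbm_sync(dbf); */
    }

    gdbm_sync(dbf);   /* one sync at the end is cheap by comparison */
    gdbm_close(dbf);  /* and gdbm_close fsyncs anyway */
    return 0;
}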

<snip>

> Here it takes 9 sec compared to the 2:50 of 2.2.5 and the 1:24 of
> 2.2.5_arca12 ;). But I also changed your app and not only the kernel
> (and you must also consider that it's running on a PII 450 and not
> on a P5).

Not to mention that you also have 128MB of memory & I'm running tests
on a 32MB machine.

<snip>

> Interactive performance was perfect here (well, except during the
> heavy I/O forced by the gdbm_sync; during that time you have tons of
> I/O requests pending, and if another of your apps does an fsync too,
> it gets blocked as well, since it has to wait for the I/O subsystem
> to accept its request, and the I/O subsystem, as just said, is
> saturated by the gdbm_sync). The only way to get optimal performance
> is to avoid the fsync in the middle of the computation and let the
> kernel flush buffers in a fine-grained manner without saturating the
> I/O subsystem for a long time (this doesn't apply to a gdbm_sync at
> the end of the computation, which is still a good idea).

Given that I'm just building the file & exiting, if the kernel
behaves well then I don't see any reason to do any gdbm_sync calls at
all. Closing the dbase does an fsync & forces everything to disk
anyway. Why do you think the additional gdbm_sync calls are a good
idea?

> Thank you very much for your really nice testcase, which allowed me
> to optimize the kernel for database activities (I bet all DBs (DBMS
> servers too) will get a _big_ boost with my new code).

No problem! You don't know how happy I am that this is getting looked
into.

But it's not clear to me that DBMS servers will behave similarly -
gdbm uses hash tables.

> Ah, and I am using rb-trees in both the page cache and the buffer
> cache (no hash table anymore), and your proggy (like every piece of
> DB software, I think) stresses find_buffer very _much_, judging by
> the profiling results I showed you above.
>
> All the tests above were run here over 2.2.6_andrea2.bz2, so I would
> be glad if you would try my new code and send feedback from your
> machines too ;).
>
> ftp://e-mind.com/pub/andrea/kernel/2.2.6_andrea2.bz2

Downloading now. Dying to try it out. Why is it _andrea again
instead of _arca? And what exactly is the difference between
2.2.6_arca1 & 2.2.6_andrea2? I already tested with 2.2.6_arca1 (vs
2.2.6). Here are the full results:

             Test 1                          Test 2
             user  sys  elapsed worst 2nd    user  sys   elapsed  worst 2nd   comments
2.0.36       27.69 7.70 2:37.27  9.73 6.88   28.97 10.21 17:30.68 76.37 75.11 poor
             28.32 7.33 0:43.87  3.36 2.32
             27.52 7.40 0:43.08  3.42 2.22

2.2.5        28.13 6.80 2:50.52  8.00 7.19   29.44  8.70 13:21.34 61.29 64.95 fine
             28.64 6.61 1:47.39  7.91 6.77
             28.55 6.40 1:48.49  7.88 7.11

2.2.5arca12  27.99 6.77 1:24.68 14.93 9.38   30.13  6.91  4:13.60 14.08 13.09 very poor
             28.98 6.69 1:19.67 14.12 9.56
             28.18 6.77 1:18.99 13.89 9.39

2.2.6        28.20 6.70 2:47.06  7.47 6.87   31.73  8.92 14:41.84 16.57 16.53 fine
             28.92 6.17 1:46.01  7.97 6.62
             28.69 6.39 1:47.77  7.81 6.76

2.2.6arca1   28.66 7.73 1:24.43 13.85 9.06   30.07  9.62 10:45.50 65.61 45.63 fine
             29.15 7.82 1:15.55 13.38 8.74
             28.37 7.85 1:12.91 13.33 7.24

arca1 got the data to the disk faster than 2.2.6, but not a hell of a
lot faster, and interactive performance was maybe a little worse. It
was close enough that I want to run the test again while being more
careful about exactly what I run interactively, to see whether it was
the kernels or just the job mix. Also note that 2.2.6 is much less
spiky - its worst write time is much lower than 2.2.6arca1's.
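
Back on the find_buffer point: the idea of keying buffer lookups on a
balanced tree rather than a hash table can be illustrated in
userspace with POSIX tsearch()/tfind(), which glibc happens to
implement with a red-black tree. This is purely illustrative - the
in-kernel data structures in Andrea's patch are their own code:

/* Purely illustrative; the real patch uses its own in-kernel
 * rb-trees, not tsearch(). */
#include <search.h>
#include <stdio.h>
#include <stdlib.h>

struct buf_key {
    unsigned dev;         /* device number, e.g. 0x0301 for /dev/hda1 */
    unsigned long block;  /* block number on that device */
};

static int cmp(const void *a, const void *b)
{
    const struct buf_key *x = a, *y = b;

    if (x->dev != y->dev)
        return x->dev < y->dev ? -1 : 1;
    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    return 0;
}

int main(void)
{
    void *root = NULL;
    struct buf_key want = { 0x0301, 3 };
    unsigned long b;

    /* insert a few "buffers" keyed by (dev, block) */
    for (b = 0; b < 5; b++) {
        struct buf_key *k = malloc(sizeof *k);
        k->dev = 0x0301;
        k->block = b;
        tsearch(k, &root, cmp);  /* O(log n) insert */
    }

    /* a find_buffer-style lookup: O(log n) as well */
    printf("block 3 %s\n", tfind(&want, &root, cmp) ? "found" : "missing");
    return 0;
}

The win over a fixed-size hash is that lookups stay O(log n) no
matter how big the cache grows, with no table to size or rehash.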

-- 
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il
