2.2.16 vm fixes

From: Andrea Arcangeli (andrea@suse.de)
Date: Thu Jun 15 2000 - 11:23:54 EST

Next message: Chris Evans: "Re: (reiserfs) Re: Red Hat (was Re: reiserfs)"
Previous message: Alan Cox: "Re: Keymapping vunerability (drivers/char/vt.c)"
In reply to: Marcelo Tosatti: "Re: Stability (2.2.14/15/16/17pre1)"
Next in thread: Rik van Riel: "Re: 2.2.16 vm fixes"
Reply: Rik van Riel: "Re: 2.2.16 vm fixes"
Reply: Marcelo Tosatti: "Re: 2.2.16 vm fixes"
Maybe reply: Wes Cowley: "Re: 2.2.16 vm fixes"

On Wed, 14 Jun 2000, Marcelo Tosatti wrote:

>mmap002 on 2.2.15 gets killed. mmap002 on 2.2.15 + 2.2.16's
>thrashing heuristic runs fine. Try it.

If you look at vmstat while mmap002 gets killed, you'll notice that the
machine really is out of memory, and the only complaint you can make
is that there's still a significant amount of _dirty_ cache in the
machine.

The current 2.2.x VM (2.2.16 included) is not able to wait in any
way for dirty buffers to get flushed to disk. _All_ the changes in
2.2.16 are unrelated to that problem, and if they happen not to kill
mmap002 it's just by luck or because the VM has become more aggressive
than it should be (and being more aggressive helps in the case where we
are not able to do write throttling correctly).
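
What is missing is write throttling: a task that keeps dirtying cache should be made to wait for writeback once too much of it is dirty. A minimal userspace sketch of the idea, assuming a simple counter model; struct cache_model, dirty_page() and write_back_and_wait() are hypothetical names, not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of write throttling; none of these names
 * come from the 2.2 or 2.4 kernel sources. */
struct cache_model {
    unsigned long dirty;       /* pages waiting to be written back */
    unsigned long dirty_limit; /* threshold at which writers must wait */
    unsigned long flushed;     /* total pages written back so far */
};

/* Stand-in for "start I/O and wait for it to complete": here it simply
 * marks up to `batch` dirty pages clean. */
static void write_back_and_wait(struct cache_model *c, unsigned long batch)
{
    assert(batch > 0);
    unsigned long n = c->dirty < batch ? c->dirty : batch;
    c->dirty -= n;
    c->flushed += n;
}

/* Dirty one more page. With throttling on, the caller first waits for
 * writeback whenever the dirty count has reached the limit, so dirty
 * cache cannot grow without bound. Without it, nothing ever waits. */
static void dirty_page(struct cache_model *c, bool throttle)
{
    if (throttle && c->dirty >= c->dirty_limit)
        write_back_and_wait(c, c->dirty_limit / 2 + 1);
    c->dirty++;
}
```

Without throttling the dirty count just keeps growing; with it, the writer itself pays for the I/O and the dirty count stays at or below the limit.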

The free_before_allocate logic is necessary, but it's not actually implemented
correctly. I implemented it as suggested in my email to l-k yesterday,
and everything works just fine here as far as I can tell. mtest -m 70 from
SCT (on a 128 Mbyte machine) works fine, as do my other swap testcases. Rik, I
guess your machine imploded because you were mistakenly incrementing
free_before_allocate also inside the atomic_read(&free_before_allocate) check.
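
To make the counter bug concrete, here is a userspace sketch using C11 atomics. It assumes the counter tracks tasks currently reclaiming, so that other allocators must free pages before allocating while it is non-zero; the functions are hypothetical and do not reproduce Rik's patch, only the general shape of an unbalanced increment:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: number of tasks currently inside reclaim. */
static atomic_int free_before_allocate;

/* Correct pattern: one increment on entry, one matching decrement on exit. */
static void reclaim_balanced(void)
{
    atomic_fetch_add(&free_before_allocate, 1);
    /* ... free pages here ... */
    atomic_fetch_sub(&free_before_allocate, 1);
}

/* Buggy pattern: an extra increment inside the path that only tests the
 * counter, so every call leaks +1 and the counter never returns to zero. */
static void reclaim_leaky(void)
{
    atomic_fetch_add(&free_before_allocate, 1);
    if (atomic_load(&free_before_allocate))
        atomic_fetch_add(&free_before_allocate, 1); /* the mistake */
    atomic_fetch_sub(&free_before_allocate, 1);
}

/* While any reclaimer is (apparently) active, allocators must free first. */
static bool must_free_before_allocate(void)
{
    assert(atomic_load(&free_before_allocate) >= 0);
    return atomic_load(&free_before_allocate) != 0;
}
```

Once the counter leaks, every allocator believes someone is reclaiming and is forced into reclaim itself, which matches the "imploded" symptom.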

I found that the changes in do_try_to_free_pages are buggy: one
task can make the cache freeable, and another task can then free all the
cache that the first task has just unmapped. The first task, the one that
made the cache freeable, will be killed, because when it tries to free the
cache it won't succeed (even though the other task just freed all the cache
and made enough memory free for both processes). That happened here, and
we have to count swap_out as progress too if we don't want to kill
innocent tasks, as can happen now. I have an idea on how to fix
this properly, also dropping the free_before_allocate stuff, but it's too
intrusive for 2.2.x, and I believe we can live without problems with
free_before_allocate and counting swap_out as progress.
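
The race above comes down to which work counts as "progress". A sketch of the two progress tests, assuming each reclaim attempt reports both what it freed itself and what swap_out unmapped; struct reclaim_result and both functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one reclaim attempt by a task. */
struct reclaim_result {
    int cache_pages_freed; /* pages this task itself freed from the cache */
    int pages_unmapped;    /* pages swap_out made freeable (maybe freed by another task) */
};

/* Ignores swap_out: a task whose unmapped pages were freed by someone
 * else looks like it made no progress, and gets killed. */
static bool progress_cache_only(struct reclaim_result r)
{
    assert(r.cache_pages_freed >= 0 && r.pages_unmapped >= 0);
    return r.cache_pages_freed > 0;
}

/* Also counts swap_out work as progress, as proposed above. */
static bool progress_with_swap_out(struct reclaim_result r)
{
    return r.cache_pages_freed > 0 || r.pages_unmapped > 0;
}
```

For the first task in the scenario (nothing freed by itself, pages unmapped by swap_out) the first test reports failure and the second reports progress.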

Then, kswapd shouldn't stop after a few passes, but only when it fails to make
any progress during its work, as in 2.2.15 (that is, when it no longer makes
sense to do the next pass very soon). Also, running kswapd at once isn't
going to improve performance. What problem did you have with kswapd in
2.2.x that led you to add such hacks?
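
The difference between the two stop conditions can be sketched with a toy model that ignores everything else kswapd checks (free page targets, wakeups). do_one_pass() and the two loop functions are hypothetical:

```c
#include <assert.h>

/* Hypothetical model: `*reclaimable` pages exist; one pass frees at most
 * `batch` of them and returns how many it freed. */
static int do_one_pass(int *reclaimable, int batch)
{
    assert(batch > 0);
    int n = *reclaimable < batch ? *reclaimable : batch;
    *reclaimable -= n;
    return n;
}

/* Stop only when a pass makes no progress. Returns passes run. */
static int kswapd_until_no_progress(int *reclaimable, int batch)
{
    int passes = 0;
    for (;;) {
        passes++;
        if (do_one_pass(reclaimable, batch) == 0)
            break; /* nothing left to do: sleep until woken again */
    }
    return passes;
}

/* Capped loop: gives up after max_passes even if work remains. */
static int kswapd_fixed_passes(int *reclaimable, int batch, int max_passes)
{
    int passes = 0;
    while (passes < max_passes) {
        passes++;
        if (do_one_pass(reclaimable, batch) == 0)
            break;
    }
    return passes;
}
```

With 100 reclaimable pages and a batch of 32, the first loop drains everything and stops on the fifth, empty pass; a cap of two passes leaves 36 pages behind.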

Then, assign = 0 shouldn't be done at each pass; we should refresh the
swap_cnt values only once per swap_out call, or killing a task will take
too much time due to the high complexity of the algorithm (you may not see
that complexity problem with 32 Mbyte of RAM, but you should see it with
a few gigabytes of RAM and more page tables to scan).
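
A rough cost model of just the refresh work, assuming a refresh touches every task once; it does not model the victim selection itself, and both functions are hypothetical:

```c
#include <assert.h>

/* Tasks touched by swap_cnt refreshes when assign = 0 is done on every
 * pass: a full rescan per pass. */
static long refresh_cost_each_pass(long ntasks, long npasses)
{
    assert(ntasks >= 0 && npasses >= 0);
    long touched = 0;
    for (long pass = 0; pass < npasses; pass++)
        touched += ntasks;
    return touched;
}

/* Tasks touched when swap_cnt is refreshed once per swap_out call and
 * every pass reuses those counters. */
static long refresh_cost_once_per_call(long ntasks, long npasses)
{
    assert(ntasks >= 0 && npasses >= 0);
    (void)npasses;
    return ntasks;
}
```

If the number of passes itself grows with the number of tasks, the per-pass refresh becomes quadratic, which is the kind of cost that only shows up on large machines.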

I have a patch that fixes everything but write throttling (so mmap002 still
gets killed). But look at the vmstat output I'm getting before mmap002 gets
killed:

procs            memory             swap        io       system       cpu
 r b w   swpd   free  buff  cache   si   so    bi    bo   in   cs  us  sy  id
 0 4 2  72252   1080  3944  53084  484   28   572  3541  564  187   1  83  16
 0 3 2  72256   1124  3108  53920  352   20   324  1539  267  261   1  24  75
 0 2 2  72256   1052  2720  54412  708    0   888  1959  411  167   0  78  21
 1 2 2  72256   1104  2740  54352  416    0   711  1606  320  163   1  75  25
 0 4 3  72256   1100  2780  54276   64    0    99   894  235  499   0   4  96
 0 3 3   5668  64512  3992  54504  800    0  1198  2126  648  280   1  67  32

It's true that we still have about 50 Mbyte of dirty cache, but swap is
full (I have 72256 kbyte of swap). So I can see that we are not shrinking
the cache well enough, but the VM is not obviously buggy, and this problem
can't be fixed without writing proper throttling code similar to what we
have in 2.4.x in sync_page_buffers. Or rather, I bet you can fix it by just
changing the first value in /proc/sys/vm/bdflush from 40 to something like
10, but we really don't want to fix it that way, since we want lots of
dirty cache in other scenarios where there's no cache pollution like in
the mmap002 case.
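
Why lowering that value helps can be sketched as a threshold check, assuming (as in 2.2) that the first bdflush field is the percentage of dirty buffers at which bdflush is woken. bdflush_should_wake() is a hypothetical model, not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: wake background flushing once dirty buffers
 * exceed `percent` of the buffer cache. Integer math, no rounding. */
static bool bdflush_should_wake(unsigned long dirty, unsigned long total,
                                unsigned percent)
{
    assert(percent <= 100);
    return dirty * 100 > total * percent;
}
```

At 30% dirty, a threshold of 40 leaves bdflush asleep while 10 wakes it; the same lower threshold would also flush early in workloads where keeping lots of dirty cache is what you want.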

I'm implementing the write throttling now to see whether mmap002 then
survives; then I'll post the whole patch with the do_try_to_free_pages and
free_before_allocate fixes.

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

This archive was generated by hypermail 2b29 : Thu Jun 15 2000 - 21:00:35 EST