Not to belittle anyone's efforts to get more efficient use of the system
and stave off the big implosion, but the "kill something, reclaim some
stale pages that we don't really need but that are still marked as in
use" approach seems like a stop-gap to me. It has a vagueness and
ambiguity to it.
Maybe that's just because I'm not reading the vm code while I think about
it, but "what to kill" seems insoluble to me. You can take the rate of new
vma allocation per process as a heuristic, but that's only as good as the
restraint shown by actual application code in using memory. If someone
runs a big arbitrary graph-structured dynamic-systems simulator that
really needs a ton of memory, you're going to end up killing it for no
reason other than that it uses a lot of memory.
This seems to me like poking at the problem with a stick, blindfolded.
What about a dynamic swap file with no size limit other than the free
space on whatever partition /var/tmp or /tmp lives on? All you do with it
is suspend things into it. Set a limit where it kicks in (90% of RAM +
fixed swap in use, whatever), and start suspending user-space processes
until the usage stays under that level for some timer count. Count from
higher pids to lower (i.e. init is the last thing to go, and would never
actually get swapped out because it doesn't use enough memory to even get
close), or count down by process creation time in /proc, or what have
you. If a process can be suspended by the kernel, when is it not possible
to just swap out the whole thing, the entire program?
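The victim-selection part of that policy can be sketched in a few lines. This is a
minimal userspace simulation, not kernel code: the process list and memory figures
are hypothetical inputs, and a real implementation would read /proc and actually
swap the suspended images out.

```python
# Sketch of the suspend-until-under-threshold policy described above.
# Pure simulation with hypothetical (pid, resident_bytes) pairs; a real
# version would walk /proc and suspend + swap out each chosen process.

THRESHOLD = 0.90  # kick in when 90% of RAM + fixed swap is in use


def pick_victims(total_mem, procs):
    """procs: list of (pid, resident_bytes) pairs.

    Returns the pids to suspend, chosen from the highest pid downward
    (so init, pid 1, goes last), until projected usage stays under
    THRESHOLD * total_mem.
    """
    in_use = sum(rss for _, rss in procs)
    victims = []
    for pid, rss in sorted(procs, key=lambda p: p[0], reverse=True):
        if in_use <= THRESHOLD * total_mem:
            break
        victims.append(pid)
        in_use -= rss  # its entire image can be swapped out wholesale
    return victims
```

For example, with 1000 units of memory and processes using 50+300+400+350 = 1100
units, only the newest (highest-pid) process needs to be suspended to get back
under the 900-unit threshold. Ordering by creation time from /proc instead of by
pid would only change the sort key.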
How do you tell root that the system has gone into dynamic swap? Whoever
is running a suspended program will complain, unless it has already been
running for two weeks and they don't expect it to finish for another week
anyway. I suppose you log it, with a log that never exists until that
point has been reached, and it's up to root to have a cron job checking
for that log and emailing them when it shows up.
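That cron check is trivial; here is a sketch, with the log path purely an
assumption (no such log exists today), and the mail step left to whatever the
cron job pipes the output into:

```python
# Hypothetical check for the "dynamic swap kicked in" log described
# above. SWAP_LOG is an assumed path, not a real kernel log file.
import os

SWAP_LOG = "/var/log/dynswap"  # hypothetical location


def swap_log_alert(path=SWAP_LOG):
    """Return an alert line for root if the log has appeared, else None.

    The log never exists until the threshold has been crossed, so mere
    existence is the signal; a cron job can mail any non-empty output.
    """
    if not os.path.exists(path):
        return None
    return "dynamic swap engaged; see %s" % path
```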
This way you don't have to try to guess what to kill, a problem that has
no automated solution. There isn't any way to choose among a set of
user-space processes whose priority to the user and tolerance for being
suspended are unknown. Either programs come with hints about that
(priority) or they don't, and if they don't, there isn't any way to
choose that isn't arbitrary. If you stall one accidentally by suspending
it, that isn't any worse than your OOM algorithm killing it on purpose,
and you will end up saving a lot of people's work that the algorithm
might have lost otherwise.
You can still get to the point where you run out of disk space on your
designated failsafe dynamic swap partition, of course, but between
initially creating that file and running out of disk space there is
potentially a large measure of additional protection over the more
draconian approaches.
I'm not suggesting that anyone abandon good ideas for culling caches and
so on, just suggesting a less scatter-shot approach to the worst case.
Regards, Clayton Weaver firstname.lastname@example.org (Seattle)
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to email@example.com
Please read the FAQ at http://www.tux.org/lkml/