Memory hogs

Claus Fischer (claus.fischer@microworld.com)
Sat, 17 Jul 1999 10:35:15 -0700 (PDT)


Werner Almesberger wrote:
: Hermann Schichl wrote:
: > I think it would be great if OOM handling would be
: > configurable (either kernel compile time or /proc interface)
:
: Better yet, have a device or proc file that becomes readable in this case.
: Then you can have a user-space daemon to implement whatever policy you
: like. The daemon can mlock its pages, so it should be reasonably safe
: from causing an OOM itself. Writing to that device/proc file could set
: the threshold for when OOM recovery is necessary.
:
: Reasonable policies may include enabling overflow swap space, SIGSTOP'ing
: all processes above a certain size/growth rate, paging the administrator,
: and opening a shell on the system console.
:
: - Werner
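
For concreteness, such a daemon could look roughly like this. The
/proc/oom_notify file and its "read returns the offender's pid"
behaviour are invented for the sketch, since no such interface exists
yet; mlockall() and the rest are plain POSIX:

    /* Sketch of the daemon Werner describes.  The file /proc/oom_notify
     * is made up (no such interface exists); mlockall(), the blocking
     * read, and the SIGSTOP policy are plain POSIX. */
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[64];
        int fd;

        /* Pin our own pages so the watchdog can neither be paged out
         * nor contribute to the OOM it is supposed to handle. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
            return 1;
        }

        fd = open("/proc/oom_notify", O_RDONLY);  /* hypothetical */
        if (fd < 0) {
            perror("open /proc/oom_notify");
            return 1;
        }

        for (;;) {
            ssize_t n;
            pid_t pid;

            /* Assume read() blocks until the kernel declares memory
             * pressure and then returns the worst offender's pid. */
            n = read(fd, buf, sizeof(buf) - 1);
            if (n <= 0)
                break;
            buf[n] = '\0';

            /* Example policy from Werner's list: SIGSTOP the process.
             * Paging the admin, adding swap, etc. would go here too. */
            pid = (pid_t)atoi(buf);
            if (pid > 1) {
                kill(pid, SIGSTOP);
                fprintf(stderr, "memwatch: stopped pid %d\n", (int)pid);
            }
        }
        return 0;
    }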

I'm not against user-space solutions as a supplementary measure.
However, I'm totally against saying: "We'll find a userspace solution,
so it's OK for the kernel not to do anything intelligent."

Not for Linux.

This is the OS that I have carefully chosen for my desktop machine;
it hasn't been imposed on me by my managers, and I want it to shine
even where others spread darkness.

If you have a good userspace solution, this is not the discussion
for you. Go ahead and implement it and you're saved. Stop reading.
You no longer need to care about the kernel OOM situation. [1]

If you have just found out that you have put way too little RAM into
the web server under your desk, just go ahead and spend some money,
or increase your swap space.

But if you use Linux in the way computers are used by real men: to
further the progress of mankind by performing the most complicated
and memory-intensive simulations, then you'll certainly agree that
the Linux kernel should do better than falling in with the crowd
of "well you can't just do that to our operating system" designs.

The kernel should not let the machine die and become unusable. It has
to guarantee bare survival, which includes the integrity of init,
rlogind, etc. On my desktop box, it had better let X survive as well,
if it wants to be regarded as a strategic ally (remember, the goal is
saving mankind :--)

*********

This turns out to be easily possible: just kill the one bad job and
proceed working. This is 100% better than the present solution (slowly
corrupting all processes and letting the user watch a painful death).
After the fact, the action is syslogged, the user and the admin get
mail, everybody grabs the proper userspace solution (one of the 20
suggestions) and it will never happen again ...
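
The policy is simple enough to prototype in user space today. Here is
a rough sketch of "kill the biggest job and syslog it"; a kernel
implementation would walk the task list instead of scanning /proc, and
the VmRSS metric and the "never touch pid 1" rule are merely one sane
default:

    /* Prototype of "kill the one bad job": find the process with the
     * largest resident set, kill it, and syslog the deed.  Pid 1 and
     * the watchdog itself are never candidates; VmRSS from
     * /proc/<pid>/status is just one plausible measure of "bad". */
    #include <ctype.h>
    #include <dirent.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <syslog.h>
    #include <unistd.h>

    static long rss_kb(pid_t pid)
    {
        char path[64], line[256];
        long kb = -1;
        FILE *f;

        sprintf(path, "/proc/%d/status", (int)pid);
        f = fopen(path, "r");
        if (!f)
            return -1;                /* process may have exited */
        while (fgets(line, sizeof(line), f))
            if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
                break;
        fclose(f);
        return kb;
    }

    int main(void)
    {
        DIR *proc;
        struct dirent *d;
        pid_t worst = -1;
        long worst_kb = -1;

        proc = opendir("/proc");
        if (!proc)
            return 1;
        while ((d = readdir(proc)) != NULL) {
            pid_t pid;
            long kb;

            if (!isdigit((unsigned char)d->d_name[0]))
                continue;             /* not a process directory */
            pid = (pid_t)atoi(d->d_name);
            if (pid <= 1 || pid == getpid())
                continue;             /* bare survival: spare init and us */
            kb = rss_kb(pid);
            if (kb > worst_kb) {
                worst_kb = kb;
                worst = pid;
            }
        }
        closedir(proc);

        if (worst > 0) {
            kill(worst, SIGKILL);
            syslog(LOG_ALERT, "memwatch: killed pid %d (%ld kB resident)",
                   (int)worst, worst_kb);
        }
        return 0;
    }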

Or alternatively, the user is totally happy with the machine killing
the right job in all instances, and doesn't even see the need for
a userspace solution.

Remember that in the present situation the offending job dies in 100%
of all cases; it just takes the rest of the machine with it. And
remember that nothing keeps you from making a good userspace solution
that never lets the system run completely out of memory, so that it
doesn't even reach the critical state.
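
Such a preventive watcher fits on one page: poll /proc/meminfo every
few seconds and sound the alarm while there is still headroom. The
8 MB threshold and the syslog-only action below are placeholders for
whatever policy you like:

    /* Preventive watchdog: wake up every few seconds, add up free RAM
     * and free swap from /proc/meminfo, and raise the alarm while
     * there is still headroom.  The threshold and the "just syslog"
     * action are placeholders for whatever policy you prefer. */
    #include <stdio.h>
    #include <syslog.h>
    #include <unistd.h>

    #define THRESHOLD_KB (8 * 1024)   /* start worrying below 8 MB */

    static long free_kb(void)
    {
        char line[256];
        long kb, total = 0;
        FILE *f;

        f = fopen("/proc/meminfo", "r");
        if (!f)
            return -1;
        while (fgets(line, sizeof(line), f))
            if (sscanf(line, "MemFree: %ld kB", &kb) == 1 ||
                sscanf(line, "SwapFree: %ld kB", &kb) == 1)
                total += kb;
        fclose(f);
        return total;
    }

    int main(void)
    {
        for (;;) {
            long kb = free_kb();

            if (kb >= 0 && kb < THRESHOLD_KB)
                syslog(LOG_WARNING,
                       "memwatch: only %ld kB left, act before the kernel must",
                       kb);
            sleep(5);
        }
    }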

*********

Mail-From: memwatch@your.service.com
' Dear user,
your program 'cgi-dumb' has just tried to deprive this system
of too much main memory. This warrants the death penalty,
which has been promptly executed. The parent 'httpd' has been
notified. Please educate your programs to behave properly
in future.'

versus

ssh your.service.com
ssh: connection dropped by other side

Claus

[1] This is an impersonal 'you', please don't take me literally :-)

-- 
Claus Fischer    (claus.fischer@microworld.com)
