Re: Memory hogs

Claus Fischer (claus.fischer@microworld.com)
Sat, 17 Jul 1999 13:57:02 -0700 (PDT)


On Sat, 17 Jul 1999, Hermann Schichl wrote:

: > If you have a good userspace solution, this is not the discussion
: > for you. Go ahead and implement it and you're saved. Stop reading.
: > You don't care for the kernel OOM situation any more. [1]
: >
: This is not quite correct since it needs a proper kernel interface
: to construct a useful user space solution. Werner, IMHO, correctly
: pointed out that the FANCY solution stuff and the configurable
: parts should go to user space. The kernel itself should provide
: a reasonable default action which is stateless, simple, fast and small.
: I think we all agree that the kernel has to take proper measures
: to keep the most important processes running.

Yes, I somewhat deliberately neglected that the really good
custom-tailored solution would probably also want to have some
kernel support. There are several possibilities for that, and
depending on the applications you run you could either put more
emphasis on a fully operable system or on your special job.

: > But if you use Linux in the way computers are used by real men: to
: > further the progress of mankind by performing the most complicated
: > and memory-intensive simulations, then you'll certainly agree that
: > the Linux kernel should do better than falling in with the crowd
: > of "well you can't just do that to our operating system" designs.
: >
: That is the point. IF you are doing the most complicated and
: memory-intensive simulations then you want to GUARANTEE that this one
: does not die after five days of computing time.

Well, but there's just a bit more to it. Say, you have a machine with
1 GB total VM. If there's only one simulation job on it and it has
filled up everything, you just can't expect it to finish on that
machine. The size of the rest of the system (50 MB or so) is generally
negligible; in all my experience I've never seen a situation where
that makes the difference (and even then Rik's selector would probably
first take Netscape down, since your job has 5 days of computing
time). So you'll have to redefine your job. (Smaller grid, use a
different memory model, other solver, weaker matrix preconditioning,
...). Or alternatively run it on another machine. In all cases, you
can't use the old results.

If the job is not the only big one on the machine... This is quite a
frequent situation. Say, a 2 CPU batch machine with 2 GB VM, and 2
batch jobs on it, with typically 200-400 MB each. Now it so happens
that both jobs are somewhat out of hand and one wants 700 MB, the
other one 2.4 GB. Even in this case, Rik's job evaluator will do a
quite good job and pick the right one to kill most of the time. (It
weighs memory consumption by CPU time used and total run time). As a
default last resort mechanism, this seems virtually perfect to me (no
he doesn't pay me, and it's also not my code, it just happened to be
around when I was first confronted with that situation :-).
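
To make the weighting concrete, here is a minimal sketch of a selector
in that spirit. It is not Rik's code: the Job record, the function
names, and the particular damping (square root of CPU time, fourth root
of run time) are assumptions, chosen only to show how a long-running job
can score lower than a much smaller, short-lived one.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Job:
    name: str
    mem_mb: float   # memory currently held by the job, in MB
    cpu_s: float    # CPU time consumed so far, in seconds
    run_s: float    # wall-clock time since the job started, in seconds


def badness(job: Job) -> float:
    """Higher score = better candidate to kill (hypothetical weighting)."""
    # Clamp to 1 second so brand-new jobs do not divide by zero.
    cpu_damp = sqrt(max(job.cpu_s, 1.0))
    run_damp = sqrt(sqrt(max(job.run_s, 1.0)))
    return job.mem_mb / (cpu_damp * run_damp)


def pick_victim(jobs):
    """Return the job with the highest badness score."""
    return max(jobs, key=badness)
```

With these assumed weights, a 950 MB simulation that has computed for
five days scores lower than a 50 MB Netscape that has used ten minutes
of CPU in the last hour, so pick_victim() takes Netscape first.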

Regardless of that, I totally agree that you might want to implement a
policy that's better suited in user-space (i.e. with the help of
user-space tools). E.g. you might want to freeze the large jobs before
the machine gets totally full and call up the users and ask them what
to do. Or you might have a good job design so that jobs can be dumped
and restarted at certain intermediate points. Or you might have some
emergency swap file that could be added.

The point being, if you have planned for that situation then you will
also know which user-space solution is best for you. If you have not
planned then killing the job doesn't do any harm since it frees the
machine for the other guys and your job wouldn't have completed anyway
(!). Indeed in a very well-designed batch system, killing the job for
OOM reasons could even trigger resubmission on a dedicated large
machine, so everybody's happy.

Freezing up and waiting for the user just means killing the job later.
If the user could have handled a job transfer or adding swap, (s)he
would also have handled OOM before it became a kernel emergency.
So the default solution should be to kill the job, which is in no case
worse than the present behaviour, and which can be supplemented by
user-space tools to handle swap addition or so.

: >
: > The kernel should not let the machine die and become unusable. It has
: > to guarantee the bare survival, which includes: integrity of init,
: > rlogind, etc. On my desktop box, it had better let X survive as well,
: >
: Yes, and my simulation if ever possible. And if the simulation is the
: problem, I probably would prefer to stop the process, decide to add some
: more swap space if I have some free hard disk space and continue the process.

At the kernel emergency point, all swap space is already added. User
interaction isn't possible any more since everything from rlogin to
xterm or any new job fork()ed by a console shell doesn't get any stack
pages. SysRq is not really feasible in a somewhat secure production
environment.

If you still have some swap space around, then you can as well add it
at boot time and set a proper threshold:
2 GB swap + 0.5 GB emergency swap :=
2.5 GB swap + user-space tools hooking in at 80% swap full
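
The arithmetic works out because 80% of 2.5 GB is 2 GB: the hook fires
exactly when the "emergency" half gigabyte starts being used. A rough
sketch of such a user-space check follows. It assumes the "SwapTotal:"
and "SwapFree:" lines (in kB) of a present-day /proc/meminfo; the
function names and the 0.8 default are illustrative, not an existing
tool.

```python
def swap_used_fraction(meminfo_text: str) -> float:
    """Return the used fraction of swap from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            # Keep only the first token, e.g. "2621440" from "2621440 kB".
            fields[key.strip()] = rest.split()[0]
    total = int(fields.get("SwapTotal", 0))
    free = int(fields.get("SwapFree", 0))
    if total == 0:
        return 0.0  # no swap configured, so nothing to report
    return (total - free) / total


def near_oom(meminfo_text: str, threshold: float = 0.8) -> bool:
    """True once swap usage has reached the configured threshold."""
    return swap_used_fraction(meminfo_text) >= threshold
```

A watcher would call near_oom(open("/proc/meminfo").read()) every few
seconds and then apply whatever policy the site has chosen (freeze the
big jobs, notify the users, checkpoint), while the kernel's kill remains
the last resort.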

If you need a different solution than killing the job, you know it and
you have selected the right userspace tool. If you don't know then you
don't have any provisions and killing the job is the best thing.

: > if it wants to be regarded a strategic ally (remember, the goal is
: > saving mankind :--)
: >
:
: To summarize my opinion:
: We need proper kernel handling of OOM situations; it should be simple,
: stateless, and failsafe.

Yes.

: In addition we should have a nice interface (device, proc, ...)
: which makes it possible to construct a user space solution of
: NEAR OOM situations which is configurable (fancy and shining :) if
: necessary) to make it suitable for (almost) all needs out there.

Yes.
I think that brings it to the point.

Best regards,

Claus

-- 
Claus Fischer    (claus.fischer@microworld.com)
