Re: A true story of a crash.

Rik van Riel (linker@z.ml.org)
Sat, 15 Aug 1998 10:50:56 -0400 (EDT)


You could also check what processes have network sockets open.

You don't want to kill (a rough /proc check for most of these is sketched
after the list):

1) Daemons (children of init)
2) root processes
3) processes with network clients
4) processes with special permissions (anything banging the hardware)
5) anything else without a controlling terminal
6) would it be reasonable to make all non-root daemons run with a GID of
<100? If so, don't kill these.
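
Something like this rough check, run from a userspace policy daemon reading
/proc, could cover most of those exemptions (exempt_from_kill is just an
illustrative helper, not anything that exists; the network-socket and GID
tests would bolt on the same way):

#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return 1 if the process should be left alone: owned by root, a direct
 * child of init, or running without a controlling terminal. */
int exempt_from_kill(int pid)
{
    char path[64], buf[512], *p;
    struct stat st;
    int ppid = 0, tty = 0;
    FILE *f;

    sprintf(path, "/proc/%d", pid);
    if (stat(path, &st) == 0 && st.st_uid == 0)
        return 1;                       /* root process */

    sprintf(path, "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (!f)
        return 1;                       /* already gone, nothing to kill */
    if (fgets(buf, sizeof(buf), f)) {
        p = strrchr(buf, ')');          /* skip past "pid (comm)" */
        if (p)
            sscanf(p + 1, " %*c %d %*d %*d %d", &ppid, &tty);
    }
    fclose(f);

    if (ppid == 1)
        return 1;                       /* daemon, child of init */
    if (tty <= 0)
        return 1;                       /* no controlling tty (0 or -1,
                                           depending on kernel version) */
    return 0;
}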

After you've eliminated those from the killable list:

You SIGTERM, and log:

1) Processes actively trying to allocate memory
2) Processes that have only been running a short time
3) Big processes
4) processes with lots of children (fork bombs)
5) big process trees (you add up the RAM of all the children; see the
sketch after this list)
6) processes with shared memory segments
7) automatically remove any dangling shm segments.
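
For point 5, a rough way to charge a parent for its whole tree: walk /proc,
and for every process whose ancestry leads back to the candidate, add in its
resident set size. A minimal sketch (tree_rss_pages and its helpers are made
up here for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <dirent.h>

/* Resident set size of one process, in pages
 * (2nd field of /proc/<pid>/statm). */
static long rss_pages(int pid)
{
    char path[64];
    long size = 0, resident = 0;
    FILE *f;

    sprintf(path, "/proc/%d/statm", pid);
    f = fopen(path, "r");
    if (!f)
        return 0;
    fscanf(f, "%ld %ld", &size, &resident);
    fclose(f);
    return resident;
}

/* Parent pid, parsed after the last ')' in /proc/<pid>/stat so a command
 * name containing spaces can't confuse us. */
static int parent_of(int pid)
{
    char path[64], buf[512], *p;
    int ppid = 0;
    FILE *f;

    sprintf(path, "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (!f)
        return 0;
    if (fgets(buf, sizeof(buf), f) && (p = strrchr(buf, ')')) != NULL)
        sscanf(p + 1, " %*c %d", &ppid);
    fclose(f);
    return ppid;
}

/* Total resident pages of 'top' plus all of its descendants. */
long tree_rss_pages(int top)
{
    DIR *d = opendir("/proc");
    struct dirent *de;
    long total = 0;
    int pid, p;

    if (!d)
        return 0;
    while ((de = readdir(d)) != NULL) {
        if (!isdigit(de->d_name[0]))
            continue;
        pid = atoi(de->d_name);
        for (p = pid; p > 1; p = parent_of(p))
            if (p == top) {
                total += rss_pages(pid);
                break;
            }
    }
    closedir(d);
    return total;
}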

After this, if you are still tight, you SIGKILL the above.

Then, if you are still tight, you SIGTERM:

1) processes with mlocked memory (excluding the first list)
2) all tasks not running with uid 0 or gid < 100.

Then you SIGKILL the above.
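
The escalation could look something like this from userspace (reap and
still_tight are hypothetical names; the 1/16-of-RAM redline is an arbitrary
placeholder, not a tuned number):

#include <signal.h>
#include <unistd.h>
#include <sys/sysinfo.h>

/* Are we still short on memory?  sysinfo(2) is cheap enough to poll. */
static int still_tight(void)
{
    struct sysinfo si;

    if (sysinfo(&si) < 0)
        return 1;
    return si.freeram + si.freeswap < si.totalram / 16;
}

/* SIGTERM a candidate, give it a moment to exit cleanly, and only
 * SIGKILL it if it is still around and memory is still tight. */
void reap(int pid, unsigned grace)
{
    kill(pid, SIGTERM);
    sleep(grace);
    if (still_tight() && kill(pid, 0) == 0)
        kill(pid, SIGKILL);
}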

Then you wait a user-definable timeout, and either reboot the computer or
kill everything except init and signal init to restart.
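
A sketch of that last resort, after the timeout expires (kill(-1, ...)
signals everything the caller may signal except init and itself, and SIGHUP
asks a SysV init to re-examine /etc/inittab):

#include <signal.h>
#include <unistd.h>
#include <sys/reboot.h>

void last_resort(int reboot_instead)
{
    if (reboot_instead) {
        sync();
        reboot(RB_AUTOBOOT);    /* glibc wrapper around reboot(2), needs root */
    } else {
        kill(-1, SIGTERM);      /* everything except init and ourselves */
        sleep(5);
        kill(-1, SIGKILL);
        kill(1, SIGHUP);        /* tell init to re-read its tables */
    }
}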

This would probably deal with OOM pretty well. Could we also define
long-running processes using zero-copied memory they have already allocated
as 'not allocating memory'?

It would be nice if someone made a daemon to do something like this:

if (swapused/swaptotal > .85 && diskfree(/tmp) > 32MB) {
    create_empty_file(/tmp/random, 0700, 16MB);
    mkswap(/tmp/random);
    swapon(/tmp/random);
    new_swaps = new_swaps + 1;
}

if (swapused/swaptotal < .40 && new_swaps > 0) {
    file = leastused_newswap();
    swapoff(file);
    unlink(file);
}
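
Here is a minimal runnable sketch of that loop, using sysinfo(2) for the
swap ratio, a single 16MB file under /tmp, and the external mkswap(8)
binary; the path, thresholds, and single-file simplification are all just
for illustration (the free-space check on /tmp is omitted):

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/swap.h>
#include <sys/sysinfo.h>

#define SWAPFILE "/tmp/emergency.swap"   /* illustrative path */
#define CHUNK    (16L * 1024 * 1024)     /* 16MB emergency swap file */

/* Create a zero-filled (non-sparse) file, run mkswap(8) on it, enable it. */
static int add_swapfile(void)
{
    char buf[4096];
    long written;
    int fd = open(SWAPFILE, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 0, sizeof(buf));
    for (written = 0; written < CHUNK; written += sizeof(buf))
        if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
            close(fd);
            return -1;
        }
    close(fd);
    if (system("mkswap " SWAPFILE " >/dev/null 2>&1") != 0)
        return -1;
    return swapon(SWAPFILE, 0);          /* swapon(2), default priority */
}

int main(void)
{
    struct sysinfo si;
    int have_extra = 0;

    for (;;) {
        if (sysinfo(&si) == 0 && si.totalswap > 0) {
            unsigned long used = si.totalswap - si.freeswap;

            if (!have_extra && used > si.totalswap / 100 * 85) {
                have_extra = (add_swapfile() == 0);     /* >85% used: grow */
            } else if (have_extra && used < si.totalswap / 100 * 40) {
                swapoff(SWAPFILE);                      /* <40% used: shrink */
                unlink(SWAPFILE);
                have_extra = 0;
            }
        }
        sleep(5);
    }
}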

It should run as root, run with a negative nice level, perhaps have all
its memory locked, and be very small.
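
For the locked-and-reniced part, something along these lines, run once at
startup before the loop above (-10 is an arbitrary choice):

#include <sys/mman.h>
#include <sys/resource.h>

/* Pin the daemon's pages so it can't be swapped out while it is trying to
 * rescue the machine, and raise its priority.  Both need root. */
void make_resilient(void)
{
    mlockall(MCL_CURRENT | MCL_FUTURE);
    setpriority(PRIO_PROCESS, 0, -10);   /* 0 == the calling process */
}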

Part of the problem with Linux running OOM is that it starts swapping so
hard with the little space left at the end. If it could add a little swap,
perhaps things would work better.

The above daemon could even initiate some sort of userspace early OOM
handling, like sending out SIGWARNs (do Linux/Linux apps do SIGWARN?)
and using smarter methods of finding the memory-gobbling processes.

-
Gregory Maxwell

On Fri, 14 Aug 1998, Theodore Y. Ts'o wrote:

> Date: Fri, 14 Aug 1998 13:15:26 -0500
> From: Ian and Iris <brooke@mail.jump.net>
>
> After some thought, you consider that fork-bombs are nowhere near as
> common on a relatively well-behaved "Personal" system as is running
> out of memory. Thus, it makes sense to kill the largest process not
> owned by root unless there are no more, then the largest process
> owned by root as long as it's not init, then just give up on the
> theory that if init wants to take down the system there are other,
> larger problems.
>
> Why largest? It's probably the out-of-control one.
>
> There's only one problem with this strategy --- which was originally
> used by AIX, by the way. If the largest process happens to be the X
> server, and the reason why you're out of memory was because you have
> lots and lots of (smaller) X programs running, the kernel will kill off
> the X server, which will keep the system up and free lots of memory
> (since not only will the X server exit, but all of the X client
> applications will die too!).
>
> However, users might not find that to be the most reasonable behaviour,
> since they might lose a lot of work, and the server clearly killed many
> more processes than it needed to.
>
> Granted that it would be nice to make Linux handle this situation more
> gracefully, but in general this is a very, very, hard problem to handle
> "correctly" in all cases. In my view, the general case solution is that
> you should never let yourself get that badly overcommitted. For
> performance reasons, I usually like to make sure I have enough memory so
> that all or most of the time, everything I need is in core, and I don't
> need to be swapping at all. The swap space I then use for the emergency
> cases when I need slightly more memory than I have --- and I never let
> myself get near the "redline" case at all.
>
> The other strategy which probably works better is to kill off the
> process which tried asking for memory when the kernel had trouble
> servicing its request. This has the advantage that you avoid killing
> the long-term, stable processes that aren't requesting new pages, even if
> they are pretty big. Like your solution, it's an attempt to
> kill off the out-of-control process, while avoiding the "benign"
> processes.
>
> - Ted
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.altern.org/andrebalsa/doc/lkml-faq.html