Re: A true story of a crash.

Albert D. Cahalan (acahalan@cs.uml.edu)
Sat, 15 Aug 1998 17:17:35 -0400 (EDT)


Matt Agler writes:

> Hmm, doesn't that seem a bit complicated? The whole problem here is that
> the computer really has no knowledge of what should and should not be
> killed. You're just making elaborate guesses.

Exactly. It is no worse than deciding what to swap out. Linux has no
knowledge of what will be needed next, yet it makes decisions about
what should be swapped or thrown out.

> The kernel can't read the user's mind to find out which process
> is least important. There's no static mapping between size,
> priority, resource use, etc. to importance.

That is right. So what? The computer can't just wait for an admin.
It can kill random processes (bad) or kill selected processes (not worse).
Other options include halt, hard reboot, and crash.

Our current situation is bad. At the very least, Rik van Riel's process
killer will not be worse than what we already have. I expect that most
people will find it much better than merely "not worse" though.
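
To illustrate what "kill selected processes" could mean, here is a minimal
sketch of a scoring heuristic. It is not Rik van Riel's actual code; the
Proc fields, the badness() formula, and the sample process table are all
assumptions made up for this example. The only point is that even a crude
guess picks a more plausible victim than a random choice would.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    name: str
    rss_pages: int       # resident memory, in pages
    cpu_seconds: float   # CPU time consumed so far
    is_root: bool = False


def badness(p: Proc) -> float:
    """Hypothetical score: higher means a better candidate to kill."""
    score = float(p.rss_pages)
    # Long-running work is costly to lose, so discount by CPU time used.
    score /= (1.0 + p.cpu_seconds) ** 0.5
    # Assumed policy: root-owned processes are less attractive victims.
    if p.is_root:
        score /= 4.0
    return score


def select_victim(procs):
    """Return the process with the highest badness, or None if empty."""
    return max(procs, key=badness, default=None)


# Made-up process table, for illustration only.
procs = [
    Proc(1, "init", 50, 10.0, is_root=True),
    Proc(212, "httpd", 2000, 3600.0, is_root=True),
    Proc(977, "runaway", 60000, 5.0),
    Proc(980, "editor", 1500, 120.0),
]

victim = select_victim(procs)
print(victim.pid, victim.name)
```

The score is still a guess, in the same way that page aging is a guess
about what will be needed next.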

> It would be better and simpler to let the user or admin decide what to
> kill. Instead of killing a process, we should put it to sleep.

End result: 100% memory use, 100% idle, all processes stopped.
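
A toy simulation of that outcome, under stated assumptions: no swap is
left, a sleeping process keeps the pages it already holds, and every
process asks for one page per tick. The function name, the demands table,
and the wake_threshold parameter are hypothetical; nothing here models the
real kernel.

```python
def simulate_sleep_on_failure(total_pages, demands, wake_threshold):
    """Simulate "sleep instead of kill" when memory runs out.

    demands maps a process name to the pages it needs before it can finish.
    A process whose request cannot be met sleeps until free memory reaches
    wake_threshold. Sleepers keep their pages (assumption: no swap left).
    """
    if wake_threshold < 1:
        raise ValueError("wake_threshold must be at least 1")
    held = {name: 0 for name in demands}
    runnable, sleeping, finished = set(demands), set(), set()
    free = total_pages
    while runnable:
        for name in sorted(runnable):
            if held[name] >= demands[name]:
                # Work done: the process exits and returns its memory.
                free += held[name]
                held[name] = 0
                runnable.remove(name)
                finished.add(name)
            elif free > 0:
                free -= 1
                held[name] += 1
            else:
                # Allocation failed: sleep, still holding every page.
                runnable.remove(name)
                sleeping.add(name)
        if sleeping and free >= wake_threshold:
            runnable |= sleeping
            sleeping.clear()
    return {"free": free, "sleeping": sorted(sleeping),
            "finished": sorted(finished)}


# Two processes that together need more memory than exists.
stuck = simulate_sleep_on_failure(100, {"A": 80, "B": 80}, wake_threshold=10)
print(stuck)

# Recovery happens only if some process can finish on its own.
ok = simulate_sleep_on_failure(100, {"A": 30, "B": 80}, wake_threshold=10)
print(ok)
```

In the first run nothing ever frees a page, so the threshold is never
reached and nobody wakes up. The second run recovers only because one
process is able to finish and exit without outside help.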

> If the machine has overextended itself, we're probably swapping like mad
> already. It's hammered. We're not getting anything done. We don't need
> efficiency anymore. We want recovery without losing in-process work.

Not possible.

> For example, let's put each process, that asks for a page that we can't
> give, to sleep (from do_no_page?). This would be a special sleep in that
> it doesn't wake up until we return to a certain threshold of free memory.
> What would happen is that its pages would age and get thrown out.

Thrown out? You must mean that literally, since there may be no more swap.
The process will be really messed up if you send pages to /dev/null.

> Other processes would complete.

No they wouldn't, since they need memory too. They may also be long-lived
daemon processes.

> The load would be reduced until the machine was recoverable.
>
> root could log in and fix the problem, add swap, kill stuff, whatever.
...
> Admittedly, root would need to allocate memory and so any root processes
> should probably be exempt.

Hostile users can just use a daemon to grab the last bit of memory
that you reserved for root. (finger, telnet, whatever)

With overcommit, random events could be enough to eat up all
the reserved space.
