Re: Overcommitable memory??

From: James Sutherland (jas88@cam.ac.uk)
Date: Tue Mar 14 2000 - 16:57:32 EST


On Mon, 13 Mar 2000, Michael Bacarella wrote:

>
> On Mon, 13 Mar 2000, Paul Jakma wrote:
>
> > > And please explain why my simulation -- that may have started many weeks
> > > (or months) ago -- should "just exit" because some random 5-minute old
> > > Mathematica process went and allocated half a gigabyte of memory?
> > >
> >
> > There's no easy answer to the OOM problem is there?
>
> There are plenty of easy answers, they're just not answers that everyone
> will be happy with. :)

There are easy answers, and then there are popular answers :)

> It seems like the current solution is accepted because it doesn't
> discriminate against any one purpose or user. I'm almost positive that the
> person who wrote the code was pulling their hair out trying to figure
> the best way to handle the condition, and just decided that the way to do
> it best without receiving flak or alienating anyone is to start offing
> processes at random.
>
> FreeBSD has solved this problem by simply mandating "THIS IS WHAT WILL BE
> DONE, YOU DEAL WITH IT IF IT MESSES YOU". Linux users are all
> trying to make a case against one another. A lot of people have good
> arguements for their side, some of which may conflict with others, but
> there are no arguments that 100% of people can accept. I'm guessing that
> if you presented everyone with 10 solutions, and told them to rate
> them in order of preference, the random killing solution would have
> the most points. :)

There is a better approach - leave it up to the sysadmin. Give them a
couple of options - random killing, selective killing, freeze the
processes for a couple of minutes and see if that improves matters (yes,
someone did suggest that!), etc. That can come at a later date.

The biggest flaw with the "random" killing approach is that the very
simple malicious code I mentioned earlier (while(1){malloc(4096);}) will
eventually kill EVERY process, other than itself. Because it doesn't die
when malloc() fails, it survives the old OOM response - unlike every other
process.

> This is a decision that someone with recognized authority (for lack of a
> better word) has to make after hearing all sides, and looking to implement
> a solution that the most people can live with, leaving the select few who
> will be affected negative, to try to somehow cope.
>
> Now for my own bias: This issue could probably be reduced if you could
> effectively limit a user to X megs of memory, whether they use it with
> 1 process at X megs or 50 processes at X/50 megs each. Therefore, if your
> system has N users max, you could set aside N*X megs of VM.

Per-user resource quotas would certainly make a big difference here - so
it would be nice to have them. Wish-list entry #654561844164841654 it is,
then :-)

> The problem still exists though, but it can be more easily controlled by
> those who care to address the problem themselves. :)

As I've said before, this patch refines matters from "kill processes
randomly, but leave any malicious code running to make sure it kills
everything" to "kill suspicious looking things first, until we've killed
the real culprit". Not perfect, but a big improvement.

James.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Mar 15 2000 - 21:00:28 EST