Re: Memory over: committing SUICIDE

Cameron MacKinnon (mackin@interlog.com)
Fri, 21 Feb 1997 00:33:55 -0500


John Wyszynski <wyszynsk@clark.net> wrote:
> While the whole chain has been somewhat interesting, I still believe that
> for a production environment, this is SUICIDE.
>
> (1) For the situation where large processes fork and immediately exec,
> many other *IX system have a "vfork" system call to handle the situation.

Feel free to write one if you require it. You could also consider
donating it to the cause.
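
For reference, the classic vfork() idiom looks roughly like the sketch
below. It's a minimal example under assumptions, not anyone's production
code; the /bin/ls exec target is purely illustrative. The point is that
the child borrows the parent's address space instead of copying it, so a
huge process can spawn a small one without the VM system committing (or
overcommitting) anything:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    pid_t pid;
    int status;

    pid = vfork();          /* child borrows the parent's address space */
    if (pid < 0) {
        perror("vfork");
        return 1;
    }
    if (pid == 0) {
        /* Child: may only exec or _exit(); it must not modify the
           parent's data or return from this stack frame. */
        execl("/bin/ls", "ls", "-l", (char *) 0);
        _exit(127);         /* exec failed: _exit(), never exit() */
    }
    /* Parent is suspended until the child execs or exits. */
    waitpid(pid, &status, 0);
    return 0;
}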

> (2) The claims that statically [sic] that this is okay 99.9% is hogwash.
...
> (3) This is sure a time waster for porting and developing programs to run
> on Linux. "Hey, you know that program that worked fine on System A could
> blow up on Linux."

If application X requires more than the (conservative, I believe) 99.9%
reliability provided by Linux in the "throw it against the wall and see
if it sticks" mode, then the developers/implementors of this application
should be prepared to design in functionality that guarantees that the
app will never ask for more core/VM than "y", specify that at least "y"
be available in the production environment, and specify that the
production environment contains only application X. These are standard
design principles for high reliability systems. Saying "We might ask for
a lot of memory, we don't know how much and we might not need it anyway,
and by the way this might run on a box with unknown core/VM, with other
unknown applications with unknown resource requirements BUT we expect
the operating system to do all of our dirty work for us and failure
should always be graceful" is highly disingenuous. It also burdens the
OS with the expensive prospect of guaranteeing a level of reliability
that 99.9% of people don't need.
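
To make that concrete, here's one way an application could design in
its own guarantee: allocate the worst-case ceiling "y" up front and
touch every page, so any shortfall surfaces at initialization rather
than mid-transaction. A minimal sketch; the 64 MB ceiling and the 4K
page size are made up, and a real application would serve all later
requests out of the pool through its own allocator:

#include <stdio.h>
#include <stdlib.h>

#define CEILING (64UL * 1024 * 1024)  /* worst-case "y": 64 MB, made up */
#define PAGE    4096UL                /* assumed page size, made up */

int main(void)
{
    char *pool;
    unsigned long i;

    pool = malloc(CEILING);
    if (pool == NULL) {
        fprintf(stderr, "no room for ceiling, refusing to start\n");
        return 1;
    }
    /* Touch one byte per page so the kernel must find real backing
       store now; on an overcommitting kernel this loop is where a
       shortfall shows up, not in the middle of a transaction. */
    for (i = 0; i < CEILING; i += PAGE)
        pool[i] = 0;

    /* ... run the application out of pool[] from here on ... */
    return 0;
}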

For the type of systems we're now discussing (airline reservations was
given as an example), failure is generally defined as "takes longer than
n seconds to process a transaction" - a far more conservative metric
than "system halts and catches fire". A lot more analysis is required
than just trusting the operating system to find your memory leaks for
you.
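
Measuring against that definition is trivial, by the way. A sketch,
with the two-second deadline and the txn callback both made up for
illustration:

#include <stdio.h>
#include <sys/time.h>

#define DEADLINE_USEC 2000000L  /* "n seconds" = 2 s, made up */

/* Times one transaction; returns nonzero if it counts as a failure
   under the "takes longer than n seconds" definition, whatever the
   transaction's actual result was. */
int failed(int (*txn)(void))
{
    struct timeval t0, t1;
    long usec;

    gettimeofday(&t0, (struct timezone *) 0);
    (void) txn();
    gettimeofday(&t1, (struct timezone *) 0);

    usec = (t1.tv_sec - t0.tv_sec) * 1000000L
         + (t1.tv_usec - t0.tv_usec);
    return usec > DEADLINE_USEC;
}

static int dummy_txn(void) { return 0; }  /* stand-in transaction */

int main(void)
{
    printf("failed: %d\n", failed(dummy_txn));
    return 0;
}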

Another note: There's legacy software aplenty that depends on
overcommit. Traditional FORTRAN (through FORTRAN 77) has no dynamic
allocation, so maximum array sizes must be fixed at compile time. It's
much more convenient to have a
matrix library that blows up at run time if you feed it a 10k x 10k
array on a small box than to have one that won't even load, ever.
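
A C rendition of the same trick shows why overcommit is the enabling
hack (a sketch; the sizes are illustrative). The 10k x 10k array below
is zero-fill BSS, so under overcommit the binary loads and runs happily
as long as you only touch a small corner of it, while a strict-commit
system would refuse to exec it at all on a box without ~800 MB of VM:

#include <stdio.h>

/* Compile-time maximum, FORTRAN-style: 10000 x 10000 doubles,
   roughly 800 MB of zero-fill BSS. */
#define NMAX 10000

static double a[NMAX][NMAX];

int main(void)
{
    int i, j, n = 100;  /* the size actually used at run time */

    /* Under overcommit only the touched corner of a[][] ever gets
       real pages; a strict-commit system would have to reserve the
       whole ~800 MB before main() even ran. */
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[i][j] = i + j;

    printf("a[%d][%d] = %g\n", n - 1, n - 1, a[n - 1][n - 1]);
    return 0;
}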

UNIX was initially designed as a minimalist kernel aimed at interactive
use. Vendors have since grafted on realtime and fault tolerant aspects,
with varying degrees of success. With that in mind, I'd like to know
which of SVR4, SunOS, Solaris, BSD, SCO, HP-UX, Digital UNIX and AIX
disallow overcommits. What's POSIX got to say?

Barring new disclosure of other OS's capabilities, or submitted patches,
let's end this thread. That way, I get the last word 8-)