Re: Memory overcommitting (was Re: http://www.redhat.com/redhat/)

Theodore Y. Ts'o (tytso@MIT.EDU)
Thu, 20 Feb 1997 22:16:55 -0500


Date: Thu, 20 Feb 1997 15:40:27 -0500
From: John Wyszynski <wyszynsk@clark.net>

Thanks to all who have lobbed missiles at me, especially those who believe
that they know all that can be known.

... and of course, this description couldn't possibly describe you.....

It may be the explanation why in the last few years I have seen so many
programs die for no cause in the middle of the day. (On non-Linux systems
so far.) In an operational environment, such havoc is not appreciated.

In my decade of operational experience running Unix servers, I don't
think I've ever had a system which died because it ran out of memory ---
and I've run some pretty big servers in my day, including MIT's news
servers, MIT's mail hubs (we now deliver on average over 2 million
messages a day), etc.

Part of the reason why is that in an operational environment, you size
your machines appropriately. In my experience, memory is the easiest
thing to get right. Usually the resources that you run into trouble
with are disk space, I/O bus bandwidth, or CPU speed. Indeed, if you're
operating close enough to the margin that the OS's memory allocation
strategy might become a reliability issue, you've criminally undersized
the memory on your system. You'll probably increase your performance by
a factor of 10 or more if you add more memory to your system.

Actually, now that I think of it, there's only one time that I ran into
memory trouble --- and curiously enough, it happened after I replaced
tsx-11.mit.edu, Linux's main ftp server, with an Alpha running OSF/1.
For a while, I couldn't figure out why my old Ultrix system could
support more ftpd's than this very nicely configured Alpha. Well, it
turned out that OSF/1 had been configured not to allow memory to be
overcommitted, so the ftpd's were getting memory errors and fewer
users could get Linux. As soon as I fixed the OSF/1 VM to allow
overcommits, the number of ftpd's tsx-11 could handle (and thus the
number of people who could download Linux archives) went up by some
amazing factor.
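
To make the arithmetic concrete, here is a rough sketch of why the
accounting policy matters so much for a box full of ftpd's. The numbers
and names (TOTAL_MB, RESERVED_MB, TOUCHED_MB, max_processes) are made up
for illustration and are not measurements from tsx-11; the only
assumption is that each process asks for far more memory than it ever
touches.

```python
def max_processes(total_mb, charged_mb_per_process):
    """How many processes fit if each one is charged the given amount."""
    return total_mb // charged_mb_per_process

# Hypothetical figures, chosen only to show the shape of the effect.
TOTAL_MB = 256     # RAM plus swap the system can hand out
RESERVED_MB = 8    # memory each ftpd asks for up front
TOUCHED_MB = 1     # memory each ftpd actually ends up using

# No overcommit: every process is charged for everything it reserves.
strict = max_processes(TOTAL_MB, RESERVED_MB)

# Overcommit: a process is only charged for the pages it really touches.
overcommit = max_processes(TOTAL_MB, TOUCHED_MB)

print("without overcommit:", strict)
print("with overcommit:   ", overcommit)
```

With these invented numbers the overcommitting system fits eight times
as many processes; the real ratio depends entirely on how much of its
reservation each process leaves untouched.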

So, your mileage may vary, but I'd be very, very surprised if your
programs which "died for no reason" did so because they ran out of
memory. If you were running that low on memory, it should have been
painfully obvious.

- Ted