Re: Memory overcommitting (was Re: http://www.redhat.com/redhat/)

Richard B. Johnson (root@analogic.com)
Thu, 20 Feb 1997 14:56:09 -0500 (EST)


On Thu, 20 Feb 1997, John Wyszynski wrote:

>
> While the whole chain has been somewhat interesting, I still believe that
> for a production environment, this is SUICIDE.
>
> (1) For the situation where large processes fork and immediately exec,
> many other *IX systems have a "vfork" system call to handle the situation.
>
> (2) The claims that statistically this is okay 99.9% of the time are
> hogwash. The airline industry may sell tickets that way, but they don't
> want their computers doing it. I've been in line when it has happened
> to them and it's not a pretty sight.
>
> (3) This is sure a time waster for porting and developing programs to run
> on Linux. "Hey, you know that program that worked fine on System A could
> blow up on Linux." This is especially true for those people who want
> to move binaries from some other vendor's OS.
>
> John Wyszynski
>

I have read this thread. Many qualified Engineers and other Software
Professionals have tried to tell you that you are wrong, and to explain
why memory must be allocated the way Linux, SunOS, SGI, UCB, and every
other "known" Unix allocates it. Otherwise, the system would not be
useful.

Therefore, I will answer each of your claims quoted here.

(1) Vfork is necessary on machines whose processors lack "copy on write"
capability. When a fork() is executed on Linux, the address space is NOT
copied or cloned in any way. The kernel simply marks the writable pages
read-only in both parent and child. If the child then attempts to modify
one of those pages, i.e., write to it, the kernel allocates a new
read-write page, copied from the parent's. This is a SINGLE page (4096
bytes on ix86). If the child dirties many pages (unlikely with a normal
exec call), each page is allocated and copied as required.
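
A minimal sketch of that sequence, assuming Linux with 4096-byte pages
(the 32-megabyte buffer is only an illustrative size):

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    size_t len = 32 * 1024 * 1024;  /* illustrative size */
    char *buf = malloc(len);
    memset(buf, 'p', len);          /* parent dirties every page */

    if (fork() == 0) {              /* no pages are copied here */
        buf[0] = 'c';               /* first write: the kernel copies
                                       exactly ONE 4096-byte page */
        _exit(0);                   /* the rest were never duplicated */
    }
    wait(NULL);
    return 0;
}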

Once the exec call is made, the child is allocated whatever pages are
necessary to load the start of the text segment (code) plus any static
data. The child is overwritten with the new program and execution
begins. In no case is the child ever allocated all the pages of a
possibly bloated parent.
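
The usual fork-then-exec pattern looks like the sketch below; /bin/echo
merely stands in for whatever program is being exec'd:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the few pages it touches between fork() and
           exec() are ever copied; execl() then discards the whole
           copy-on-write image and loads the new program. */
        execl("/bin/echo", "echo", "hello from the child", (char *)NULL);
        _exit(127);                 /* reached only if the exec failed */
    }
    waitpid(pid, NULL, 0);
    return 0;
}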

(2) This objection is not warranted. Virtual memory systems work by
allocating new pages "on-the-fly". When a large buffer is allocated, only
its potential location is put into the pointer returned from the allocator
(whether mmap or malloc, etc.). The first access to this memory, either a
read or a write, causes a "page fault". The kernel, in response to this
page fault, maps in a page (usually a single page). For performance
reasons, the kernel may map in several pages at once, if the designer has
determined that this will reduce overhead.

The kernel keeps track of all the pages allocated to a process. The
kernel has the capability of taking any sizeof(PAGE)-byte block of RAM
from anywhere in memory and making it appear contiguous to the process
accessing that RAM. Page size differs on different systems.

Allocation on demand becomes automatic because an access to RAM that you
think you own, but hasn't been allocated yet, will produce a page-fault
and the kernel will allocate a new page. Any attempt to access RAM that
you did not preallocate using malloc or mmap, etc., will also cause
a page-fault. However, in this case, the kernel knows that you didn't
allocate it, so it seg-faults and kills your process. This is normal
behavior. It is not system specific.
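
A short sketch of both outcomes; the bogus pointer value is deliberate,
and running this kills the process with SIGSEGV at the second write:

#include <stdlib.h>

int main(void)
{
    char *ok = malloc(4096);        /* owned, but no physical page yet */
    ok[0] = 1;                      /* page-fault: kernel quietly maps a page */

    char *bad = (char *)16;         /* RAM this process never allocated */
    bad[0] = 1;                     /* page-fault: kernel sends SIGSEGV */
    return 0;
}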

Such dynamic allocation takes very little overhead and makes good sense.
The i*86 processor, for instance, handles memory access traps in
hardware. Per-process page tables control which RAM accesses are
allowed, so each access is policed by the paging hardware, not by
software.

This allows a process, for instance, to allocate 4000 pages of RAM
(about 16 megabytes with 4096-byte pages). If it only modifies the first
and the last, which is quite possible in a data-base application, only
two pages are actually required.
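
Here is a sketch of exactly that case, assuming Linux with 4096-byte
pages; reading VmRSS out of /proc/self/status is just one convenient way
to watch the resident size:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static void show_rss(const char *when)
{
    char line[128];
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return;
    while (fgets(line, sizeof(line), f))
        if (strncmp(line, "VmRSS:", 6) == 0)
            printf("%s %s", when, line);
    fclose(f);
}

int main(void)
{
    size_t len = 4000 * 4096;       /* ~16 megabytes of address space */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return 1;
    show_rss("before:");
    p[0] = 1;                       /* page-fault: one page materializes */
    p[len - 1] = 1;                 /* page-fault: one more page */
    show_rss("after:");             /* RSS grew by roughly two pages */
    return 0;
}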

However, presume that eventually all the pages will be modified. It is
unlikely that you could deliberately dirty 4000 pages within a single
time-slice. During preemption, other pages may become available. They
become available, even in a tight memory situation, because the kernel
will simply mark pages allocated to waiting processes "missing" and
give them to the memory allocator. When the missing pages are accessed,
a page-fault will occur and the kernel will find another page on a
page-by-page basis for the process that was "page-faulted" out of memory.
The contents of the missing pages will be restored, either by reading the
right page of the executable from disk (if it was the text segment) or by
reading them back from swap (if it was data).

The heuristics cited, and the stated percentage, pertain to the memory
allocator's idea of the resources that are available and/or should become
available. It is not magic. The kernel attempts to use all available RAM.
It is used mostly for file-system buffers. When your task needs some
RAM, these buffers are the first to go. Some of the first disk accesses in
a memory-tight situation are the flushing of the file-systems to disk to
free buffers.

I assure you that this is how virtual memory works everywhere, including
VAX/VMS, whose makers claim to have "invented" virtual memory. Various
Unixes do things slightly differently, but the result is the same.

(3) Any program, written to good standards of Engineering Practice, that
will run on System "A", will run on Linux.

Cheers,
Dick Johnson
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Richard B. Johnson
Project Engineer
Analogic Corporation
Voice : (508) 977-3000 ext. 3754
Fax : (508) 532-6097
Modem : (508) 977-6870
Ftp : ftp@boneserver.analogic.com
Email : rjohnson@analogic.com, johnson@analogic.com
Penguin : Linux version 2.1.26 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-