Re: mergemem: announce & design issues

Jacques Gelinas (jack@solucorp.qc.ca)
Wed, 18 Mar 1998 12:34:34 -0500 (EST)


On Wed, 18 Mar 1998, Stephen Williams wrote:

> Jacques Gelinas wrote:
> > The logic of mergemem is
> > that if you start several instances of a given program, there is a good bet
> > that some memory will be initialised exactly the same. So mergemem
> > compares different instances of a program and finds all the identical
> > pages. Then it makes those pages read-only and merges all processes to share
> > the same physical page. It also marks the page with the copy-on-write
> > flag so a process can still modify the page later, if needed.
>
> Just curious, but couldn't one load a process with the data section shared
> and copy-on-write the instant a program is loaded? Isn't it true that the
> data section can be paged out of the executable file until it is written to?
> (Zero-filled pages are initially shared anyhow.)
>
> I'm a little surprised that Linux doesn't do this, or does it? And if not,
> would doing it get some of the benefits of mergemem without the runtime
> overhead?

Linux does all this. The mergemem gadget goes much further. The idea is
that most programs start and then initialise various stuff. At this point,
the shared pages are not shared anymore, since the program has written to
them. So the data has been loaded from the executable and modified.
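
The "shared until written" behaviour can be observed from user space with a
private file mapping. What follows is only a rough sketch, not code from the
mergemem patch or the kernel loader: the function name and the scratch file
path are made up for illustration, and a one-byte temporary file stands in
for an executable's data section.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a write through a MAP_PRIVATE mapping left the file
 * untouched, 0 if the file changed, -1 on error.
 * Hypothetical helper, for illustration only. */
int private_write_leaves_file_intact(void)
{
    char path[] = "/tmp/cow_sketch_XXXXXX";   /* hypothetical scratch file */
    char on_disk = 0;
    char *map;
    int fd, result = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                  /* the file lives until fd is closed */

    if (write(fd, "A", 1) == 1) {
        map = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            assert(map[0] == 'A'); /* the page is backed by the file */
            map[0] = 'B';          /* the write goes to a private copy */
            if (pread(fd, &on_disk, 1, 0) == 1)
                result = (on_disk == 'A' && map[0] == 'B');
            munmap(map, 1);
        }
    }
    close(fd);
    return result;
}
```

Once the page has been written, it is private to the process and no longer
backed by the file, which is the state mergemem starts from.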

The end result is that if you start two instances of the same
program, each will perform basically the same initialisation,
creating a set of duplicate "modified" pages. The OS can't easily
know that. Here is an example.

int main ()
{
	// Create a special table based on the hostname
	char table[10000];
	int i;
	for (i=0; i<10000; i++){
		...
	}
	// Then from now on, use that table unmodified
	// as a lookup for example
}

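In this example, each instance of the program initialises a 10k table
with the exact same data, but that data is not fixed at compile time (it
depends on the hostname), so it cannot simply be shared from the
executable file.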

The mergemem patch walks the pages allocated to all instances of a program
and finds the ones which are identical. It then removes the duplicates and
points all processes to the same "write-protected with copy-on-write" page.
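
The comparison step can be sketched in user space as follows. This is not
the mergemem patch itself, only a simplified illustration: it assumes a
4096-byte page, compares two plain buffers at matching offsets, and only
counts the candidates instead of remapping anything. The function name and
SKETCH_PAGE_SIZE are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size, for illustration only */

/* Count how many page-sized chunks of a and b hold identical bytes.
 * len is the size of both buffers; a trailing partial page is ignored. */
size_t count_identical_pages(const unsigned char *a,
                             const unsigned char *b, size_t len)
{
    size_t same = 0;
    size_t off;

    assert(a != NULL && b != NULL);
    for (off = 0; off + SKETCH_PAGE_SIZE <= len; off += SKETCH_PAGE_SIZE) {
        if (memcmp(a + off, b + off, SKETCH_PAGE_SIZE) == 0)
            same++;             /* candidate for one shared physical page */
    }
    return same;
}
```

In the kernel, each match would then be replaced by a single read-only page
mapped into both processes, something a user-space sketch cannot do.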

For example, I assume that if you have a text editor and load a 10 MB
document, then start another copy of the text editor and load the same
document, the mergemem patch will merge most of this 10 MB. Now, the
minute you start editing the document in one editor, the pages will start
to diverge again.

The end result is about the same as if the original text editor had done a
fork().
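
The fork() comparison can be made concrete with a small sketch. It is only
an illustration with a made-up function name: a buffer stands in for the
loaded document, the child "edits" it, and the parent checks that its own
copy did not change. The sharing itself is not visible from user space;
only the isolation after the write is.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill a buffer, fork, let the child overwrite its view, and report
 * whether the parent's view stayed intact.
 * Returns 1 if intact, 0 if not, -1 on error. */
int parent_unaffected_by_child_write(size_t len)
{
    unsigned char *doc;
    pid_t pid;
    int status, intact = 1;
    size_t i;

    assert(len > 0);
    doc = malloc(len);
    if (doc == NULL)
        return -1;
    memset(doc, 'A', len);          /* the "loaded document" */

    pid = fork();                   /* child starts out sharing these pages */
    if (pid < 0) {
        free(doc);
        return -1;
    }
    if (pid == 0) {
        memset(doc, 'B', len);      /* child "edits": pages are copied on write */
        _exit(doc[0] == 'B' ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        free(doc);
        return -1;
    }
    for (i = 0; i < len; i++)
        if (doc[i] != 'A')
            intact = 0;             /* would mean the child's write leaked */
    free(doc);
    return intact;
}
```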

But the idea of mergemem is that most programs start and perform some
initialisation, and all instances of the program share some amount of this
initialisation. It is this amount that mergemem can find.

The question is "is it worth it?". From the data I have seen, it sounds
like it might be very useful on a multi-user server. Time will tell.

--------------------------------------------------------
Jacques Gelinas (jacques@solucorp.qc.ca)
Linuxconf: The ultimate administration system for Linux.
see http://www.solucorp.qc.ca/linuxconf
new developments: remote GUI admin, multiple machines admin, wu-ftpd
