Re: Avoiding *mandatory* overcommit...

From: Marco Colombo (marco@esi.it)
Date: Mon Apr 03 2000 - 08:40:31 EST


On Mon, 3 Apr 2000, Jesse Pollard wrote:

> Marco Colombo <marco@esi.it>
> > On Mon, 3 Apr 2000, Jesse Pollard wrote:
> >
> > [...]
> > > >And, under Linux, should expect to fail when the system is OOM.
> > > >Again it's the process that fails, not the program.
> > >
> > > According to the manpage, brk returns -1 on failure. malloc should
> > > be able to return a null pointer indicating failure.
> >
> > Address space is a finite resource (on 32 bits archs; on 64 it's
> > close to infinite for practical purposes). So brk() can fail, and will,
> > if you're out of addresses. This has nothing to do with memory.
> > As I've already said, an address is a name, memory is an object.
>
> I don't care what you call it. If the kernel gives it to me I expect to
> be able to use it.

It's not a matter of what I call it. The kernel gives you addresses, which
you are able to use. You read at a given address, you get data. You
write to an address, and you ask the kernel to allocate memory. That
request may fail.
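A minimal sketch of that distinction, assuming Linux and Python 3 (`resident_pages()` and `lazy_allocation_demo()` are illustrative names, not from any library). Mapping 64 MiB of private anonymous memory leaves the resident set size reported in /proc/self/statm almost unchanged. Writing one byte per page then makes the kernel allocate a page frame for each page at fault time.

```python
import mmap

PAGE = mmap.PAGESIZE
MAP_SIZE = 64 * 1024 * 1024  # 64 MiB: an arbitrary size chosen for the demo


def resident_pages():
    """Illustrative helper: resident set size in pages, from /proc/self/statm (Linux)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1])


def lazy_allocation_demo(size=MAP_SIZE):
    """Illustrative helper: return (pages gained by mapping, pages gained by writing)."""
    before = resident_pages()
    # Only address space is handed out here; no page frames are allocated yet.
    m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    after_map = resident_pages()
    # The first write to each page faults, and the kernel allocates RAM at that point.
    for off in range(0, size, PAGE):
        m[off] = 1
    after_write = resident_pages()
    m.close()
    return after_map - before, after_write - after_map


mapped, written = lazy_allocation_demo()
print(f"mapping: +{mapped} pages, writing: +{written} pages")
```

The exact counts vary with the kernel version and with transparent huge pages. The mapping step should still cost close to nothing, and the writing step should cost roughly one page frame per page.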

[...]
> > mmap maps *pages*. Pages are logical objects. They can't be "allocated".
> > They can only be created and mapped. When you brk(), COW (or zero-on-write,
> > which is a special case of COW) pages are created. When you write to a
> > COW page, a page-frame (RAM) is allocated, the page (logical object) is mapped
> > to that page-frame (physical object), so you can write to an address
> > (the name you use to refer to the logical object "page").
>
> If the kernel gives me a place to put data; then I EXPECT TO BE ABLE TO USE IT.
> If I can't then the system is buggy.

Again, the kernel gives you names for objects, not room to store them.
Sorry, that's what you asked for, and that's what you get. If you want
room for your objects, use another system call.
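Names are a finite resource in their own right, as noted earlier for 32-bit address spaces. A mapping can fail for lack of addresses even when RAM is free. The sketch below assumes Linux and Python 3 and simulates a small address space by lowering RLIMIT_AS in a forked child. This stand-in is chosen for illustration, and the helper names are hypothetical. Neither mapping is ever written to, so the outcome depends only on address space.

```python
import mmap
import os
import resource

MiB = 1024 * 1024


def try_map(size):
    """Hypothetical helper: map `size` bytes of private anonymous memory, never touching it."""
    try:
        m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE)
    except OSError:
        return False
    m.close()
    return True


def maps_under_small_address_space(headroom=64 * MiB):
    """Hypothetical helper: in a forked child, cap RLIMIT_AS at the current size
    plus `headroom`, then try an 8 MiB and a 256 MiB mapping.
    Returns (small_ok, big_ok)."""
    pid = os.fork()
    if pid == 0:
        code = 255  # reported if the child fails before finishing
        try:
            with open("/proc/self/statm") as f:
                vsize = int(f.read().split()[0]) * mmap.PAGESIZE
            _, hard = resource.getrlimit(resource.RLIMIT_AS)
            soft = vsize + headroom
            if hard != resource.RLIM_INFINITY:
                soft = min(soft, hard)
            resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
            small_ok = try_map(8 * MiB)
            big_ok = try_map(256 * MiB)
            code = (2 if small_ok else 0) | (1 if big_ok else 0)
        finally:
            os._exit(code)  # never return into the parent's code path
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status)
    if code == 255:
        raise RuntimeError("child could not set up the address-space limit")
    return bool(code & 2), bool(code & 1)


print(maps_under_small_address_space())
```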

[...]
> > Definitely not true. COW pages are marked NON WRITABLE in the process's page tables.
> > It *will* page fault when trying to write them. The kernel will try and
> > allocate RAM to allow the write operation. If OOM, it's ok to deny the
> > request, IMHO.
>
> we were referring to brk, and the heap. If we refer to text, then we
> refer to COW.

mmap a file, it will go in your data segment, not text... think of
brk() as mmapping /dev/zero... COW is for data, too.
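The /dev/zero comparison can be tested directly. Here is a sketch that assumes Linux and Python 3 (`dev_zero_demo()` is an illustrative name). Two MAP_PRIVATE mappings of /dev/zero both read as zeros. A write to one of them lands in a private copy-on-write page, so the other mapping still reads zeros.

```python
import mmap
import os


def dev_zero_demo(size=1024 * 1024):
    """Illustrative helper: map /dev/zero privately twice, write to one mapping,
    and read the same bytes back from both."""
    fd = os.open("/dev/zero", os.O_RDWR)
    try:
        prot = mmap.PROT_READ | mmap.PROT_WRITE
        a = mmap.mmap(fd, size, flags=mmap.MAP_PRIVATE, prot=prot)
        b = mmap.mmap(fd, size, flags=mmap.MAP_PRIVATE, prot=prot)
    finally:
        os.close(fd)  # the mappings remain valid after the descriptor is closed
    a[0:5] = b"hello"  # write fault: the kernel gives `a` its own private page
    result = (a[0:5], b[0:5])
    a.close()
    b.close()
    return result


print(dev_zero_demo())
```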

> IF I AM GRANTED ACCESS TO THE PAGE THEN I CAN ACCESS IT. IF I AM GRANTED
> WRITE ACCESS THEN I EXPECT TO BE ABLE TO WRITE TO IT. If I can't then
> I call the system buggy.

Before considering the system buggy, you should rethink what COW means...

[...]
> > If by 'reliable functionality' you mean surviving OOM conditions, yes.
> > You need to mlock() a few pages and perform explicit swap in and out
> > operations.
>
> sorry - that is not required for "reliable functionality". The kernel has given
> access, then I can access. If the kernel cannot give access then I should be
> given the error. The only problem is the lack of accountability of memory
> resources.

Memory is accounted. Your problem is that brk() does not give you any
memory. If you want memory, use another system call.
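The "other system call" suggested earlier in the thread is mlock(). Python's standard library has no wrapper for it, so this sketch calls the C library's mlock() and munlock() through ctypes. It assumes Linux with glibc and an RLIMIT_MEMLOCK of at least a few pages. `locked_pages_gained()` is a hypothetical helper. mlock() guarantees the pages are resident when it returns, so the resident set grows at lock time, before the program writes anything.

```python
import ctypes
import mmap
import os

PAGE = mmap.PAGESIZE

# Assumption: Linux with glibc; CDLL(None) exposes the C library's symbols.
libc = ctypes.CDLL(None, use_errno=True)
libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.mlock.restype = ctypes.c_int
libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.munlock.restype = ctypes.c_int


def resident_pages():
    """Resident set size in pages, from /proc/self/statm (Linux)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1])


def locked_pages_gained(npages=8):
    """Hypothetical helper: mlock() a fresh, untouched mapping and report how
    many resident pages the mlock() call itself added."""
    size = npages * PAGE
    m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    buf = (ctypes.c_char * size).from_buffer(m)  # used only to obtain the address
    try:
        addr = ctypes.addressof(buf)
        before = resident_pages()
        if libc.mlock(addr, size) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        gained = resident_pages() - before
        libc.munlock(addr, size)
        return gained
    finally:
        del buf  # release the buffer export so the mapping can be closed
        m.close()


print(locked_pages_gained())
```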

[...]
> > The process *is* able to use its extended address (and not 'memory') space.
> > It can always read from a COW page. It can write to it, provided that there
> > are enough resources to allow it.
> > I really don't understand why you insist on this "processes not being
> > able to use their address space after brk()/mmap()".
> > Reserving resources (memory or VM, whatever you call it) for COW pages
> > is just a waste. You don't know if a process will ever write a single
> > page of that space (you may expect it for brk(), surely not for fork()).
> > Allocating resources only when really needed is a big win.
>
> if brk cannot give me the ability to write to a location, then it should give
> an error return.

The kernel can't foresee the future. And it's silly to waste megs of
space "just in case".

> It is called accountability, and reliability. If you choose to overdraft, that's
> your business. I want the ability to control it.

Use another system call. Implement your own memory management.
Or just show a single example of an application, *written following the docs*,
that will fail.
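One reading of "implement your own memory management" is to take the page faults up front. The program maps an arena at startup, writes to every page once, and allocates from it afterwards, so running out becomes an ordinary return value. The sketch assumes Linux and Python 3, and `PrefaultedArena` is a hypothetical name. Pre-touching moves the allocation to startup. It does not stop the pages from being swapped out later, nor stop the process from being picked by the OOM killer. Combining it with mlock() addresses the former.

```python
import mmap


class PrefaultedArena:
    """Hypothetical bump allocator over memory that is written once at startup."""

    def __init__(self, size):
        self.size = size
        self.mem = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                             prot=mmap.PROT_READ | mmap.PROT_WRITE)
        # Touch every page now, so the kernel must find page frames here,
        # at a point where the program can still give up cleanly.
        for off in range(0, size, mmap.PAGESIZE):
            self.mem[off] = 0
        self.top = 0

    def alloc(self, n):
        """Return a writable n-byte view, or None when the arena is exhausted."""
        if n < 0 or self.top + n > self.size:
            return None  # explicit failure, in the spirit of malloc() returning NULL
        view = memoryview(self.mem)[self.top:self.top + n]
        self.top += n
        return view


arena = PrefaultedArena(1024 * 1024)
buf = arena.alloc(100)
buf[0:5] = b"hello"
print(arena.mem[0:5], arena.alloc(2 * 1024 * 1024))
```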

> Besides, I wasn't referring to COW pages at the time. brk cannot give access
> to COW. fork can. mmap can't.

???

From the mmap() manual:

       MAP_PRIVATE
                  Create a private copy-on-write mapping.

[ says nothing about memory allocation, it just creates a "mapping" ]

And, BTW, fork() creates COW pages for data, too. No memory is allocated,
unless you write to one of those pages.
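Here is a sketch of fork() sharing data pages copy-on-write. It assumes Linux and Python 3, and `fork_cow_demo()` is an illustrative name. The child's write to a MAP_PRIVATE page gets its own copy, and that is the moment the kernel needs a fresh page frame. The parent still sees its original bytes.

```python
import mmap
import os


def fork_cow_demo():
    """Illustrative helper: return (parent's view, child's view) of one
    MAP_PRIVATE page after the child has written to it."""
    m = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    m[0:6] = b"parent"
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            # After fork() the page is shared copy-on-write. This write faults,
            # and the kernel must allocate a new page frame for the child's copy.
            m[0:6] = b"child!"
            os.write(w, m[0:6])
        finally:
            os._exit(0)  # never return into the parent's code path
    os.close(w)
    child_view = os.read(r, 6)
    os.close(r)
    os.waitpid(pid, 0)
    parent_view = m[0:6]
    m.close()
    return parent_view, child_view


print(fork_cow_demo())
```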

[...]
> > You can use that address space. You can either read from it or write to it.
> > The former does not require any other resource to be allocated, just
> > address space (which has been already granted by the kernel). The latter
> > needs *other* resources (RAM, swap, VM, ...) to be carried out, so it may
> > fail.
>
> If it can fail then I expect the address modification request to fail. Not
> somewhere else in the process.

Why? The address space modification succeeded. It's your later memory
allocation request that fails...

> > > If it is necessary that the library function malloc be modified to lock
> > > pages into memory then the entire system is buggy.
> >
> > C *programs* using malloc() work just fine, with current malloc()
> > implementation. Linux *processes* may be killed.
>
> If it is necessary that the library function malloc be modified to
> lock pages into memory then the entire system is buggy. It doesn't matter
> if processes may be killed or not. They need to be given the error that the
> API has been defined to supply. Since they don't - buggy.

The C API does not say anything about processes being killed. So killing
a process is always legal (i.e. does not break the C standard).

> > > Locking pages into memory is supposed to be a process performance optimization,
> > > not a mandatory operation to get proper functionality.
> >
> > You get "proper", "expected", "documented" functionality with current
> > implementation.
>
> Until you are out of resources - hence buggy, improper, not expected, and
> not documented, and uncontrolled.

Not documented? This is plain false. brk() works as documented. And
mmap(), too. Is the Linux implementation different from other Unices? Who cares?
You still have to show it breaks some standard. If the standard leaves
room for different implementations, and your program expects an
implementation-dependent behaviour, it's your fault. Your program is not
portable.

> > > No wonder linux is occasionally called "not ready for production..".
> >
> > This is just FUD, no comment.
>
> Truth is truth - I heard this from an engineer at a large vendor, who told me
> why. This problem with memory is one of the items.
>
> > I just repeat that brk()/mmap() work as documented. So the system is not
> > buggy.
>
> Too bad - your systems will not be reliable. I want, and expect, better.

It works as documented. It's 100% reliable. If your programs expect some
different behaviour, *they* are buggy. Read the documentation and change
them.

> If I am given the option of accounting for memory, and the option to
> enforce or not enforce allocations, then it is up to me to configure the
> system for the behaviour desired.
>
> The current behaviour is bad since the normal API interfaces for proper
> functionality are not followed.

brk() is the normal API. And it works as expected.
Look, I understand perfectly what you're saying; no need to repeat it.
I simply think it's wrong. "Proper functionality" *is* followed.
You've got the docs. Read them, and write your programs accordingly.

I agree to end this discussion, since I'm just repeating my arguments.
Either you accept them, or not, there's little more I can say to explain
them better.

> -------------------------------------------------------------------------
> Jesse I Pollard, II
> Email: pollard@navo.hpc.mil
>
> Any opinions expressed are solely my own.

.TM.

-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Apr 07 2000 - 21:00:09 EST