Re: Avoiding *mandatory* overcommit...

From: Marco Colombo (marco@esi.it)
Date: Mon Apr 03 2000 - 07:07:04 EST


On Mon, 3 Apr 2000, Jesse Pollard wrote:

[...]
> >And, under Linux, should expect to fail when the system is OOM.
> >Again it's the process that fails, not the program.
>
> According to the manpage, brk returns -1 on failure. malloc should
> be able to return a null pointer indicating failure.

Address space is a finite resource (on 32-bit archs; on 64-bit it's
close to infinite for practical purposes). So brk() can fail, and will,
if you're out of addresses. This has nothing to do with memory.
As I've already said, an address is a name, memory is an object.
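
To illustrate "an address is a name": a minimal sketch, assuming Linux
with glibc, of what moving the break amounts to. The helper name
move_break() is made up for this example. All it does is ask the kernel
to move the end of the data segment and compare the two addresses; no
byte of the new range is ever touched.

```c
#define _DEFAULT_SOURCE            /* exposes sbrk() in glibc headers */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Move the program break by `delta` bytes (may be negative).
 * Returns the signed distance the break actually moved, or 0 if the
 * kernel refused the request. Hypothetical helper, for illustration. */
intptr_t move_break(intptr_t delta)
{
    char *before = sbrk(0);            /* sbrk(0) only queries the break */
    assert(before != (char *)-1);

    if (sbrk(delta) == (void *)-1)     /* refused: errno is ENOMEM */
        return 0;

    char *after = sbrk(0);
    return (intptr_t)(after - before); /* only the "name" range changed */
}
```

Calling move_break(4096) and then move_break(-4096) should leave the
break where it started. Don't do this around code that calls malloc(),
which manages the same break for its own purposes.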

> >> >
> >> >My RedHat Linux 6.1 brk() man page states just that:
> >> > brk sets the end of the data segment to the value specified
> >> > by end_data_segment. end_data_segment must be greater
> >> > than end of the text segment and it must be 16kB before
> >> > the end of the stack.
> >> >
> >> >It says *nothing* about allocating space. The "non allocating" behaviour
> >> >*is* documented. So, I have to say it again, if a program uses malloc()
> >> >expecting the kernel to really allocate resources to it, it is *buggy*.
> >> >It should use another interface. mlock() is one way to get real resources
> >> >(RAM). I'm not saying that the interface it provides is enough for all
> >> >your needs: but it should be clear that malloc() is NOT what you should
> >> >use when you need real allocation.
> >>
> >> Actually that sounds more like a documentation bug. If brk cannot allocate
> >> memory then nothing can allocate memory, and no process can trust its' own
> >> storage.
> >
> >Precisely. If a process needs to "trust its' own storage", it has to use
> >another system call. Not brk(). I don't really understand why you
> >say: "If brk cannot allocate memory then nothing can allocate memory".
> >mlock() DOES allocate memory, that's why I usually name it as an example.
> >brk() does not, and is not supposed to, allocate any resource (ok, but PTEs).
>
> brk IS supposed to - it has the kernel calls to allocate the pages.

mmap maps *pages*. Pages are logical objects. They can't be "allocated".
They can only be created and mapped. When you brk(), COW (or zero-on-write,
which is a special case of COW) pages are created. When you write to a
COW page, a page-frame (RAM) is allocated, the page (logical object) is mapped
to that page-frame (physical object), so you can write to an address
(the name you use to refer to the logical object "page").
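
A minimal sketch of that sequence as seen from user space, assuming
Linux with glibc. The helper names resident_pages() and
demand_alloc_demo() are made up for this example, and using mincore() as
the probe for "does this page have a page-frame behind it" is my choice,
not something the kernel code above prescribes.

```c
#define _DEFAULT_SOURCE      /* exposes mincore() and MAP_ANONYMOUS in glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages in [addr, addr + npages * pagesize) that the kernel
 * reports as resident. Returns -1 on error. Hypothetical helper. */
long resident_pages(void *addr, size_t npages)
{
    unsigned char vec[256];
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (npages > sizeof vec || mincore(addr, npages * page, vec) != 0)
        return -1;

    long n = 0;
    for (size_t i = 0; i < npages; i++)
        n += vec[i] & 1;               /* low bit set: page is resident */
    return n;
}

/* Map `npages` anonymous pages, write to the first `touch` of them, and
 * store the resident counts seen before and after the writes.
 * Returns 0 on success, -1 on error. */
int demand_alloc_demo(size_t npages, size_t touch, long *before, long *after)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    assert(touch <= npages);

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = resident_pages(p, npages);   /* pages exist, frames don't */
    for (size_t i = 0; i < touch; i++)
        p[i * page] = 1;                   /* each write faults a frame in */
    *after = resident_pages(p, npages);

    munmap(p, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

On a stock kernel I'd expect `before` to be 0 and `after` to equal
`touch`; a kernel or sandbox that populates mappings eagerly may report
more.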

[...]
> >> malloc is supposed to be ANSI; but if it cannot be used to allocate memory
> >> then it isn't ANSI either.
> >>
> >> BTW, the kernel support for brk contains:
> >>
> >> if (do_mmap(NULL, oldbrk, newbrk-oldbrk,
> >> PROT_READ|PROT_WRITE|PROT_EXEC,
> >> MAP_FIXED|MAP_PRIVATE, 0) != oldbrk)
> >>
> >> The manpage also implies that memory can only be increased, but the kernel
> >> code says it can be reduced too.
> >
> >So? Increasing is the standard API. We have an extended one. What's wrong
> >with that?
>
> It's NOT extended. It does have to support malloc. In fact, based on the
> code, it looks appropriate. The only problem I can see is the lack of accounting
> for the allocation. The same can be said for fork, mmap, and stack allocation.

mmap does not allocate memory. Allocation can't be accounted because no
allocation is performed.

> >> If this doesn't map pages into the process, what does? This certainly
> >> looks like it allocates memory. Now that memory may have to be initialized
> >> (ie, demand zero page fault), but this looks like a real allocation to me.
> >
> >Yes, it maps process pages. And doing that it does not allocate *ANY* memory.
> >It is not demand zeroing, it's demand *allocation* and zeroing. It's a kind
> >of COW. You don't allocate memory until it's really needed. Thus it's not
> >a 'real allocation' at all. And that's true on any system with demand copy.
> >Memory may be accounted (and reserved), but it's not used until you write
> >a page.
>
> The page is mapped. That grants the process permission to modify it. To the
> process, the space is usable.

Definitely not true. COW pages are marked NON-WRITABLE for the process.
It *will* page fault when trying to write to them. The kernel will then
try to allocate RAM to allow the write operation. If OOM, it's ok to
deny the request, IMHO.

> >> >> > For stack grow, maybe we need some way to tell the kernel:
> >> >> > "never page-out my stack, and reserve me this space...".
> >> >> ---
> >> >> Paging out is not the issue. The issue is not having enough
> >> >> combined memory and swap space. OOM doesn't simply mean out of physical
> >> >> memory -- it means out of swap space as well. For this discussion most
> >> >> people are using "memory" to mean "memory+swap".
> >> >
> >> >I know. But I think that mlock()ing stack pages could be easy to implement.
> >> >And it gives you a way to write "secure" programs. In a "secure" program
> >> >you should control stack grow anyway.
> >>
> >> I wouldn't lock the stack - most of the entries on the stack are not going
> >> to be used. Resident memory is more important than that. Page it out if
> >> necessary.
> >
> >Just see below.
> >
> >> >And, reading previous postings, now I know you can manage your own
> >> >stack. This is even easier. Just set your stack up, mlock() *a few* pages,
> >> >and write them to disk when you need more space. The only "active" part
> >> >of a stack is the top, so it's very easy to manage a file image of it.
> >
> >You don't need to lock the whole stack, just the top of it.
>
> Locking the stack at all is a performance optimization function. Not something
> to be done just any old time to get reliable functionality. It should be
> available for locking for that very reason.

If by 'reliable functionality' you mean surviving OOM conditions, yes.
You need to mlock() a few pages and perform explicit swap-in and
swap-out operations.
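
A minimal sketch of the mlock() half of that, assuming Linux with glibc.
The names alloc_locked() and free_locked() are made up for this example,
and the explicit swapping to a file is left out. The point is only that
the kernel gets to say "no" up front, at mlock() time, rather than later
at some random write.

```c
#define _DEFAULT_SOURCE      /* exposes MAP_ANONYMOUS in glibc headers */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `npages` anonymous pages and pin them in RAM with mlock().
 * Returns the region, or NULL if the mapping or the lock is refused
 * (for example because of RLIMIT_MEMLOCK). Hypothetical helper. */
void *alloc_locked(size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    assert(npages > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (mlock(p, len) != 0) {      /* real page-frames, or an error now */
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Unlock and unmap a region obtained from alloc_locked(). */
void free_locked(void *p, size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    munlock(p, len);
    munmap(p, len);
}
```

A caller that gets NULL back knows it has no guaranteed storage and can
bail out cleanly; a caller that gets a pointer can write to those pages
without depending on a later allocation succeeding.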

> >> >> > Applications should be able to bypass kernel management of their address
> >> >> > space. But this should be done on a per-app base.
> >> >> ---
> >> >> I agree with this statement, but it isn't relevant to the discussion
> >> >> topic.
> >> >
> >> >Here I don't follow you. A per-application mm management is much better
> >> >than playing with system wide setting (such as disabling overcommit).
> >>
> >> Because the topic is the kernel, kernel resource management, and the kernel
> >> interaction with processes - specifically the memory allocation to processes.
> >
> >Right. So why even mentioning malloc()? If I write programs in ASM,
> >there's no malloc(). The kernel interface is brk() (or mmap()), so
> >let's discuss about it. malloc() has nothing to do with kernel "memory
> >allocation to processes". It's only one way to manage a heap in a C
> >*program*.
>
> brk extends (or contracts) memory space. Once extended, I expect the process
> to be able to use it. Once contracted, I do not expect to be able to use
> the space released. This is another "stack" allocation; even if implemented
> in software.

The process *is* able to use its extended address (and not 'memory') space.
It can always read from a COW page. It can write to it, provided that there
are enough resources to allow it.
I really don't understand why you insist on this "processes not being
able to use their address space after brk()/mmap()".
Reserving resources (memory or VM, whatever you call it) for COW pages
is just a waste. You don't know if a process will ever write a single
page of that space (you may expect it for brk(), but surely not for fork()).
Allocating resources only when really needed is a big win.
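
The fork() case can be sketched in a few lines of C. The function name
cow_after_fork() and the buffer size are made up for this example. The
program can't see the page fault itself; it only shows the visible
effect: the child's write lands in a private copy, and the parent's page
is untouched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that overwrites one byte of a buffer it shares
 * copy-on-write with the parent. Returns the byte the parent sees
 * afterwards, or -1 on error. Hypothetical helper, for illustration. */
int cow_after_fork(void)
{
    size_t len = 64 * 1024;
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 'P', len);         /* the parent has written these pages */
    assert(buf[0] == 'P');

    pid_t pid = fork();            /* pages are shared, nothing is copied */
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        buf[0] = 'C';              /* write fault: child gets its own copy */
        _exit(buf[0] == 'C' ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        free(buf);
        return -1;
    }
    int seen = (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? buf[0] : -1;
    free(buf);
    return seen;                   /* still 'P' in the parent */
}
```

Only the one page the child wrote needs a new page-frame; the rest of
the 64kB stays shared for as long as neither side writes to it.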

> >> Per application mm management is userspace. Unless the kernel can supply
> >> the resources, application management is useless - there are no resources
> >> to manage...
> >
> >Not at all. Of course, the interface the kernel provides to applications
> >*is* a kernel issue. And the kernel *can* behave differently on a per
> >application base.
> >
> >And the kernel *can* supply resources to a process. Use mlock() and
> >the kernel *will* give you real resources. If you use just brk() it
> >won't give you any resource (RAM or swap). It will just set things up
> >so that it will give you resources when you ask for them (write the pages,
> >or mlock() them). That's the defined (and documented) behaviour of brk()
> >under Linux.
>
> The process is not asking for a resident set size change; that may not be
> available. I just want to use address space that the kernel has provided.

You can use that address space. You can either read from it or write to it.
The former does not require any other resource to be allocated, just
address space (which has been already granted by the kernel). The latter
needs *other* resources (RAM, swap, VM, ...) to be carried out, so it may
fail.

> If it is necessary that the library function malloc be modified to lock
> pages into memory then the entire system is buggy.

C *programs* using malloc() work just fine, with current malloc()
implementation. Linux *processes* may be killed.

>
> Locking pages into memory is supposed to be a process performance optimization,
> not a mandatory operation to get proper functionality.

You get "proper", "expected", "documented" functionality with current
implementation.

>
> No wonder linux is occasionally called "not ready for production..".

This is just FUD, no comment.

I just repeat that brk()/mmap() work as documented. So the system is not
buggy.

>
> -------------------------------------------------------------------------
> Jesse I Pollard, II
> Email: pollard@cats-chateau.net
>
> Any opinions expressed are solely my own.
>

.TM.

-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it
