Re: Avoiding *mandatory* overcommit...

From: Marco Colombo (marco@esi.it)
Date: Mon Apr 03 2000 - 05:19:18 EST


On Fri, 31 Mar 2000, Jesse Pollard wrote:

> On Fri, 31 Mar 2000, Marco Colombo wrote:
> >On Thu, 30 Mar 2000, Linda Walsh wrote:
> >
> >[...]
> >> Marco Colombo wrote:
> >> > If you use plain malloc(), you're not allowed to think you have any
> >> > space guaranteed. It's bad programming.
> >> ---
> >> ?! mlock locks pages in memory. I just want to malloc (from the
> >> man page):
> >>
> >> malloc() allocates size bytes and returns a pointer to the
> >> allocated memory. The memory is not cleared.
> >> ...
> >> For calloc() and malloc(), the value returned is a pointer
> >> to the allocated memory, which is suitably aligned for any
> >> kind of variable, or NULL if the request fails.
> >>
> >>
> >> It's not bad programming to expect that malloc will allocate memory.
> >> It's the documented interface. It is the documented interface to return
> >> NULL if it cannot allocate the memory. With overcommit, the kernel has
> >> broken this model because the memory isn't really allocated -- just the
> >> process's top of heap pointer has been moved. My contention is that this
> >> is not ANSI-C compliant.
> >
> >According to ANSI-C, for(;;); will run forever. Most believe the
> >Universe is finite in Time, so why don't you complain on the
> >universe-hackers@vger.god. mailing list? B-)
>
> nothing to do with memory management.

Oh well, you missed the "B-)"
And my point was that malloc() has nothing to do with *kernel*
memory management.

> >Memory is just a concept, just as time, in the definition of a programming
> >language. The OS maps (literally) that concept to real "resources"
> >(RAM, swap, ...). So "allocated memory" means *nothing* in the malloc()
> >manual. The OS chooses to implement "memory" the way it likes. It
> >can be just plain RAM in a single address space (unprotected memory), where
> >malloc() allocates "system" memory. Or can be disk space, with RAM used
> >only as a cache for recently used parts. Or a piece of VM (swap+RAM).
>
> So, I take it that Linux doesn't really exist - it's just a concept....

Since you forgot to put a B-), I can only suggest that you think about
the difference between a "program" and a "process". ANSI-C is about
how to write C programs.

> >On OOM, you don't get any C error. The *program* does not fail in any way.
> >It's the *process* that gets killed. The C standard knows nothing about
> >what a process is, and how a process interacts with the system.
>
> Sure don't - you should get a ENOMEM for most things.
>
> >In a UNIX-like environment, program bugs usually cause some system events
> >on the process that runs the program. But we have
> >to thank the UNIX design for this. Under DOS, program bugs (a runaway
> >pointer, for example) are more difficult to track. Conversely,
> >it is not true that the system delivers signals to a process only because
> >the program it runs has a bug. Silly example (one I have already given):
> >SIGTERM on shutdown, SIGHUP on control tty hangup, SIGINT for ^C,
> >SIGTSTP on ^Z, or even SIGUSR[12], can be received by a process running
> >a legal ANSI-C program, causing actions to be taken, without the ANSI
> >standard even mentioning them. That's *UNIX programming*, not C programming.
> >So the standard we should refer to (among others) is POSIX, not ANSI-C.
> >And BTW, under Linux I program using cat | gas... who cares about ANSI-C? B-)
>
> This isn't relevant.

Really? malloc() is C programming. Not a kernel issue.
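
The signal point quoted above can be sketched as follows. This is only an
illustration, assuming a POSIX system: sigaction() is a POSIX interface
rather than an ANSI-C one, and the helper names install_handler() and
last_signal() are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes sigaction() under strict C modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Written only from the handler; sig_atomic_t keeps the access signal-safe. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: just record which signal arrived. */
static void on_signal(int signo)
{
    got_signal = signo;
}

/* Install on_signal() for signo. Returns 0 on success, -1 on failure. */
int install_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Number of the last signal seen by the handler, or 0 if none yet. */
int last_signal(void)
{
    return (int)got_signal;
}
```

A caller would run install_handler(SIGUSR1), carry on with ordinary C code,
and later check last_signal(). The program stays legal C throughout, while
everything about delivery happens at the process level.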

> >Memory allocation in a C program is a completely different concept from
> >memory allocation by a UNIX process. A process does not allocate memory
> >at all. It just requests its address space to be extended. See brk()
> >manual. On Solaris 2.5.1, the man page clearly states that space gets
> >allocated. And, among possible errors "ENOMEM: Insufficient space exists
> >in the swap area to support the expansion.", indicating that available
> >swap (and not VM) is checked, BTW.
>
> If the process address space is extended then the process expects to use
> it.

And, under Linux, should expect to fail when the system is OOM.
Again it's the process that fails, not the program.

> >
> >My RedHat Linux 6.1 brk() man page states just that:
> > brk sets the end of the data segment to the value specified
> > by end_data_segment. end_data_segment must be greater
> > than end of the text segment and it must be 16kB before
> > the end of the stack.
> >
> >It says *nothing* about allocating space. The "non allocating" behaviour
> >*is* documented. So, I have to say it again, if a program uses malloc()
> >expecting the kernel to really allocate resources to it, it is *buggy*.
> >It should use another interface. mlock() is one way to get real resources
> >(RAM). I'm not saying that the interface it provides is enough for all
> >your needs: but it should be clear that malloc() is NOT what you should
> >use when you need real allocation.
>
> Actually that sounds more like a documentation bug. If brk cannot allocate
> memory then nothing can allocate memory, and no process can trust its own
> storage.

Precisely. If a process needs to "trust its own storage", it has to use
another system call. Not brk(). I don't really understand why you
say: "If brk cannot allocate memory then nothing can allocate memory".
mlock() DOES allocate memory, that's why I usually name it as an example.
brk() does not, and is not supposed to, allocate any resource (OK, except PTEs).
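
A small sketch of what "extending the address space" looks like from user
space, assuming a glibc-style sbrk() wrapper around brk(). The helper name
move_break() is hypothetical. It only moves the break and reports the old
and new values; it never touches the new pages.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() on glibc */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Move the program break by incr bytes (a negative value shrinks it).
 * Stores the break before and after the call.
 * Returns 0 on success, -1 if sbrk() fails. */
int move_break(intptr_t incr, void **before, void **after)
{
    void *old;

    assert(before != NULL && after != NULL);
    old = sbrk(incr);                /* returns the previous break */
    if (old == (void *)-1)
        return -1;
    *before = old;
    *after = sbrk(0);                /* sbrk(0) just queries the break */
    return 0;
}
```

Calling move_break(4096, ...) and then move_break(-4096, ...) should leave
the break where it started, which also shows the shrinking case discussed
in this thread.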

> >> > If you need guaranteed "space"
> >> > (memory) use another kernel interface, such as mlock(). I'm not saying
> >> > the current interface is perfect. I'm just saying that overcommitting
> >> > is not the problem. You don't need to turn overcommitting off. You
> >> > need to use a better interface than malloc() to get "safe" memory.
> >> ---
> >> Not if we claim to be ANSI compliant.
> >
> >But I don't claim to be ANSI compliant: I'm Italian. B-)
> >
> >The *kernel* is not ANSI (C) compliant. It's a compiler issue, not OS.
> >Maybe you mean POSIX?
>
> The kernel is supposed to be POSIX. I believe the implementation of
> brk in linux is not POSIX compliant, unless your statement is not
> really true.

Again from RHL6.1 brk() man:

       brk and sbrk are not defined in the C Standard and are
       deliberately excluded from the POSIX.1 standard (see para-
       graphs B.1.1.1.3 and B.8.3.3).

> malloc is supposed to be ANSI; but if it cannot be used to allocate memory
> then it isn't ANSI either.
>
> BTW, the kernel support for brk contains:
>
> if (do_mmap(NULL, oldbrk, newbrk-oldbrk,
> PROT_READ|PROT_WRITE|PROT_EXEC,
> MAP_FIXED|MAP_PRIVATE, 0) != oldbrk)
>
> The manpage also implies that memory can only be increased, but the kernel
> code says it can be reduced too.

So? Increasing is the standard API. We have an extended one. What's wrong
with that?

> If this doesn't map pages into the process, what does? This certainly
> looks like it allocates memory. Now that memory may have to be initialized
> (ie, demand zero page fault), but this looks like a real allocation to me.

Yes, it maps process pages. And doing that it does not allocate *ANY* memory.
It is not demand zeroing, it's demand *allocation* and zeroing. It's a kind
of COW. You don't allocate memory until it's really needed. Thus it's not
a 'real allocation' at all. And that's true on any system with demand copy.
Memory may be accounted (and reserved), but it's not used until you write
a page.
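
A rough way to watch this on Linux is sketched below. It is only an
illustration under assumptions: it uses mmap() with MAP_ANONYMOUS instead
of brk(), asks mincore() which pages are resident, and the helper names
count_resident() and demand_alloc_demo() are made up for this example. On
a typical Linux kernel the count should be near zero before the writes and
grow by about the number of pages written, though huge pages or a
sandboxed kernel can change the numbers.

```c
#define _DEFAULT_SOURCE   /* exposes mincore() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many of the npages pages starting at addr are resident in RAM.
 * addr must be page-aligned. Returns -1 on error. */
long count_resident(void *addr, size_t npages, size_t pagesize)
{
    unsigned char *vec;
    long resident = 0;
    size_t i;

    assert(npages > 0 && pagesize > 0);
    vec = malloc(npages);
    if (vec == NULL)
        return -1;
    if (mincore(addr, npages * pagesize, vec) != 0) {
        free(vec);
        return -1;
    }
    for (i = 0; i < npages; i++)
        resident += vec[i] & 1;      /* low bit set means "in core" */
    free(vec);
    return resident;
}

/* Map npages of anonymous memory, write to the first `touch` pages, and
 * report the resident page count before and after the writes.
 * Returns 0 on success, -1 on failure. */
int demand_alloc_demo(size_t npages, size_t touch, long *before, long *after)
{
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    size_t i;
    char *p = mmap(NULL, npages * pagesize, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    *before = count_resident(p, npages, pagesize);
    for (i = 0; i < touch && i < npages; i++)
        ((volatile char *)p)[i * pagesize] = 1;  /* first write faults the page in */
    *after = count_resident(p, npages, pagesize);
    munmap(p, npages * pagesize);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

The mapping succeeds immediately in both cases. Only the write loop makes
the kernel hand out real pages, which is the distinction being argued here.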

> >> > For stack growth, maybe we need some way to tell the kernel:
> >> > "never page-out my stack, and reserve me this space...".
> >> ---
> >> Paging out is not the issue. The issue is not having enough
> >> combined memory and swap space. OOM doesn't simply mean out of physical
> >> memory -- it means out of swap space as well. For this discussion most
> >> people are using "memory" to mean "memory+swap".
> >
> >I know. But I think that mlock()ing stack pages could be easy to implement.
> >And it gives you a way to write "secure" programs. In a "secure" program
> >you should control stack growth anyway.
>
> I wouldn't lock the stack - most of the entries on the stack are not going
> to be used. Resident memory is more important than that. Page it out if
> necessary.

Just see below.

> >And, reading previous postings, now I know you can manage your own
> >stack. This is even easier. Just set your stack up, mlock() *a few* pages,
> >and write them to disk when you need more space. The only "active" part
> >of a stack is the top, so it's very easy to manage a file image of it.

You don't need to lock the whole stack, just the top of it.

> >> > Applications should be able to bypass kernel management of their address
> >> > space. But this should be done on a per-app basis.
> >> ---
> >> I agree with this statement, but it isn't relevant to the discussion
> >> topic.
> >
> >Here I don't follow you. A per-application mm management is much better
> >than playing with system wide setting (such as disabling overcommit).
>
> Because the topic is the kernel, kernel resource management, and the kernel
> interaction with processes - specifically the memory allocation to processes.

Right. So why even mention malloc()? If I write programs in ASM,
there's no malloc(). The kernel interface is brk() (or mmap()), so
let's discuss that. malloc() has nothing to do with kernel "memory
allocation to processes". It's only one way to manage a heap in a C
*program*.

> Per application mm management is userspace. Unless the kernel can supply
> the resources, application management is useless - there are no resources
> to manage...

Not at all. Of course, the interface the kernel provides to applications
*is* a kernel issue. And the kernel *can* behave differently on a
per-application basis.

And the kernel *can* supply resources to a process. Use mlock() and
the kernel *will* give you real resources. If you use just brk() it
won't give you any resource (RAM or swap). It will just set things up
so that it will give you resources when you ask for them (write the pages,
or mlock() them). That's the defined (and documented) behaviour of brk()
under Linux.
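
For comparison, a minimal sketch of the mlock() route, assuming Linux-style
mmap() with MAP_ANONYMOUS. The helper names alloc_locked() and free_locked()
are hypothetical. Note that mlock() can fail, for example when the request
exceeds RLIMIT_MEMLOCK, so the caller gets a NULL with errno set instead of
a later surprise.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of anonymous memory and lock them into RAM.
 * Returns the address, or NULL with errno set if either step fails. */
void *alloc_locked(size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mlock(p, len) != 0) {        /* the kernel must supply real pages now */
        int saved = errno;           /* keep mlock()'s error across munmap() */
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}

/* Unlock and unmap a region obtained from alloc_locked(). */
void free_locked(void *p, size_t len)
{
    munlock(p, len);
    munmap(p, len);
}
```

The point of the sketch is where the failure shows up: at the mlock() call,
as an error the program can handle, and not at some later write to the page.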

> -------------------------------------------------------------------------
> Jesse I Pollard, II
> Email: pollard@cats-chateau.net
>
> Any opinions expressed are solely my own.

.TM.

-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it



