Re: pre9-4 OOM VM lockups

From: Jesse Pollard (pollard@tomcat.admin.navo.hpc.mil)
Date: Tue May 23 2000 - 18:07:47 EST


t.n.vanderleeuw@chello.nl wrote:
>
> On 23 May, Juan J. Quintela wrote:
> >>>>>> "tim" == t n vanderleeuw <t.n.vanderleeuw@chello.nl> writes:
> >
> > tim> Hi,
> > tim> with pre9-3 and pre9-4 I'm experiencing lockups when OOM.
> > tim> A random process will be killed (NOT the runaway process) and the
> > tim> computer hangs solid after that (still pingable, but nothing else
> > tim> works, not even VT switching or capslock toggle).
> >
> [...]
> >
> > If that is reproducible, could you give me a program/command line that
> > shows the OOM? I also thought that the OOM problems were gone.
> >
> > Later, Juan.
> >
>
> I have the following memory-eater which can repeatedly kill my box.
>
> The symptoms: VT switching still works, unlike what I described above.
> But other than that, the machine is dead. After only a few seconds, the
> computer stops taking any keyboard input. About half a minute later,
> the first (more or less random) program is killed, and after that klogd
> seems to be the victim once every minute or so.
>
>
> The runaway debconf script had a worse effect on the system tho. I
> don't know what else it did to cause problems.
>
>
> I'm going to test pre9-5 now.

They will all fail this test program.

> --Tim
>
> /*
> eatmem.c
>
> If you have more memory than about 64MB RAM + 130MB swap, you might
> need to increase the ARR_SIZE to get an OOM situation.
>
> Tim N. van der leeuw <t.n.vanderleeuw@chello.nl>
> */
>
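> #include <stdlib.h>
> #include <unistd.h>            /* fork() */
>
> #define ARR_SIZE (1024*2048)   /* number of pointers in arr */
> #define ALLOC_SIZE 1024        /* bytes per malloc() call */
>
> int main(void)
> {
>     char* arr[ARR_SIZE];       /* pointer array lives on the stack */
>     int i;
>
>     /* every process, parent or child, keeps looping after each fork() */
>     for (i = 0; i < 4; ++i)
>         fork();
>
>     /* allocate ARR_SIZE blocks that are never used or freed */
>     for (i = 0; i < ARR_SIZE; ++i)
>     {
>         arr[i] = malloc(ALLOC_SIZE);
>     }
>     return 0;
> }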

If I follow this right:
   Every child continues the fork() loop from the same value of i, so
   each pass through the loop doubles the number of processes:
        The first process fires off 4 processes.
            Those children fire off 3, 2, 1 and 0 processes respectively,
                and so on down the tree.
total of 2 * 2 * 2 * 2 = 16 processes

Each process allocates an 8MB array (COW) in the array arr, which is also
on the stack. (ARR_SIZE is 1024*2048 = 2M pointers, and each pointer
occupies 4 bytes.)

Each pointer points to an allocated 1K (unused).

16 processes * 8MB => 128MB (minimum). There is some additional since the
malloc is allocating chunks of less than a page size, so I believe that
the 2 million pointers will point to 2M * 1KB => 2048MB data. One process
should then have 2056MB of data

therefore 2056MB * 16 => 32896MB or about 32 GB of memory (ram + swap).
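
The arithmetic can be cross-checked with a small sketch. It assumes what
eatmem.c implies on a 32-bit box: every pass of the fork() loop doubles the
number of processes, ARR_SIZE expands to 1024*2048 = 2M pointers of 4 bytes
each, and every pointer gets a 1K block. The function names are made up for
illustration, and malloc headers are ignored here.

```c
/* Processes alive after a loop that calls fork() `passes` times, where
   every child keeps running the same loop: each pass doubles the count. */
unsigned long long process_count(unsigned int passes)
{
    unsigned long long n = 1;
    unsigned int i;

    for (i = 0; i < passes; ++i)
        n *= 2;
    return n;
}

/* Footprint of one process in MB: the pointer array on the stack plus
   one block of alloc_bytes per pointer (no malloc overhead counted). */
unsigned long long per_process_mb(unsigned long long n_ptrs,
                                  unsigned long long ptr_bytes,
                                  unsigned long long alloc_bytes)
{
    unsigned long long bytes = n_ptrs * ptr_bytes + n_ptrs * alloc_bytes;

    return bytes / (1024ULL * 1024ULL);
}

/* Demand of the whole process tree in MB. */
unsigned long long total_mb(unsigned int passes,
                            unsigned long long n_ptrs,
                            unsigned long long ptr_bytes,
                            unsigned long long alloc_bytes)
{
    return process_count(passes) *
           per_process_mb(n_ptrs, ptr_bytes, alloc_bytes);
}
```

With 4 passes, 2097152 pointers, 4-byte pointers and 1024-byte blocks this
gives 16 processes, 2056MB per process and 32896MB in total.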

It is actually larger since malloc must include the overhead pointers/counters
to be able to reclaim memory via "free()" (somewhere between 8 and 16 bytes
for the allocation header...) and the program text (one copy). The 1K of
user data is never touched, but writing that header is "real" data, and it
is what forces the physical allocation of each page (one page holds just
under 4 allocations of 1k).
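
To put rough numbers on the header overhead, here is a sketch that assumes
a 4096-byte page and a fixed per-allocation header of 8 or 16 bytes. Both
values are assumptions for illustration (a real malloc may round sizes or
keep its bookkeeping differently), and the function names are hypothetical.

```c
/* How many chunks of (payload + header) bytes fit in one page. */
double chunks_per_page(unsigned int page_bytes,
                       unsigned int payload_bytes,
                       unsigned int header_bytes)
{
    return (double)page_bytes / (double)(payload_bytes + header_bytes);
}

/* Heap demand in MB for n_allocs chunks, headers included. */
unsigned long long heap_mb(unsigned long long n_allocs,
                           unsigned long long payload_bytes,
                           unsigned long long header_bytes)
{
    unsigned long long bytes = n_allocs * (payload_bytes + header_bytes);

    return bytes / (1024ULL * 1024ULL);
}
```

With a 1024-byte payload, a page holds a bit under 4 chunks for either
header size, and 2097152 allocations come to 2064MB with 8-byte headers or
2080MB with 16-byte headers, instead of the bare 2048MB.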

Yup - you die unless you have a large memory system, or you get very
lucky. The OOM handler kills processes that are seen as being pigs, but
that assumes that only a few are actually doing so. klogd got killed
trying to allocate buffers/IO space to log the messages faster than the
IO system could write them out. The actual hogs were probably mostly swapped
out already, and idle - waiting for memory.

Nice program though.
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.
