Memory management

Rauli Ruohonen (raulir@fishy.pp.sci.fi)
Sun, 22 Jun 1997 20:38:52 +0300 (EET DST)


The version of named I use has a problem: when it's sent binary data, it
starts eating memory and then the whole system crashes. The behavior
hasn't changed from kernel 2.0.30 to pre-2.0.31-2, and none of the
patches for pre-2.0.31-1 cured it. (I can get a newer named, but I want to
fix the kernel, too!)

I tried the "20-pre-2-buffer-cache-fix.patch" patch from linuxhq, but it
didn't help; the behavior is the same as before. First, named starts
eating memory; then, once it reaches a "good" size, the system starts
swapping and nothing else makes progress. It seems to be paging the same
blocks in and out over and over. This is what I get with
shift-ScrollLock:

Mem_info:
Free pages: 160kB
(0*4kB 2*8kB 1*16kB 0*32kB 0*64kB 1*128kB = 160kB)
Swap cache: add 5412/5412, delete 230353/952, find 15310/4457
Free swap: 24900kB
5120 pages of RAM
40 free pages
442 reserved pages
38 pages shared
Buffer memory: 120kB
Buffer heads: 173
Buffer blocks: 120
CLEAN: 74 buffers, 57 used (last=57), 0 locked, 0 protected, 0 dirty
LOCKED: 22 buffers, 18 used (last=18), 0 locked, 0 protected, 0 dirty

I waited a little and pressed shift-ScrollLock again, and the only lines
changed were these:

Swap cache: add 11631/11631, delete 293871/1817, find 29782/9813
Free swap: 24908kB
24 pages shared
Buffer heads: 150
CLEAN: 76 buffers, 57 used (last=57), 0 locked, 0 protected, 0 dirty

I didn't write the rest down, but I did look at many of these dumps; the
values didn't change much and just bounced around the same numbers.

Other patches I've tried that didn't help: "20-mem-fix-pre2.patch" and
"20-low-mem-cachebug-fix-2.patch". Are there any other patches to try?

Btw, I'm trying to make a patch to handle out of memory conditions
better instead of just killing processes semi-randomly, but I have a few
problems with it:

There's some strange SIGBUS code in mm/memory.c:

Function do_no_page():
        ...
        page = vma->vm_ops->nopage(vma, address,
                (vma->vm_flags & VM_SHARED)?0:write_access);
        if (!page)
                goto sigbus;
        ...
        unsigned long page = __get_free_page(GFP_KERNEL);
        if (!page)
                goto sigbus;
        ...
sigbus:
        force_sig(SIGBUS, current);

do_no_page() is the function that actually allocates pages when you
touch them. I think it should call oom() here rather than send SIGBUS:
the failure is an out-of-memory condition, and unless I'm mistaken SIGBUS
is not an "out-of-memory" signal. Btw, which signal could be? It would be
nice to have a you're-dead-if-you-don't-free-memory signal: by default it
would do nothing, but an app could handle it and try to free memory.
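
To make the idea concrete, here is a minimal user-space sketch of how an
application might react to such a signal. No such signal exists, so
SIGUSR1 stands in for it; the names SIGLOWMEM, cache and the two functions
are all hypothetical. The handler only sets a flag, and the main loop does
the actual freeing.

```c
#include <signal.h>
#include <stdlib.h>

/* Hypothetical: SIGUSR1 stands in for the proposed low-memory signal. */
#define SIGLOWMEM SIGUSR1

static volatile sig_atomic_t low_memory = 0;
static void *cache = NULL;      /* discardable data the app can rebuild */

static void on_low_memory(int sig)
{
    (void)sig;
    low_memory = 1;             /* only set a flag; free() is not async-signal-safe */
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int install_low_memory_handler(void)
{
    return signal(SIGLOWMEM, on_low_memory) == SIG_ERR ? -1 : 0;
}

/* Call from the main loop. Returns 1 if the cache was dropped, 0 otherwise. */
int release_cache_if_asked(void)
{
    if (!low_memory)
        return 0;
    low_memory = 0;
    free(cache);
    cache = NULL;
    return 1;
}
```

An app that never installs a handler would have to be unaffected, so the
default action of a real signal like this would need to be "ignore".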

I also have a few questions:

1) Is there any schedule_to(task) function? I haven't found one, so I
use a kluge: "task->policy=SCHED_FIFO; task->rt_priority=100; schedule();".
I just want to switch to the task so it can handle the SIGKILL I sent it,
and this kluge is, well, a kluge.
2) Is it correct to do the same thing as oom() does for the oom_kill()
(the function which picks the process with best "score" and kills that)?
Can I just unmask SIGKILL and send it to the (non-current) process
and nothing bad happens? I do this currently, but I'd like to confirm
whether it's safe or not..
3) Is there a way to calculate how much real, physical memory a process uses,
instead of the virtual memory used? I don't want to count
mmap()ed /dev/kmem regions (X, for example), only the areas it has obtained
through mmap() of /dev/zero and such. I currently use "task->mm->total_vm",
but I don't think it's correct.
4) What causes the "can't mmap /lib/libc.so" and similar messages?
This is one thing I really want to avoid! The system is *USELESS*
if you can't run "ls", "ps" and such anymore because the libs can't be
mmap()ed!!! [In an out-of-memory condition, oom_kill() should be called.]
Is it vm_enough_memory()? How could the libs be "detected"?
malloc() should still return NULL in some cases, so just making
vm_enough_memory() always return 1 isn't a solution.
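
On question 3, the gap between virtual and resident size can at least be
observed from user space: the first two fields of /proc/<pid>/statm are
the total program size and the resident set size, both in pages. The
sketch below only illustrates that distinction; it is not the in-kernel
answer, and struct mem_usage and both function names are made up for the
example.

```c
#include <stdio.h>

struct mem_usage {
    unsigned long virtual_pages;    /* total mapped size, in pages */
    unsigned long resident_pages;   /* pages actually held in RAM */
};

/* Parse the first two fields of a statm-style line. Returns 0 on success. */
int parse_statm(const char *line, struct mem_usage *out)
{
    if (sscanf(line, "%lu %lu",
               &out->virtual_pages, &out->resident_pages) != 2)
        return -1;
    return 0;
}

/* Read the usage of the calling process. Returns 0 on success. */
int read_self_usage(struct mem_usage *out)
{
    char buf[128];
    FILE *f = fopen("/proc/self/statm", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_statm(buf, out);
}
```

A scoring function based on the resident figure would ignore mappings
that are large but never touched.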

It would also be nice to have a "struct user" linked list in the kernel,
like we have the process list. That structure could hold the limits for a
specific user, plus the memory/cpu/whatever usages for every user. Then
we could give a user a hard memory usage limit of 8 MB, for instance.
He could start two 4 MB processes, or one 8 MB process, but not two 8 MB
processes. The structure could also contain "user-scores" (for
oom_kill()), so "www" user's processes wouldn't be killed when memory
runs out. *That* would be fine-grained :)
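
A rough user-space sketch of such a per-user list, assuming a simple
singly linked list and kilobyte accounting. The struct layout, field
names and charge_memory() are hypothetical and do not come from any
existing kernel code.

```c
#include <stddef.h>

/* Hypothetical per-user accounting record. */
struct user_acct {
    int uid;
    unsigned long mem_limit_kb;     /* hard limit for all of the user's processes */
    unsigned long mem_used_kb;      /* current total; never exceeds the limit */
    int oom_protected;              /* nonzero: oom_kill() should skip this user */
    struct user_acct *next;
};

/* Walk the list looking for uid. Returns NULL if the user is not listed. */
struct user_acct *find_user(struct user_acct *head, int uid)
{
    for (; head != NULL; head = head->next)
        if (head->uid == uid)
            return head;
    return NULL;
}

/* Charge kb to the user. Returns 0 on success, -1 if the user is unknown
 * or the charge would push the user over the limit. */
int charge_memory(struct user_acct *head, int uid, unsigned long kb)
{
    struct user_acct *u = find_user(head, uid);

    if (u == NULL)
        return -1;
    if (kb > u->mem_limit_kb - u->mem_used_kb)
        return -1;
    u->mem_used_kb += kb;
    return 0;
}
```

With an 8192 kB limit, two 4096 kB charges succeed and any further charge
is refused, which matches the two-4-MB-or-one-8-MB example.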

The way to get those scores and limits would be to ask kerneld when we
get a setuid() call or similar. That's the place to ask for them; oom_kill()
can't do it, because by then we're desperately trying to free memory, and
kerneld might itself need memory to mmap libc, read a config file, and so on.

Read this before flaming: The scoring system is fully configurable, as we
have the sources of the kernel. It can be made configurable from the user
space. I'm just trying to make it possible to *have* a scoring system
instead of a semi-random-kill system. The scoring policy itself can be changed.
The old behavior can be emulated by the following code:

int oom_kill(struct task_struct *cause, int try)
{
        oom(cause);
        return 0;
}

So it can be a compile-time option with only one #ifdef-#else-#endif.
I know that out-of-memory conditions shouldn't happen, but when they do,
the kernel should be prepared to deal with them. If I'm not skillful
enough to make this change, then somebody else should do it instead.
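
For contrast with the old-behavior fallback above, a scoring pass could
look roughly like the sketch below. It is a user-space illustration only:
struct proc_info is a made-up stand-in for task_struct, and the policy
(score by resident memory, halve it for root, never pick init or a
protected user's process) is just one assumed example of the kind of
policy that could be plugged in.

```c
#include <stddef.h>

/* Hypothetical stand-in for the fields oom_kill() would look at. */
struct proc_info {
    int pid;
    unsigned long resident_kb;  /* physical memory in use */
    int is_root;                /* nonzero: owned by root */
    int protected_user;         /* nonzero: owner must not be killed */
};

/* Higher score means better victim; -1 means never kill. */
long oom_score(const struct proc_info *p)
{
    long score;

    if (p->pid == 1 || p->protected_user)
        return -1;
    score = (long)p->resident_kb;
    if (p->is_root)
        score /= 2;             /* assumed policy: prefer non-root victims */
    return score;
}

/* Return the process with the highest score, or NULL if none is killable. */
const struct proc_info *pick_victim(const struct proc_info *procs, size_t n)
{
    const struct proc_info *best = NULL;
    long best_score = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        long s = oom_score(&procs[i]);
        if (s > best_score) {
            best_score = s;
            best = &procs[i];
        }
    }
    return best;
}
```

Only oom_score() encodes policy, so changing the scoring rules would not
touch the selection loop.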

-- 
Who is General Failure and why is he reading my hard disk?
                -- Felix von Leitner