Re: Memory overcommit

From: Vedran FuraÄ
Date: Wed Oct 28 2009 - 09:28:28 EST


David Rientjes wrote:

> On Wed, 28 Oct 2009, Vedran Furac wrote:
>
>>> This is wrong; it doesn't "emulate oom" since oom_kill_process() always
>>> kills a child of the selected process instead if they do not share the
>>> same memory. The chosen task in that case is untouched.
>> OK, I stand corrected then. Thanks! But, while testing this I lost X
>> once again and "test" survived for some time (check the timestamps):
>>
>> http://pastebin.com/d5c9d026e
>>
>> - It started by killing gkrellm(!!!)
>> - Then I lost X (kdeinit4 I guess)
>> - Then 103 seconds after the killing started, it killed "test" - the
>> real culprit.
>>
>> I mean... how?!
>>
>
> Here are the five oom kills that occurred in your log, and notice that the
> first four times it kills a child and not the actual task as I explained:

Yes, but four times wrong.

> Those are practically happening simultaneously with very little memory
> being available between each oom kill. Only later is "test" killed:
>
> [97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
> [97240.206832] Killed process 5005 (test)
>
> Notice how the badness score is less than 1/4th of the others. So while
> you may find it to be hogging a lot of memory, there were others that
> consumed much more.
^^^^^^^^^^^^^^^^^^^^^

This is just wrong. I have 3.5GB of RAM, free says that 2GB are empty
(ignoring cache). Culprit then allocates all free memory (2GB). That
means it is using *more* than all other processes *together*. There
cannot be any other "that consumed much more".

> You can get a more detailed understanding of this by doing
>
> echo 1 > /proc/sys/vm/oom_dump_tasks
>
> before trying your testcase; it will show various information like the
> total_vm

Looking at total_vm (VIRT in top/vsize in ps?) is completely wrong. If I
sum up those numbers for every process running I would get:

%ps -eo pid,vsize,command|awk '{ SUM += $2} END {print SUM/1024/1024}'
14.7935

14GB. And I only have 3GB. I usually use exmap to get realistic numbers:

http://www.berthels.co.uk/exmap/doc.html

> and oom_adj value for each task at the time of oom (and the
> actual badness score is exported per-task via /proc/pid/oom_score in
> real-time). This will also include the rss and show what the end result
> would be in using that value as part of the heuristic on this particular
> workload compared to the current implementation.

Thanks, I'll try that... but I guess that using rss would yield better
results.


Regards,

Vedran
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/