I am using Linux 2.6.16.46-0.12-smp (SUSE 10 SP1 stock kernel).
I do intend to bother SUSE, but I am hoping some kind kernel savant
can help me interpret these log messages and/or give me some
suggestions for how to proceed.
My system is a SunFire x4100 (x86_64) with 16G of RAM and 32G of swap
in a single partition. I have an application which consumes a lot of
memory, and after a few hours the oom-killer kills it.
This would not be surprising, except a) the machine still has 27G of
free swap at the time; and b) it happens even when I set
vm.overcommit_memory=2 (with overcommit_ratio=50). It is
time-consuming to reproduce, but consistent.
Below is the first batch of messages from the oom-killer. In this
case, overcommit_memory was set to 2, and the application
received no failures from malloc() or new.
9-10 more such batches appear in the log in rapid succession before
I get the "Kill process XXXX ... Killed process XXXX" pair of messages.
Here are my questions. I realize these log messages may be insufficient
to answer them, and I am ready to provide more information upon
request. (Unfortunately I cannot provide the application itself. But
I can perform tests and provide logs.)
1) Does this behavior definitely indicate a kernel bug? If not, what
could my application possibly be doing to cause it?
2) Is there any sysctl setting that might "avoid" this problem? Or a
way to change my application to avoid it?
3) Should I attempt to reproduce this problem on the stock kernel.org
kernel?