Re: Vanilla-Kernel 3 - page allocation failure

From: Philipp Herz - Profihost AG
Date: Wed Oct 19 2011 - 02:45:06 EST


Hello Cascardo,

> echo m > /proc/sysrq-trigger
Thanks,
I have pasted another Call Trace including memory stats at

* http://pastebin.com/vjLHuqtk

Not sure whether the memory stats were captured close enough in time to the call-trace event.

If not, do we have to recompile the kernel to get call traces and memory stats at the same time?

> Does your workload work better on a previous version? I had problems
> using something like 2.6.32.
Yes,
kernel version was 2.6.32.40 before and we never had those messages appearing.

Regards,
Philipp

On 18.10.2011 16:35, Thadeu Lima de Souza Cascardo wrote:
On Tue, Oct 18, 2011 at 03:24:44PM +0200, Philipp Herz - Profihost AG wrote:
Hello Cascardo

Usually, after the stack dump, there are some
statistics about memory.
Yes, I have seen this in other posts as well.

I have seen that these may be suppressed
if you have a NUMA system with lots of nodes.
Yes, in our case it seems to be suppressed.

Check for NODES_SHIFT in your
config. If it's greater than 8, that output may have been suppressed.
In our case CONFIG_NODES_SHIFT=10, so that would explain it.
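For anyone wanting to confirm the value on a running machine, something like this should work (a sketch; the /boot filename convention varies by distribution, and /proc/config.gz only exists with CONFIG_IKCONFIG_PROC=y):

```shell
# Look up NODES_SHIFT in the running kernel's config.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    grep CONFIG_NODES_SHIFT "$cfg"
elif [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | grep CONFIG_NODES_SHIFT
else
    echo "kernel config not found" >&2
fi
```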

Is there any way to get those stats without recompiling the kernel?

But you may have just ignored the statistics because of the
stack dump.
No, I was also wondering why others do have these ;-)

Regards,
Philipp


echo m > /proc/sysrq-trigger

will show you that same output, but not at the time the memory failure
happens. It may still show you what is the condition of memory on your
nodes.
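For completeness, a sketch of that procedure (it needs root and CONFIG_MAGIC_SYSRQ=y; the tail count is arbitrary):

```shell
# Ask the kernel to dump current per-zone memory info (sysrq 'm'),
# then read it back from the kernel ring buffer.
if [ -w /proc/sysrq-trigger ]; then
    echo m > /proc/sysrq-trigger
    dmesg | tail -n 60    # the Mem-Info block appears at the end of the log
else
    echo "need root to write /proc/sysrq-trigger" >&2
fi
```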

I am not that well versed in the VM. It just happens that I had very
similar issues lately and was trying to understand them a little more. I
still have to solve these issues myself.

In my case, the workload is IO bound on extX filesystems and I see that
other systems have these failures due to this memory pressure. Usually,
after stopping the workload and unmounting the filesystems, I get most
of the memory in the system freed.

Most of the failures are from GFP_ATOMIC allocations, because those
won't reclaim memory, and they fail outright when free memory is below
the threshold. Setting this threshold to a lower value, as was suggested
(min_free_kbytes), might have helped, but that also allows whatever is
putting pressure on your memory to allocate below the threshold, and you
end up in the same situation (or a worse one).
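For reference, the watermark in question is exposed through the min_free_kbytes sysctl (a sketch; the value shown below is purely illustrative, and changing it requires root):

```shell
# Read the current watermark (world-readable on Linux).
cat /proc/sys/vm/min_free_kbytes

# Changing it requires root; 65536 here is only an example value:
#   echo 65536 > /proc/sys/vm/min_free_kbytes
# or equivalently:
#   sysctl -w vm.min_free_kbytes=65536   # add to /etc/sysctl.conf to persist
```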

Does your workload work better on a previous version? I had problems
using something like 2.6.32.

Regards,
Cascardo.

On 18.10.2011 14:38, Thadeu Lima de Souza Cascardo wrote:
On Tue, Oct 18, 2011 at 02:07:38PM +0200, Philipp Herz - Profihost AG wrote:
Hello Cascardo,

thanks for your detailed answer!

I have uploaded two call traces to pastebin for further investigation.

Maybe this can help you.

* http://pastebin.com/Psg2dGYC (kworker)
* http://pastebin.com/pPFjZqxL (php5)

Regards,
Philipp


Hello, Philipp.

That only tells us that you have a TCP workload in your system. This is
the subsystem that is trying to allocate memory. However, we do not know
why there is a failure. Usually, after the stack dump, there are some
statistics about memory. I have seen that these may be suppressed if you
have a NUMA system with lots of nodes. Check for NODES_SHIFT in your
config. If it's greater than 8, that output may have been suppressed.
But you may have just ignored the statistics because of the stack dump.

Regards,
Cascardo.


On 18.10.2011 13:32, Thadeu Lima de Souza Cascardo wrote:
On Tue, Oct 18, 2011 at 12:25:03PM +0200, Philipp Herz - Profihost AG wrote:
After updating the kernel (x86_64) to stable version 3, a few
messages have been appearing in the kernel log, such as

kworker/0:1: page allocation failure: order:1, mode:0x20
mysql: page allocation failure: order:1, mode:0x20
php5: page allocation failure: order:1, mode:0x20

Searching the net showed that these messages have been known to occur since 2004.

Some people were able to get rid of them by setting
/proc/sys/vm/min_free_kbytes to a high enough value. This does not
help in our case.


Is there a kernel comand line argument to avoid these messages?

As of mm/page_alloc.c, these messages are only warnings and
would not appear if 'gfp_mask' had __GFP_NOWARN set when
warn_alloc_failed is called.

How does this mask get set? Is it set by the "external" process
knocking at the memory manager?


Hello, Philipp.

This happens when the kernel tries to allocate memory, sometimes in response
to some request by the user space, but also in other contexts. For
example, an interrupt by a network driver may try to allocate memory. In
this context, it will use GFP_ATOMIC as a mask, for example. The most
usual flags in the kernel are GFP_KERNEL and GFP_ATOMIC.

What is the magic behind the 'order' and 'mode'?


The order is the binary log of the number of pages requested. So, order 1
allocations are 2 pages, order 4 would be 16 pages, for example.

The mode is, in fact, gfp_flags. 0x20 is GFP_ATOMIC. This kind of
allocation cannot do IO or access the filesystem, and it cannot wait
for memory to be reclaimed from the cache.
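As a quick sanity check of that arithmetic (assuming the usual 4 KiB x86_64 page size):

```shell
# An order-n allocation requests 2^n contiguous pages.
page_size=4096                       # assumed 4 KiB pages (x86_64 default)
for order in 1 4; do
    pages=$((1 << order))
    bytes=$((pages * page_size))
    echo "order=$order -> $pages pages = $bytes bytes"
done
# So the failing order:1 requests above needed 8192 contiguous bytes,
# and mode:0x20 matches GFP_ATOMIC (__GFP_HIGH) in this kernel series.
```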

This warning is usually followed by some statistics about memory use
in your system. Please post them to give more information about this
situation.

I have watched some of this happen when lots of cache is used by some
filesystems. Perhaps some tweaking of the vm sysctl options may help,
but I can't point to any magic tweak right now.
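For anyone wanting to experiment, the candidate knobs all live under /proc/sys/vm (which ones actually help here is an open question, not something this thread established):

```shell
# Enumerate the vm sysctls available on this kernel.
ls /proc/sys/vm

# Knobs often discussed for cache-heavy IO workloads (inspect before touching):
cat /proc/sys/vm/min_free_kbytes \
    /proc/sys/vm/vfs_cache_pressure \
    /proc/sys/vm/dirty_ratio
```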

Regards,
Cascardo.

I'm not a subscriber, so please CC me a copy of messages related to
the subject. I'm not sure if I can help much by looking at the
inside of the kernel, but I will try my best to answer any questions
concerning this issue.

Best regards, Philipp
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/




