Re: Fix for wrong OOM killer trigger?

From: Roland Bless
Date: Mon Sep 22 2003 - 05:13:57 EST


Hi Andrea,

On Sat, 20 Sep 2003 16:34:10 +0200 Andrea Arcangeli <andrea@xxxxxxx> wrote:

> > The real cause, however, seems to be that the filesystem cache
> > memory is not properly re-used when it should, or, that it tries to
> > allocate a huge amount memory. The programs themselves do not
> > allocate much memory! It must be the system, because I also
> > ran programs with memory restrictions by ulimit. The programs
> > are definitely not allocating the memory, and, 4GB RAM are really
> > enough for a simple file server like ours.
>
> that might be an accounting error in the oom killing then (even that
> should be corrected in my tree or in the stock 8.1 SuSE kernel).
>
> the reason normally oom accounting errors never showup, is that when the
> amount of free-swap is >0, the oom-killer is never invoked (that's a
> magic that probably avoids those situations to normally arise in the
> stock kernel).
>
> so maybe you had no swap, if you had no swap that would explain it.

That's clear then, however, some kernel process/procedure must have tried
to allocate a huge block of memory.

> and of course if you have 4G of ram and you know you've more than enough
> ram then you'd be right using 0 swap (just the stock kernel oom killer
> may malfunction, but that's not going to happen with the kernels I
> suggested you to try, they'll be fine with 0 swap)

> hope this helps ;)

The suggestion from Marc-Christian Petersen <m.c.p@xxxxxxxxxxxxxxx>,
namely using v2.4.23-pre5, worked for me. I was not sure before,
because I was not able to guess from the Changelog whether there
was a fix for the particular bug. My suggestion is that the log entry
below describes the bug fix for it:

Summary of changes from v2.4.22 to v2.4.23-pre1
============================================
...
Marc-Christian Petersen:
o Cleanup kmem_cache_reap()
or was it related to this one:
o Avoid potentially leaking pagetables into the per-cpu queues

I hope that it was also fixed in 2.6, or is there a different mechanism
used?

Best regards,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/