Re: VM Problems in 2.6.7 (Too active OOM Killer)

From: William Lee Irwin III
Date: Wed Jul 14 2004 - 21:35:30 EST


On Wed, 2004-07-14 at 18:54, William Lee Irwin III wrote:
>> The only method the kernel now has to relocate userspace memory is IO.
>> When mlocked, or if anonymous when there's no swap, it's pinned.

On Wed, Jul 14, 2004 at 07:13:23PM -0700, Peter Zaitsev wrote:
> OK. So it is a practical technical difficulty rather than a fundamental
> reason? Why is the "move to another zone" approach not implemented? It
> should normally be cheaper than IO?

There is no technical difficulty. However, do notice that there are other
forms of placement-restricted pagecache, e.g. blockdev pagecache, ramdisks, etc.


On Wed, 2004-07-14 at 18:54, William Lee Irwin III wrote:
>> Userspace allocations can also trigger OOM, it's merely that in this
>> case only allocations restricted to ZONE_NORMAL or below, e.g. kernel
>> allocations, are affected. Your memory pressure is restricted to one zone.

On Wed, Jul 14, 2004 at 07:13:23PM -0700, Peter Zaitsev wrote:
> Right. Now that it has been explained that without swap all pages are
> pinned, it makes sense. On the other hand, why would a user allocation
> trigger OOM if there are pages in another zone that can still be used?
> Or are there any restrictions on this?

Allocations can be requested to come from restricted physical areas.
In this kind of situation, the OOM comes from exhaustion of a physical
area smaller than all of RAM, i.e. ZONE_NORMAL or ZONE_DMA.

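The OOM decision-making in __alloc_pages() is noteworthy:

	/* Decide whether to loop back instead of failing the allocation. */
	do_retry = 0;
	if (!(gfp_mask & __GFP_NORETRY)) {
		/* Small (order <= 3) or __GFP_REPEAT allocations retry... */
		if ((order <= 3) || (gfp_mask & __GFP_REPEAT))
			do_retry = 1;
		/* ...as do __GFP_NOFAIL allocations. */
		if (gfp_mask & __GFP_NOFAIL)
			do_retry = 1;
	}
	if (do_retry) {
		/* Wait (at most HZ/50 jiffies) for write congestion, then retry. */
		blk_congestion_wait(WRITE, HZ/50);
		goto rebalance;
	}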

At the rebalance label, failure is delivered only when the check
if (current->flags & (PF_MEMALLOC|PF_MEMDIE)) passes; otherwise,
__alloc_pages() retries indefinitely and ignores signals.

Furthermore, notice that the OOM killer will trip if out_of_memory() is
called more than 10 times in one second, which a single process can
plausibly do, as it sleeps for only HZ/50 jiffies (20 ms) between
retries. More interestingly, out_of_memory() is never called unless
__GFP_FS is set.


On Wed, 2004-07-14 at 18:54, William Lee Irwin III wrote:
>> In order to relocate a userspace page, the kernel performs IO to write
>> the page to some backing store, then lazily faults it back in later. When
>> the userspace page lacks a backing store, e.g. anonymous pages on
>> swapless systems, Linux does not now understand how to relocate them.

On Wed, Jul 14, 2004 at 07:13:23PM -0700, Peter Zaitsev wrote:
> Can't it just be just (theoretically) moved to other zone with
> appropriate system tables modifications ?
> Well anyway it is good to hear "pinned anonymous" is only issue on
> swapless systems. Together with the fact what 2.6 VM does not seems to
> swap without a good reason as 2.4 one did, I perhaps can just have swap
> file enabled.

There is no technical (or even practical) obstacle to implementing
in-core page relocation, only a social one: kernel politics. I would not
be surprised if hotplug memory patches already had code usable for this.


-- wli