Re: upcoming kerneloops.org item: get_page_from_freelist

From: David Rientjes
Date: Thu Jun 25 2009 - 16:19:26 EST


On Thu, 25 Jun 2009, Theodore Tso wrote:

> On Thu, Jun 25, 2009 at 03:38:06PM -0400, Theodore Tso wrote:
> > Hmm, is there a reason to avoid using GFP_ATOMIC on the first
> > allocation, and only adding GFP_ATOMIC after the first failure?
>
> Never mind, stupid question; I hit the send button before thinking
> about this enough. Obviously we should try without GFP_ATOMIC so the
> allocator can try to release some memory.

The allocator can't actually release much memory itself; it must rely on
pdflush to do writeback, and the slab shrinkers are mostly no-ops for
~__GFP_FS allocations. How much pdflush manages to free will depend on
the caller's context.

> So maybe the answer for
> filesystem code where the alternative to allocator failure is
> remounting the root filesystem read-only or panic(), should be:
>
> 1) Try to do the allocation GFP_NOFS.
>
> 2) Then try GFP_ATOMIC
>
> 3) Then retry the allocator with GFP_NOFS in a loop (possibly with a
> timeout that then panics the system and allows it to reboot,
> although arguably a watchdog timer should really perform that
> function).
>

This is similar to how __getblk() will repeatedly loop until it gets
sufficient memory to create buffers for the block page, which also relies
heavily on pdflush. If the GFP_ATOMIC allocation fails, then it's
unlikely that a subsequent GFP_NOFS allocation will succeed any time
soon without the oom killer, which a GFP_NOFS allocation is not allowed
to invoke, so it would probably be better to loop in step #2 with
congestion_wait().
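A rough userspace model of that ordering: one GFP_NOFS attempt, then a GFP_ATOMIC loop with a wait between failures. Everything in it is a hypothetical stand-in (fake_alloc(), fake_congestion_wait(), alloc_nofail() and the alloc_mode enum); in the kernel the loop would call the allocator with GFP_NOFS/GFP_ATOMIC and congestion_wait() directly. This is a sketch of the control flow only, not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's GFP_NOFS and GFP_ATOMIC. */
enum alloc_mode { MODE_NOFS, MODE_ATOMIC };

static int fail_count; /* how many more attempts should fail */
static int calls;      /* total allocation attempts */
static int waits;      /* total simulated congestion waits */

/* Hypothetical allocator: fails the next `fail_count` attempts, then
 * falls back to malloc(), so the retry path can be exercised. */
static void *fake_alloc(size_t size, enum alloc_mode mode)
{
	(void)mode; /* the model does not distinguish the modes */
	calls++;
	if (fail_count > 0) {
		fail_count--;
		return NULL;
	}
	return malloc(size);
}

/* Stand-in for congestion_wait(): the kernel version sleeps until a
 * backing device is no longer congested or a timeout expires; this
 * model only counts the calls. */
static void fake_congestion_wait(void)
{
	waits++;
}

/* Must-succeed allocation: one GFP_NOFS attempt, then loop on
 * GFP_ATOMIC, waiting for writeback between failed attempts. */
static void *alloc_nofail(size_t size)
{
	void *p;

	assert(size > 0);
	p = fake_alloc(size, MODE_NOFS);
	while (p == NULL) {
		p = fake_alloc(size, MODE_ATOMIC);
		if (p == NULL)
			fake_congestion_wait();
	}
	return p;
}
```

With fail_count = 3, alloc_nofail(4096) makes one NOFS attempt and three ATOMIC attempts, waiting twice, before it returns a buffer. The loop has no upper bound, so the timeout/watchdog question from step 3 still applies.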

> Obviously if we can rework the filesystem code to avoid this as much
> as possible, this would be desirable, but if there are some cases left
> over where we really have no choice, that's probably what we should
> do.
>

Isn't there also a problem in jbd2_journal_write_metadata_buffer(),
though?

	char *tmp;

	jbd_unlock_bh_state(bh_in);
	tmp = jbd2_alloc(bh_in->b_size, GFP_NOFS);
	jbd_lock_bh_state(bh_in);
	if (jh_in->b_frozen_data) {
		jbd2_free(tmp, bh_in->b_size);
		goto repeat;
	}

	jh_in->b_frozen_data = tmp;
	mapped_data = kmap_atomic(new_page, KM_USER0);
	memcpy(tmp, mapped_data + new_offset, jh2bh(jh_in)->b_size);

jbd2_alloc() is just a wrapper around __get_free_pages(), and if it fails,
it appears that the memcpy() would dereference a NULL pointer.
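One way to avoid the NULL dereference is to check the result before the memcpy() and return -ENOMEM to the caller. A minimal userspace sketch, assuming a hypothetical model_alloc() in place of jbd2_alloc() and a hypothetical freeze_copy() helper (not a jbd2 function); it deliberately omits the bh state locking and the b_frozen_data recheck shown in the excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int force_fail; /* nonzero makes model_alloc() fail */

/* Hypothetical stand-in for jbd2_alloc(): may return NULL. */
static char *model_alloc(size_t size)
{
	return force_fail ? NULL : malloc(size);
}

/* Copy `size` bytes of `src` into a freshly allocated frozen buffer.
 * Returns 0 and sets *frozen on success; returns -ENOMEM and leaves
 * *frozen untouched if the allocation fails, instead of handing a
 * NULL pointer to memcpy(). */
static int freeze_copy(char **frozen, const char *src, size_t size)
{
	char *tmp;

	assert(frozen != NULL && src != NULL);
	tmp = model_alloc(size);
	if (tmp == NULL)
		return -ENOMEM;

	memcpy(tmp, src, size);
	*frozen = tmp;
	return 0;
}
```

Returning an error only moves the decision up a level: the caller still has to choose between retrying and failing the operation.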