Re: [PATCH] vm: enhance __alloc_pages to prioritize pagecache eviction when pressed for memory

From: Andrew Morton
Date: Fri Dec 09 2005 - 19:28:03 EST


Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:
>
> Hey all-
> I was recently shown this issue, wherein, if the kernel was kept full of
> pagecache via applications that were constantly writing large amounts of data to
> disk, the box could find itself in a position where the vm, in __alloc_pages,
> would invoke the oom killer repeatedly within try_to_free_pages, until such
> time as the box had no candidate processes left to kill, at which point it would
> panic.

That's pretty bad. Are you able to provide a description which would permit
others to reproduce this?

> /*
> + * Writeback nr_pages from pagecache to disk synchronously
> + * blocks until the writeback is complete
> + */
> +void clean_pagecache(long nr_pages)
> +{
> +	struct writeback_control wbc = {
> +		.bdi = NULL,
> +		.sync_mode = WB_SYNC_ALL,
> +		.older_than_this = NULL,
> +		.nr_to_write = nr_pages,
> +		.nonblocking = 0,
> +	};
> +
> +	writeback_inodes(&wbc);
> +}

Interesting.

> +/*
> * Start writeback of `nr_pages' pages. If `nr_pages' is zero, write back
> * the whole world. Returns 0 if a pdflush thread was dispatched. Returns
> * -1 if all pdflush threads were busy.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -949,6 +949,16 @@ rebalance:
>  	reclaim_state.reclaimed_slab = 0;
>  	p->reclaim_state = &reclaim_state;
>
> +	/*
> +	 * We're pinched for memory, so before we try to reclaim some
> +	 * pages synchronously, let's try to force some more pages out
> +	 * of pagecache, to raise our chances of this succeeding.
> +	 * Specifically, let's write out the number of pages that this
> +	 * allocation is requesting, in the hopes that they will be
> +	 * contiguous.
> +	 */
> +	clean_pagecache(1<<order);
> +
>  	did_some_progress = try_to_free_pages(zonelist->zones, gfp_mask);

I suspect that we should be passing more than (1<<order) into
clean_pagecache() - if we're going to do this sort of writeback then we
might as well do a decent amount. Maybe something like (number of pages on
the eligible LRUs * proportion of dirty memory). But then, page reclaim
already does writeback off the LRU, so none of this should be needed...
Need to work out why it broke.
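
To make the sizing idea concrete, here is a rough user-space sketch of the
arithmetic, not kernel code. The function name writeback_target(), its
parameters, the reading of "proportion of dirty memory" as
dirty_pages / total_pages, and the floor of (1<<order) are all assumptions
made for illustration.

```c
/*
 * Hypothetical user-space model of the sizing heuristic; none of these
 * names exist in the kernel.
 *
 *   lru_pages   - pages on the LRUs eligible for reclaim (assumed input)
 *   dirty_pages - dirty pages in the system (assumed input)
 *   total_pages - pages the dirty count is measured against (assumed input)
 *   order       - allocation order of the failing request
 */
static unsigned long writeback_target(unsigned long lru_pages,
				      unsigned long dirty_pages,
				      unsigned long total_pages,
				      unsigned int order)
{
	/* Never ask for less than the allocation itself needs. */
	unsigned long min_pages = 1UL << order;
	unsigned long long scaled;

	/* No basis for a proportion: fall back to the minimum. */
	if (total_pages == 0)
		return min_pages;

	/* Clamp so the proportion never exceeds 100%. */
	if (dirty_pages > total_pages)
		dirty_pages = total_pages;

	/*
	 * lru_pages * (dirty_pages / total_pages), multiplying first so
	 * the integer division does not truncate the ratio to zero.
	 * A sketch only: very large page counts could overflow here.
	 */
	scaled = (unsigned long long)lru_pages * dirty_pages / total_pages;

	return scaled > min_pages ? (unsigned long)scaled : min_pages;
}
```

With 4096 eligible LRU pages and one eighth of memory dirty, this asks for
512 pages of writeback instead of the 16 that an order-4 request would get
from (1<<order) alone.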

And we should not be calling into filesystem writeback unless the caller
specified __GFP_FS.
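
The reason for the __GFP_FS check is that a caller which omits it may be
holding filesystem locks, so entering filesystem writeback on its behalf
risks deadlock. A minimal user-space sketch of the gating follows. The
SKETCH_* flag values and both function names are hypothetical stand-ins,
not the kernel's definitions, and the return value merely models whether
clean_pagecache() would have been called.

```c
/* Illustrative flag bits only; not the kernel's actual gfp values. */
#define SKETCH_GFP_WAIT	0x10u
#define SKETCH_GFP_IO	0x40u
#define SKETCH_GFP_FS	0x80u

/* Nonzero if the caller allows the allocator to enter the filesystem. */
static int may_enter_fs_writeback(unsigned int gfp_mask)
{
	return (gfp_mask & SKETCH_GFP_FS) != 0;
}

/*
 * Hypothetical wrapper: returns the number of pages that would be handed
 * to clean_pagecache(), or 0 if writeback must be skipped because the
 * caller did not pass the FS flag.
 */
static long maybe_clean_pagecache(unsigned int gfp_mask, long nr_pages)
{
	if (!may_enter_fs_writeback(gfp_mask))
		return 0;

	/* Stand-in for the real call: clean_pagecache(nr_pages); */
	return nr_pages;
}
```

In the patch, the equivalent change would be to guard the new
clean_pagecache() call in __alloc_pages with a test of gfp_mask.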