[PATCH] mm: Use up free swap space before reaching OOM kill

From: Minchan Kim
Date: Sun Jan 20 2013 - 20:43:43 EST


Recently, Luigi reported there are lots of free swap space when
OOM happens. It's easily reproduced on zram-over-swap, where
many instance of memory hogs are running and laptop_mode is enabled.
He said there was no problem when he disabled laptop_mode.
The problem when I investigate problem is following as.

Assumption for easy explanation: There are no page cache page in system
because they all are already reclaimed.

1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
2. shrink_inactive_list isolates victim pages from inactive anon lru list.
3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
pageout because sc->may_writepage is 0 so the page is rotated back into
inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
retry reclaim with higher priority.
5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
inactive anon lru list is full of dirty pages by 3 so it just returns
without any reclaim progress.
6. do_try_to_free_pages doesn't set may_writepage due to zero total_scanned.
Because sc->nr_scanned is increased by shrink_page_list but we don't call
shrink_page_list in 5 due to short of isolated pages.

Above loop is continued until OOM happens.
The problem didn't happen before [1] was merged because old logic's
isolatation in shrink_inactive_list was successful and tried to call
shrink_page_list to pageout them but it still ends up failed to page out
by may_writepage. But important point is that sc->nr_scanned was increased
although we couldn't swap out them so do_try_to_free_pages could set
may_writepages.

Since [1] was introduced, it's not a good idea any more to depends on
only the number of scanned pages for setting may_writepage. So this patch
adds new trigger point of setting may_writepage as below DEF_PRIOIRTY - 2
which is used to show the significant memory pressure in VM so it's good
fit for our purpose which would be better to lose power saving or clickety
rather than OOM killing.

[1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]

Reported-by: Luigi Semenzato <semenzato@xxxxxxxxxx>
Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
---
mm/vmscan.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 740bad9..d333df0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2204,6 +2204,13 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
goto out;

/*
+ * If we're getting trouble reclaiming, start doing
+ * writepage even in laptop mode.
+ */
+ if (sc->priority < DEF_PRIORITY - 2)
+ sc->may_writepage = 1;
+
+ /*
* Try to write back as many pages as we just scanned. This
* tends to cause slow streaming writers to write data to the
* disk smoothly, at the dirtying rate, which is nice. But
@@ -2774,12 +2781,10 @@ loop_again:
}

/*
- * If we've done a decent amount of scanning and
- * the reclaim ratio is low, start doing writepage
- * even in laptop mode
+ * If we're getting trouble reclaiming, start doing
+ * writepage even in laptop mode.
*/
- if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
- total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
+ if (sc.priority < DEF_PRIORITY - 2)
sc.may_writepage = 1;

if (zone->all_unreclaimable) {
--
1.7.9.5

>-
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/