Re: Hang on x86-64, 2.6.9-rc3-bk4

From: Jeff Garzik
Date: Sat Oct 16 2004 - 18:45:52 EST


Andrew Morton wrote:
> Jeff Garzik <jgarzik@xxxxxxxxx> wrote:
>
>> The only really notable changes in -bk3 -> -bk4 are the signal changes
>> and something in mm/vmscan.c.
>
> I'd be suspecting the vmscan.c change, but we allegedly fixed that later
> on. Can you try reverting it? (Can't reproduce the problem here.)

Verified -- reverting the vmscan.c changeset (attached) fixed my hang.

This hang is definitely present from -rc3-bk4 through -final, so a fix is not present in mainline.

Jeff


# ChangeSet
# 2004/10/03 09:16:48-07:00 nickpiggin@xxxxxxxxxxxx
# [PATCH] vm: prevent kswapd pageout priority windup
#
# Now that we are correctly kicking off kswapd early (before the synch
# reclaim watermark), it is really doing asynchronous pageout. This has
# exposed a latent problem where allocators running at the same time will
# make kswapd think it is getting into trouble, and cause too much swapping
# and suboptimal behaviour.
#
# This patch changes the kswapd scanning algorithm to use the same metrics
# for measuring pageout success as the synchronous reclaim path - namely, how
# much work is required to free SWAP_CLUSTER_MAX pages.
#
# This should make things less fragile all round, and has the added benefit
# that kswapd will continue running so long as memory is low and it is
# managing to free pages, rather than going through the full priority loop,
# then giving up. Should result in much better behaviour all round,
# especially when there are concurrent allocators.
#
# akpm: the patch was confirmed to fix up the excessive swapout which Ray Bryant
# <raybry@xxxxxxx> has been reporting.
#
# Signed-off-by: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
# Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
# Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx>
#
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c 2004-10-16 19:42:30 -04:00
+++ b/mm/vmscan.c 2004-10-16 19:42:30 -04:00
@@ -968,12 +968,16 @@
static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
{
int to_free = nr_pages;
+ int all_zones_ok;
int priority;
int i;
- int total_scanned = 0, total_reclaimed = 0;
+ int total_scanned, total_reclaimed;
struct reclaim_state *reclaim_state = current->reclaim_state;
struct scan_control sc;

+loop_again:
+ total_scanned = 0;
+ total_reclaimed = 0;
sc.gfp_mask = GFP_KERNEL;
sc.may_writepage = 0;
sc.nr_mapped = read_page_state(nr_mapped);
@@ -987,10 +991,11 @@
}

for (priority = DEF_PRIORITY; priority >= 0; priority--) {
- int all_zones_ok = 1;
int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
unsigned long lru_pages = 0;

+ all_zones_ok = 1;
+
if (nr_pages == 0) {
/*
* Scan in the highmem->dma direction for the highest
@@ -1072,6 +1077,15 @@
*/
if (total_scanned && priority < DEF_PRIORITY - 2)
blk_congestion_wait(WRITE, HZ/10);
+
+ /*
+ * We do this so kswapd doesn't build up large priorities for
+ * example when it is freeing in parallel with allocators. It
+ * matches the direct reclaim path behaviour in terms of impact
+ * on zone->*_priority.
+ */
+ if (total_reclaimed >= SWAP_CLUSTER_MAX)
+ break;
}
out:
for (i = 0; i < pgdat->nr_zones; i++) {
@@ -1079,6 +1093,9 @@

zone->prev_priority = zone->temp_priority;
}
+ if (!all_zones_ok)
+ goto loop_again;
+
return total_reclaimed;
}