Re: new swap cache regime

Stephen C. Tweedie (sct@redhat.com)
Mon, 28 Sep 1998 11:52:24 +0100


Hi,

On Mon, 28 Sep 1998 00:04:03 +0200 (CEST), Andrea Arcangeli
<andrea@e-mind.com> said:

> Once the basic patch will be declared stable I am _sure_ we' ll need
> some other code to be rasonably efficient before 2.2 because right now
> try_to_free_page() is not enough clever to do the right thing with the
> basic patch applyed.

> With the basic patch it can happens that a machine will go out of
> memory very fast. Then try_to_free_page() will start swapping out and
> the machine will go out of swap (because every swap cache garbage page
> will waste a page of memory and an entry in the swap cache). Only
> shrink_mmap() occasionally will do the right thing removing the
> garbage from the swap cache (and so freeing a swap entry and a page of
> memory at the same time). Right now shrink_mmap() could also remove a
> caching entry (the one with the swap entry shared) instead of one that
> is not referenced by anybody (a garbage one).

This is precisely the last lingering concern with the patch. However,
as you mentioned, I'm not convinced it's a massive problem, because we
do try to shrink the page cache before swapping, and shrink_mmap will
eliminate the redundant swap references for us.

So, far better to apply this patch as is than to leave the current
performance problem unsolved.

>> I liked your previous patch, and it seems Stephen agreed with it too once
>> it had some things changed - if only because then it was very close to
>> what he had done. The new patch looks to have fixed Stephens worries, and
>> I'd like to see it without the extra swap_free() code - just to test it
>> out.

Yep.

If it turns out we need to do something about this, then there are two
fast ways I can see. The first is to add a per-swap-entry bitmap
indicating whether pages are cached; that gives us a fast, constant-time
check to see whether a zap_pte needs to kill the swap cache. The
alternative is a quick and easy change to free_page_and_swap_cache:

----------------------------------------------------------------
--- mm/swap_state.c.~1~ Mon Sep 21 15:52:34 1998
+++ mm/swap_state.c Mon Sep 28 11:45:44 1998
@@ -208,6 +208,18 @@
delete_from_swap_cache(page);
}

+ /* Special case: if we have still got other references to this
+ * page, but all such references are to the swap entry rather
+ * than to the physical page, then we are only keeping this page
+ * for caching purposes. If the remaining process references
+ * are killed, we will be left with an orphaned swap entry from
+ * the cache. Make sure that such pages are easily cleaned out
+ * of the page cache. */
+ if (atomic_read(&page->count) == 1) {
+ clear_bit(PG_referenced, &page->flags);
+ page->age = 0;
+ }
+
free_page(addr);
}

----------------------------------------------------------------

This doesn't eliminate the orphaned swap entries, but does make sure
they are reaped rapidly by shrink_mmap once we get short of memory again
(and, of course, until we are short of physical memory we don't need to
worry about extra swap being used).

The "clean" solution is to prevent the swap cache from acting as a
reference to the swap entry underneath, but that becomes nasty. The
only way we can make that work is to change the swap entry accounting so
that read-only COW mappings of swap pages count both against the swap
entry and the physical page. In that case, the last exit() of a cached
swap entry will instantly free the swap entry (but will not free the
swap cache: we still need shrink_mmap for that). It may be that the
per-swap-entry cache flag is the way to go, for the reason that it tells
us precisely when to bother cleaning up both the page cache and the swap
entry reference in the case of final exit of a swapped page.

For now, let's just go with what's in .123 and deal with complaints as
they come in. I can't see that being anything but an improvement,
except possibly in cases where we have less swap than memory and a high
memory load.

--Stephen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/