Re: new swap cache regime

Andrea Arcangeli (andrea@e-mind.com)
Mon, 28 Sep 1998 00:04:03 +0200 (CEST)


On Sun, 27 Sep 1998, Linus Torvalds wrote:

>The real reason I don't like it is that right now I want something that
>works, and that everybody agrees on is the _minimal_ patch possible. I
>want something simple that is obviously correct (and right now I don't
>even care whether swapoff() works or not, actually).

swapoff() will work fine.

>We can tweak later, I'd really like to have a patch that everybody can
>live with for now, and then we can see what the problems are. I'd prefer
>not to have extra code until it's shown to be necessary, and I also want
>to have the basics tested well before we start to be clever.

Right.

Once the basic patch will be declared stable I am _sure_ we' ll need some
other code to be rasonably efficient before 2.2 because right now
try_to_free_page() is not enough clever to do the right thing with the
basic patch applyed.

With the basic patch it can happens that a machine will go out of memory
very fast. Then try_to_free_page() will start swapping out and the machine
will go out of swap (because every swap cache garbage page will waste a
page of memory and an entry in the swap cache). Only shrink_mmap()
occasionally will do the right thing removing the garbage from the swap
cache (and so freeing a swap entry and a page of memory at the same time).
Right now shrink_mmap() could also remove a caching entry (the one with
the swap entry shared) instead of one that is not referenced by anybody
(a garbage one).

I think this behavior should not cause major problems (because first or
before shrink_mmap() should remove the garbage from the swap cache) but
it' s sure something we won' t to live with it.

My last patch (the one that changed swap_free()) fixed the problem fine
without using alien code.

>I liked your previous patch, and it seems Stephen agreed with it too once
>it had some things changed - if only because then it was very close to
>what he had done. The new patch looks to have fixed Stephens worries, and
>I'd like to see it without the extra swap_free() code - just to test it
>out.

Ok. Here it is (fixed ;-):

diff -urN /home/andrea/devel/kernel-tree/linux-2.1.122/include/linux/swap.h linux/include/linux/swap.h
--- /home/andrea/devel/kernel-tree/linux-2.1.122/include/linux/swap.h Sat Sep 5 14:17:56 1998
+++ linux/include/linux/swap.h Sun Sep 27 23:26:48 1998
@@ -89,6 +89,7 @@
extern int swap_check_entry(unsigned long);
extern struct page * read_swap_cache_async(unsigned long, int);
#define read_swap_cache(entry) read_swap_cache_async(entry, 1);
+extern int FASTCALL(swap_count(unsigned long));
/*
* Make these inline later once they are working properly.
*/
@@ -146,14 +147,20 @@
*/
static inline int is_page_shared(struct page *page)
{
- int count = atomic_read(&page->count);
+ unsigned int count;
if (PageReserved(page))
return 1;
- if (page->inode == &swapper_inode)
- count--;
+ count = atomic_read(&page->count);
+ if (PageSwapCache(page))
+ {
+ /* PARANOID */
+ if (page->inode != &swapper_inode)
+ panic("swap cache page has wrong inode\n");
+ count += swap_count(page->offset) - 2;
+ }
if (PageFreeAfter(page))
count--;
- return (count > 1);
+ return count > 1;
}

#endif /* __KERNEL__*/
diff -urN /home/andrea/devel/kernel-tree/linux-2.1.122/mm/page_alloc.c linux/mm/page_alloc.c
--- /home/andrea/devel/kernel-tree/linux-2.1.122/mm/page_alloc.c Thu Sep 10 23:56:48 1998
+++ linux/mm/page_alloc.c Sun Sep 27 23:25:52 1998
@@ -163,9 +163,11 @@
free_pages_ok(page->map_nr, 0);
return;
}
+#if 0
if (PageSwapCache(page) && atomic_read(&page->count) == 1)
printk(KERN_WARNING "VM: Releasing swap cache page at %p",
__builtin_return_address(0));
+#endif
}

void free_pages(unsigned long addr, unsigned long order)
@@ -182,10 +184,12 @@
free_pages_ok(map_nr, order);
return;
}
+#if 0
if (PageSwapCache(map) && atomic_read(&map->count) == 1)
printk(KERN_WARNING
"VM: Releasing swap cache pages at %p",
__builtin_return_address(0));
+#endif
}
}

diff -urN /home/andrea/devel/kernel-tree/linux-2.1.122/mm/swap_state.c linux/mm/swap_state.c
--- /home/andrea/devel/kernel-tree/linux-2.1.122/mm/swap_state.c Thu Sep 10 23:56:48 1998
+++ linux/mm/swap_state.c Sun Sep 27 23:25:52 1998
@@ -143,6 +143,50 @@
goto out;
}

+int swap_count(unsigned long entry)
+{
+ struct swap_info_struct * p;
+ unsigned long offset, type;
+ int retval = 0;
+
+ if (!entry)
+ goto bad_entry;
+ type = SWP_TYPE(entry);
+ if (type & SHM_SWP_TYPE)
+ goto out;
+ if (type >= nr_swapfiles)
+ goto bad_file;
+ p = type + swap_info;
+ offset = SWP_OFFSET(entry);
+ if (offset >= p->max)
+ goto bad_offset;
+ if (!p->swap_map[offset])
+ goto bad_unused;
+ retval = p->swap_map[offset];
+#ifdef DEBUG_SWAP
+ printk("DebugVM: swap_count(entry %08lx, count %d)\n",
+ entry, retval);
+#endif
+out:
+ return retval;
+
+bad_entry:
+ printk(KERN_ERR "swap_count: null entry!\n");
+ goto out;
+bad_file:
+ printk(KERN_ERR
+ "swap_count: entry %08lx, nonexistent swap file!\n", entry);
+ goto out;
+bad_offset:
+ printk(KERN_ERR
+ "swap_count: entry %08lx, offset exceeds max!\n", entry);
+ goto out;
+bad_unused:
+ printk(KERN_ERR
+ "swap_count at %8p: entry %08lx, unused page!\n",
+ __builtin_return_address(0), entry);
+ goto out;
+}

static inline void remove_from_swap_cache(struct page *page)
{
@@ -155,6 +199,7 @@
printk ("VM: Removing swap cache page with wrong inode hash "
"on page %08lx\n", page_address(page));
}
+#if 0
/*
* This is a legal case, but warn about it.
*/
@@ -163,6 +208,7 @@
"VM: Removing page cache on unshared page %08lx\n",
page_address(page));
}
+#endif

#ifdef DEBUG_SWAP
printk("DebugVM: remove_from_swap_cache(%08lx count %d)\n",

Andrea[s] Arcangeli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/