RFC: preliminary patch for memory defragmentation

Bill Hawes (whawes@star.net)
Thu, 16 Jul 1998 18:10:48 -0400


I've been experimenting with a simple but effective approach to memory
defragmentation, and have attached a patch for comments and suggestions.

The defragger code is pretty straightforward. The need_defrag() routine
checks whether the free memory lists are running low by comparing the
per-order free counts against a goal vector, carrying a credit for any
excess memory down to the next lower order.
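
To illustrate the idea, here's a simplified userspace sketch of the
goal-vector test (not the kernel code -- the free counts and goals below
are invented, and the real routine also caps the count it takes from
orders above DEFRAG_MAX_ORDER):

#include <stdio.h>

#define NR_ORDERS 6

static int need_defrag(const int *free_count, const int *goal)
{
	int order, credit = 0, defrag = 0;

	for (order = NR_ORDERS - 1; order > 0; order--) {
		/* one block of order n splits into two of order n-1 */
		credit = (credit << 1) + free_count[order];
		if (credit >= goal[order]) {
			credit -= goal[order];	/* surplus trickles down */
		} else {
			credit = 0;
			defrag = 1;		/* this order is short */
		}
	}
	return defrag;
}

int main(void)
{
	int goal[NR_ORDERS] = { 0, 8, 4, 2, 0, 0 };
	int free_count[NR_ORDERS] = { 40, 3, 1, 2, 0, 0 };

	printf("need_defrag: %d\n", need_defrag(free_count, goal));
	return 0;
}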

When defragging is necessary, anneal_mem() allocates a page and builds a
bitmap of the free pages in the lists to be defragged. It then builds a
list of candidate pages to be freed by looking for page groups that need
only one additional page to be coalesced. It attempts to free each page
in the list, updating the bitmap on success and moving on to the next
memory order once the goal for the current order is reached. The process
is repeated as long as progress is made and defragging is still required.
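
The candidate search is easiest to see in a standalone sketch (again,
not the patch code itself -- the bitmap contents are made up, and the
real code also checks page counts and buffer/inode state before listing
a page):

#include <stdio.h>

/*
 * Scan a free-page bitmap for naturally aligned groups of (1 << order)
 * pages with exactly one page missing; freeing that one page would let
 * the buddy allocator coalesce the whole group.
 */
static int find_candidates(const unsigned char *free_map, int npages,
			   int order, int *list, int max)
{
	int group = 1 << order;
	int base, i, missing, num = 0;

	for (base = 0; base + group <= npages && num < max; base += group) {
		missing = -1;
		for (i = 0; i < group; i++) {
			if (free_map[base + i])
				continue;	/* page already free */
			if (missing >= 0) {	/* second hole -- give up */
				missing = -1;
				break;
			}
			missing = base + i;	/* remember the one hole */
		}
		if (missing >= 0)
			list[num++] = missing;
	}
	return num;
}

int main(void)
{
	/* 16 pages: 1 = free, 0 = in use */
	unsigned char free_map[16] = { 1,1,1,0, 1,0,0,1, 0,1,1,1, 1,1,1,1 };
	int list[8], i, n;

	n = find_candidates(free_map, 16, 2, list, 8);
	for (i = 0; i < n; i++)
		printf("candidate page %d\n", list[i]);
	return 0;
}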

The current patch is rather limited in the pages that can be freed --
only inode and buffer cache -- but the overall approach can be extended
to include unmapping pages as well. Since it builds a list of candidate
pages to be freed, it wouldn't be too compute-intensive to search the
vma lists for page mappings, checking for all pages at once. But this
may not even be necessary -- the existing kswap code can balance memory
between swapping and caches, so it's no big deal if some extra inode or
buffer cache gets used for defragging.

Another limitation is that only one page is allocated for the free-pages
bitmap, which limits the memory covered to 128M. There are a couple of
easy ways around this, such as using vmalloc(), or making several defrag
passes of 128M each. (Of course, machines with > 128M probably don't need
a lot of defragging assistance anyway.)
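
For reference, the 128M figure assumes i386 with 4K pages: one bitmap
page holds 4096 * 8 = 32768 bits, one per physical page, and 32768 pages
at 4K each is 128M. A quick check:

#include <stdio.h>

int main(void)
{
	unsigned long page_size = 4096;			/* bytes (i386) */
	unsigned long bits = page_size * 8;		/* pages covered */
	unsigned long covered = bits * page_size;	/* bytes of RAM */

	printf("one bitmap page covers %lu MB\n", covered >> 20);
	return 0;
}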

Even with the above limitations, the current patch is stable and seems
to work pretty well. In normal use it probably won't be called very
often, but under heavy swapping and forking loads it gets used
periodically. I've left a few debugging messages in place to monitor the
action.

My hope is that once the kernel has an effective memory defragger, the
kswapd code can go back to being just a single-page reclaimer, which it
does quite effectively.

Comments and testing welcome ...

Regards,
Bill
[Attachment: mm_defrag108-patch]

--- linux-2.1.108/include/linux/mm.h.old Wed Jul 15 09:47:03 1998
+++ linux-2.1.108/include/linux/mm.h Thu Jul 16 15:19:00 1998
@@ -251,7 +251,7 @@
return page;
}

-/* memory.c & swap.c*/
+/* page_alloc.c */

/*
* This traverses "nr" memory size lists,
@@ -270,13 +270,17 @@
#define kswapd_continue() (!free_memory_available(3))
#define kswapd_wakeup() (!free_memory_available(0))

+extern int need_defrag(int *);
+extern void anneal_mem(void);
+
#define free_page(addr) free_pages((addr),0)
extern void FASTCALL(free_pages(unsigned long addr, unsigned long order));
extern void FASTCALL(__free_page(struct page *));
-
extern void show_free_areas(void);
-extern unsigned long put_dirty_page(struct task_struct * tsk,unsigned long page,
- unsigned long address);
+
+/* memory.c & swap.c*/
+extern unsigned long put_dirty_page(struct task_struct *, unsigned long,
+ unsigned long);

extern void free_page_tables(struct mm_struct * mm);
extern void clear_page_tables(struct task_struct * tsk);
--- linux-2.1.108/mm/page_alloc.c.old Fri Jul 3 10:32:34 1998
+++ linux-2.1.108/mm/page_alloc.c Thu Jul 16 15:30:18 1998
@@ -419,3 +419,278 @@
set_pte(page_table, pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot))));
return;
}
+
+/* ==================== Memory Defragmentation ======================== */
+
+#define DEFRAG_MAX_ORDER 3
+#define DEFRAG_PAGES 32
+
+/* These are the goals for the defragmentation code */
+int free_mem_goal[NR_MEM_LISTS] = {0, 8, 4, 2, 0, };
+
+/*
+ * Check whether we need to run the memory defragger.
+ * We only care about the order 1, 2, and 3 lists, but
+ * allow a credit to trickle down from the higher orders.
+ */
+int need_defrag(int *count)
+{
+ int order = NR_MEM_LISTS, credit = 0, defrag = 0;
+ unsigned long flags;
+
+ spin_lock_irqsave(&page_alloc_lock, flags);
+ while (--order) {
+ struct free_area_struct *list = free_area + order;
+ struct page *next = list->next;
+
+ credit <<= 1;
+ /*
+ * Count the nodes on the memory list, but
+ * stop after one node on the higher orders.
+ */
+ while (next != memory_head(list)) {
+ credit++;
+ if (order > DEFRAG_MAX_ORDER)
+ break;
+ next = next->next;
+ }
+ /* Save the results if needed */
+ if (count)
+ count[order] = credit;
+ /*
+ * Check whether we've met the goal.
+ */
+ if (credit >= free_mem_goal[order]) {
+ credit -= free_mem_goal[order];
+ } else {
+ credit = 0;
+ defrag = 1;
+ }
+ }
+ spin_unlock_irqrestore(&page_alloc_lock, flags);
+ return defrag;
+}
+
+/*
+ * Builds a bitmap of the free pages in the memory lists
+ * up to and including the specified order.
+ */
+static void build_free_map(char * map, int size, int max_order)
+{
+ unsigned long flags;
+ int order;
+
+ /* Clear the bitmap */
+ memset((void *) map, 0, size);
+
+ spin_lock_irqsave(&page_alloc_lock, flags);
+
+ for (order = 0; order <= max_order; order++) {
+ struct free_area_struct *area = free_area + order;
+ struct page *next = memory_head(area), *node;
+ unsigned long index, map_nr;
+
+ while ((node = next->next) != memory_head(area)) {
+ next = node;
+ map_nr = node->map_nr;
+ index = map_nr + (1 << order) - 1;
+ while (index >= map_nr) {
+ set_bit(index, map);
+ index--;
+ }
+ }
+ }
+ spin_unlock_irqrestore(&page_alloc_lock, flags);
+}
+
+/*
+ * Look for candidate pages to anneal the specified order.
+ */
+static int find_candidate_pages(char *map, int size, int order, int *list)
+{
+ int nbits = (1 << order);
+ int num = 0, i, offset, index, map_nr;
+
+ for (i = 0; i < size; i++, map++) {
+ if (!*map)
+ continue;
+ for (offset = 0; offset < 8; offset += nbits) {
+ /*
+ * Look for page groups needing only one
+ * page to be coalesced.
+ */
+ map_nr = 0;
+ for (index = 0; index < nbits; index++) {
+ if (test_bit(offset + index, map))
+ continue;
+ /* More than one page missing? */
+ if (map_nr)
+ goto next;
+ /* Save the index of the missing page */
+ map_nr = (i << 3) + offset + index;
+#ifdef DEFRAG_DEBUG
+printk("find_candidate_pages: mask=%x, bit=%d, map_nr=%d\n",
+*map, offset + index, map_nr);
+#endif
+ }
+ /*
+ * Check whether we found a good candidate.
+ */
+ if (map_nr) {
+ struct page *page = mem_map + map_nr;
+ if (atomic_read(&page->count) > 1)
+ goto next;
+ /* This is all we handle so far */
+ if (page->buffers) {
+ if (page->buffers->b_count)
+ goto next;
+ } else if (!page->inode)
+ goto next;
+ list[num] = map_nr;
+ num++;
+ if (num >= DEFRAG_PAGES)
+ goto out;
+ }
+ next: ;
+ }
+
+ }
+out:
+#ifdef DEFRAG_DEBUG
+printk("find_candidate_pages: order %d, found %d\n", order, num);
+#endif
+ return num;
+}
+
+/*
+ * Attempt to free or replace the specified page.
+ */
+static int replace_page(char *bitmap, int map_nr)
+{
+ struct page *page = mem_map + map_nr;
+ int result = 0;
+
+ /*
+ * Check whether it's a page we can handle ...
+ */
+ if (PageLocked(page))
+ goto out;
+
+ switch (atomic_read(&page->count)) {
+ default:
+ /* may handle this case later */
+ goto out;
+ case 0:
+ /* shouldn't happen ... */
+ printk("replace_page: page not in use??\n");
+ goto out;
+ case 1:
+ break;
+ }
+
+ /* The easiest case of all ... */
+ if (page->inode) {
+#ifdef DEFRAG_DEBUG
+printk("replace_page: page %p in page cache\n", page);
+#endif
+ if (PageSwapCache(page))
+ delete_from_swap_cache(page);
+ else {
+ /* N.B. make a routine in filemap.c for this */
+ remove_page_from_hash_queue(page);
+ remove_page_from_inode_queue(page);
+ __free_page(page);
+ }
+ goto out_set;
+ }
+ /* Not too hard if it works ... */
+ if (page->buffers) {
+ struct buffer_head *bh = page->buffers;
+ if (try_to_free_buffer(bh, &bh, 6)) {
+#ifdef DEFRAG_DEBUG
+printk("replace_page: page %p in buffer cache\n", page);
+#endif
+ goto out_set;
+ }
+ goto out;
+ }
+ /* Possibly a mapped page? Not handled yet ... */
+ goto out;
+
+ /*
+ * We freed the page, so update our bitmap.
+ */
+out_set:
+ set_bit(page->map_nr, bitmap);
+ result = 1;
+out:
+ return result;
+}
+
+/*
+ * Top-level entry point for memory defragmentation.
+ * Attempts to repopulate the higher order memory lists
+ * by searching for page groups missing only a single
+ * page.
+ */
+void anneal_mem(void)
+{
+ int map_size = (num_physpages >> 3);
+ char * bitmap;
+ int order, i, num, progress;
+ int list[DEFRAG_PAGES], count[NR_MEM_LISTS];
+
+ /*
+ * Allocate memory for the page bitmap.
+ */
+ if (map_size <= PAGE_SIZE)
+ bitmap = (char *) __get_free_page(GFP_KERNEL);
+ else {
+ /* Use vmalloc() if memory > 128M? */
+ return;
+ }
+ if (!bitmap)
+ return;
+ /*
+ * Build the bitmap for the lists to be defragged.
+ * Only need to do this once ... map is updated as
+ * pages are freed.
+ */
+ build_free_map(bitmap, map_size, DEFRAG_MAX_ORDER);
+
+repeat:
+ if (!need_defrag(count)) {
+printk("anneal_mem: memory defragged\n");
+ goto out;
+ }
+ progress = 0;
+ /* Iterate over the orders to be defragged. */
+ for (order = 1; order <= DEFRAG_MAX_ORDER; order++) {
+ /*
+ * Select the pages to build page groups of this order.
+ */
+ num = find_candidate_pages(bitmap, map_size, order, list);
+ /*
+ * Try to free or replace the candidate pages.
+ */
+ for (i = 0; i < num; i++) {
+ if (replace_page(bitmap, list[i])) {
+ count[order]++;
+ if (count[order] >= free_mem_goal[order])
+ goto next;
+ progress = 1;
+ }
+ }
+ next: ;
+ }
+ /* Keep trying if we made progress. */
+ if (progress) {
+printk("anneal_mem: made progress, continuing\n");
+ goto repeat;
+ }
+
+out:
+ /* Free the bitmap */
+ if (map_size <= PAGE_SIZE)
+ free_page((unsigned long) bitmap);
+}
--- linux-2.1.108/mm/vmscan.c.old Fri Jul 3 10:32:34 1998
+++ linux-2.1.108/mm/vmscan.c Thu Jul 16 15:19:18 1998
@@ -576,19 +586,23 @@
tries = pager_daemon.tries_base >> free_memory_available(3);

while (tries--) {
- int gfp_mask;
-
if (++tried > pager_daemon.tries_min && free_memory_available(0))
break;
- gfp_mask = __GFP_IO;
- try_to_free_page(gfp_mask);
+ try_to_free_page(__GFP_IO);
/*
* Syncing large chunks is faster than swapping
* synchronously (less head movement). -- Rik.
*/
if (atomic_read(&nr_async_pages) >= pager_daemon.swap_cluster)
run_task_queue(&tq_disk);
+ }

+ /*
+ * Check whether we need to call the memory defragger.
+ */
+ if (need_defrag(NULL)) {
+printk("kswapd: %d free, attempting anneal\n", nr_free_pages);
+ anneal_mem();
}
}
/* As if we could ever get here - maybe we want to make this killable */
