PATCH: Balancing patch against pre9-3

From: Quintela Carreira Juan J. (quintela@fi.udc.es)
Date: Mon May 22 2000 - 06:27:54 EST

Next message: Alan Cox: "Re: ide.2.2.15.20000509 patch breaks WDC31200F w/ VIA 82C586B"
Previous message: Jerry Lundström: "Problem with the ide cdrom drivers in 2.2.15 ?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

>>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:

I have finished the port of the patch to pre9-3. My stability
problems yesterday was related to a bad patch applied by hand :(
(Yes I am porting a brown paper bag in my head).

Linus, you had applied almost all the patch, the remaining parts are:

- the use of the nr_dirty counter. It counts the number of dirty
  pages that we are allowed to write asynchronously before we wait for
  completion. Notice also that we are waiting only if __GFP_IO is
  set. We initialize that variable to the priority. This means that
  with high priorities, we will almost never wait, but with lower
  priorities (high VM pressure) we will wait almost for all pages.
  This looks to me as the correct behavior.

- We start do_try_to_free_pages with a *high* priority. In my tests
  here (I have posted the numbers in previous posts), shown that with
  a priority of 64, we allocate almost always at that priority, then
  we need to wait for less pages, which means lower latency. The other
  advantage of having a high priority is that we will push less hard
  shrink_mmap, and we will call swap_out sooner. The last thing is no
  bad as we call swap_out sooner, but with a high_priority, which means
  that it will try to free few pages.

- Rik suggested (and I agree) that in swap_out we must *try* harder to
swap_out pages if we are in low priorities. That is the effect of
the change in the counter calculation in swap_out.

- In do_try_to_free_pages, we return success if we have freed some
  page, not only if we have freed FREE_COUNT pages. That solved the
  last problems with mmap002 been killed in pre9-2. I have changed
  SWAP_COUNT to 16. In some of my experiments showed that only
  increasing the value of SWAP_COUNT improved the behavior of the
  system. I think that the problem here is that under high VM
  pressure, we try to swap the same number of pages that we will try
  to free, and people reference that pages while we are waiting, or
  other processes steal our pages. Other thougths on that?

- I have been studying which processes are swapped, and with this
  patch all the processes except the used ones are swapped. Here xfs
  (font manager) is swapped out (it gets only one page 4k in memory).
  Ben, could you test that it also swaps xfs-tt pages?

- The problem of the stalls continue, we have stalls from time to
  time, basically when we have all the memory used by the page
  cache with dirty pages (i.e. mmap002 running alone in the machine).
  I know that this is a pathological case, but the stalls are
  sometimes as big as 5 seconds. This case appears not to be as
  pathological as thought, people from multimedia are reporting
  similar problems, and people with big writes (dd a cdrom and
  similar are noting similar problems). I am studying that problem.

Comments?

Later, Juan.

diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-3/mm/filemap.c testing/mm/filemap.c
--- pre9-3/mm/filemap.c Sun May 21 17:38:02 2000
+++ testing/mm/filemap.c Mon May 22 13:05:36 2000
@@ -244,13 +244,19 @@
         spin_unlock(&pagecache_lock);
}

+/*
+ * nr_dirty represents the number of dirty pages that we will write async
+ * before doing sync writes. We can only do sync writes if we can
+ * wait for IO (__GFP_IO set).
+ */
int shrink_mmap(int priority, int gfp_mask)
{
- int ret = 0, count;
+ int ret = 0, count, nr_dirty;
         struct list_head * page_lru;
         struct page * page = NULL;

         count = nr_lru_pages / (priority + 1);
+ nr_dirty = priority;

         /* we need pagemap_lru_lock for list_del() ... subtle code below */
         spin_lock(&pagemap_lru_lock);
@@ -287,7 +293,8 @@
                  * of zone - it's old.
                  */
                 if (page->buffers) {
- if (!try_to_free_buffers(page, 1))
+ int wait = ((gfp_mask & __GFP_IO) && (nr_dirty-- < 0));
+ if (!try_to_free_buffers(page, wait))
                                 goto unlock_continue;
                         /* page was locked, inode can't go away under us */
                         if (!page->mapping) {
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-3/mm/vmscan.c testing/mm/vmscan.c
--- pre9-3/mm/vmscan.c Sun May 21 17:38:03 2000
+++ testing/mm/vmscan.c Mon May 22 13:10:38 2000
@@ -363,7 +363,7 @@
          * Think of swap_cnt as a "shadow rss" - it tells us which process
          * we want to page out (always try largest first).
          */
- counter = (nr_threads << 1) >> (priority >> 1);
+ counter = (nr_threads << 2) >> (priority >> 2);
         if (counter < 1)
                 counter = 1;

@@ -430,16 +430,17 @@
  * latency.
  */
#define FREE_COUNT 8
-#define SWAP_COUNT 8
+#define SWAP_COUNT 16
static int do_try_to_free_pages(unsigned int gfp_mask)
{
         int priority;
         int count = FREE_COUNT;
+ int swap_count;

         /* Always trim SLAB caches when memory gets low. */
         kmem_cache_reap(gfp_mask);

- priority = 32;
+ priority = 64;
         do {
                 while (shrink_mmap(priority, gfp_mask)) {
                         if (!--count)
@@ -471,12 +472,11 @@
                  * put in the swap cache), so we must not count this
                  * as a "count" success.
                  */
- {
- int swap_count = SWAP_COUNT;
- while (swap_out(priority, gfp_mask))
- if (--swap_count < 0)
- break;
- }
+ swap_count = SWAP_COUNT;
+ while (swap_out(priority, gfp_mask))
+ if (--swap_count < 0)
+ break;
+
         } while (--priority >= 0);

         /* Always end on a shrink_mmap.. */
@@ -484,8 +484,8 @@
                 if (!--count)
                         goto done;
         }
-
- return 0;
+ /* We return 1 if we are freed some page */
+ return (count != FREE_COUNT);

done:
         return 1;

-- In theory, practice and theory are the same, but in practice they are different -- Larry McVoy

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/

Next message: Alan Cox: "Re: ide.2.2.15.20000509 patch breaks WDC31200F w/ VIA 82C586B"
Previous message: Jerry Lundström: "Problem with the ide cdrom drivers in 2.2.15 ?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Tue May 23 2000 - 21:00:21 EST