Re: [patch] 4GB I/O, cut three

From: Mark Hemment (markhe@veritas.com)
Date: Wed May 30 2001 - 05:59:50 EST


On Wed, 30 May 2001, Jens Axboe wrote:
> On Wed, May 30 2001, Mark Hemment wrote:
> > Hi Jens,
> >
> > I ran this (well, cut-two) on a 4-way box with 4GB of memory and a
> > modified qlogic fibre channel driver with 32 disks hanging off it,
> > without any problems. The test used was SpecFS 2.0.
>
> Cool, could you send me the qlogic diff? It's the one-liner can_dma32
> change I'm interested in, I'm just not sure what driver you used :-)

  The qlogic driver is the one from:
        http://www.feral.com/isp.html
I find it much more stable than the one already in the kernel.
  It did just need the one-liner change, but as the driver isn't in the
kernel there isn't much point in adding the change to your patch. :)
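
  For what it's worth, the one-liner amounts to something like this
(a sketch from memory - "can_dma32" is the flag name from your patch,
and exactly where it gets set in the feral driver's host template is an
assumption on my part):

        /*
         * Mark the isp host adapter as able to address pages above the
         * old bounce limit, so highmem pages aren't bounced for it.
         * "driver_template" stands in for the driver's Scsi_Host_Template.
         */
        driver_template.can_dma32 = 1;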

> > I did change the patch so that bounce-pages always come from the NORMAL
> > zone, hence the ZONE_DMA32 zone isn't needed. I avoided the new zone, as
> > I'm not 100% sure the VM is capable of keeping the zones it already has
> > balanced - and adding another one might break the camel's back. But as the
> > test box has 4GB, it wasn't bouncing anyway.
>
> You are right, this is definitely something that needs checking. I
> really want this to work though. Rik, Andrea? Will the balancing handle
> the extra zone?

  In theory it should do - i.e. there isn't anything to stop it.

  With NFS loads over a ported VxFS filesystem, I do see some balancing
problems between the NORMAL and HIGH zones. Thinking about it, though,
ZONE_DMA32 shouldn't make this any worse.
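
  (For reference, the bounce-page change quoted above is roughly the
following - a sketch only; the function name is made up and the real
change lives inside the patch's bounce code:)

        /*
         * Allocate bounce pages with a gfp mask that has no
         * __GFP_HIGHMEM, so they always come from the NORMAL (or DMA)
         * zone and ZONE_DMA32 isn't needed.
         */
        static struct page *alloc_bounce_page(void)
        {
                /* GFP_NOIO: don't recurse into I/O while bouncing I/O */
                return alloc_page(GFP_NOIO);
        }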

  Rik, Andrea, a quick description of a balancing problem (a simplified
sketch of the allocator's first pass follows below):
        Consider a VM which is under load (but not stressed), such that
        all zone free-page pools are between their MIN and LOW marks, with
        pages on the inactive_clean lists.

        The NORMAL zone has non-zero order allocations thrown at it.
        This causes __alloc_pages() to reap pages from the NORMAL zone's
        inactive_clean list until the required buddy can be built. The
        blind reaping leaves the NORMAL zone with a large number of free
        pages (greater than ->pages_low).

        Now, when HIGHMEM allocations come in (for page-cache pages), they
        skip the HIGH zone and use the NORMAL zone (as it now has plenty
        of free pages) - this is the code at the top of __alloc_pages()
        which checks each zone against ->pages_low.

        But the NORMAL zone is usually under more pressure than the HIGH
        zone - many more allocations need ready-mapped memory. This
        causes the page-cache pages in the NORMAL zone to come under
        more pressure and to be "recycled" quicker than page-cache pages
        in the HIGHMEM zone.
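
  Roughly, the check in question - the first pass at the top of
__alloc_pages() - looks like this (heavily simplified, from memory, not
the exact 2.4 code):

        for (i = 0; (z = zonelist->zones[i]) != NULL; i++) {
                /* Take from the first zone still above its LOW mark;
                 * for a HIGHMEM allocation the list is HIGH, NORMAL, DMA. */
                if (z->free_pages >= z->pages_low) {
                        page = rmqueue(z, order);
                        if (page)
                                return page;
                }
        }
        /* Only when every zone is below ->pages_low do we drop into the
         * slower paths (inactive_clean reclaim, waking kswapd, etc). */

  After the blind reaping described above, NORMAL passes this check while
HIGH doesn't, so the HIGHMEM page-cache allocations land in NORMAL.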

  OK, we shouldn't be throwing too many non-zero order allocations at
__alloc_pages(), but it does happen.
  Also, the problem isn't as bad as it first looks - HIGHMEM page-cache
pages do get "recycled" (reclaimed), but there is a slight imbalance.

Mark



