Re: A couple of 2.4.23-pre4 VM nits

From: Shantanu Goel
Date: Sun Sep 21 2003 - 00:33:18 EST


Agreed on all counts. I just blindly copied the
max_scan decrement from 2.4.22. Even there your
suggestion would make sense. Attached is a new patch
which adds support for vm_vfs_scan_interval sysctl and
also fixes the location of max_scan decrement.

Thanks,
Shantanu

--- Andrea Arcangeli <andrea@xxxxxxx> wrote:
> Hi Shantanu,
>
> On Sat, Sep 20, 2003 at 06:39:30AM -0700, Shantanu
> Goel wrote:
> > Hi Andrea,
> >
> > The VM fixes perform rather well in my testing
> > (thanks!), but I noticed a couple of glitches that
> the
> > attached patch addresses.
> >
> > 1. max_scan is never decremented in
> shrink_cache(). I
> > am assuming this is a typo.
>
> this is an huge half-merging error, no surprise Andi
> run into vm
> troubles with pre4. My tree looks like this for
> years:
>
> if (unlikely(!page_count(page)))
> continue;
>
> only_metadata = 0;
> if (!memclass(page_zone(page), classzone)) {
> /*
> * Hack to address an issue found by Rik. The
> * problem is that
> * highmem pages can hold buffer headers
> * allocated
> * from the slab on lowmem, and so if we are
> * working
> * on the NORMAL classzone here, it is correct
> * not to
> * try to free the highmem pages themself (that
> * would be useless)
> * but we must make sure to drop any lowmem
> * metadata related to those
> * highmem pages.
> */
> if (page->buffers && page->mapping) { /* fast
> path racy check */
> if (unlikely(TryLockPage(page)))
> continue;
> if (page->buffers && page->mapping &&
> memclass_related_bhs(page, classzone)) { /* non racy
> check */
> only_metadata = 1;
> goto free_bhs;
> }
> UnlockPage(page);
> }
> continue;
> }
>
> max_scan--;
>
> max_scan-- should happen only after the memclass
> check to ensure not
> failing too early on GFP_KERNEL/DMA allocations
> (i.e. no highmem)
>
> This is the right fix:
>
> --- 2.4.23pre4/mm/vmscan.c.~1~ 2003-09-13
> 00:08:04.000000000 +0200
> +++ 2.4.23pre4/mm/vmscan.c 2003-09-21
> 04:40:12.000000000 +0200
> @@ -401,6 +401,8 @@ static int shrink_cache(int
> nr_pages, zo
> if (!memclass(page_zone(page), classzone))
> continue;
>
> + max_scan--;
> +
> /* Racy check to avoid trylocking when not
> worthwhile */
> if (!page->buffers && (page_count(page) != 1 ||
> !page->mapping))
> goto page_mapped;
>
> so your fix is slightly wrong in non-highmem terms
> (also for scheduling
> terms, you would decrease it even when there's a
> schedule). We need to
> finish the merge ASAP with Marcelo. I didn't send
> specific patches to
> Marcelo myself yet, I only pointed out the url and
> list of them plus
> I described the details he wanted to know. I
> understand he merged what
> he found most interesting so I didn't notice this
> half merge problem yet
> but I will start looking into this with the highest
> prio on Monday (I
> was going to look into pre4 very soon for -aa that
> now will reject
> bigtime, and the watermarks too but I had some
> trouble with the ram and
> the scheduler in my fast amd64 this week that kept
> me busy, the
> scheduler fix will be in next -aa and improves ppc
> as well 11%, on some
> of my workloads it's a 99% improvement [this isn't
> relevant for mainline
> though and apparently it was already fixed in 2.6]
> ;).
>
> > 2. The second part of the patch makes sure that
> > inode/dentry caches are shrunk at least once every
> 5
> > secs. On a machine with a heavy inode
> stat/directory
> > lookup load (e.g. NFS server), most of the memory
> > winds up sitting idle in unused inodes/dentry.
> The
> > present code only reclaims these when a swap_out()
> > happens or shrink_caches() fails. This can take a
> > while on a machine will very few mapped pages such
> as
> > an NFS server.
>
> For lots of workloads this will nuke dcache way too
> fast and rebuilding
> it with the fs and buffercache is costly. However
> if you make the delay
> a sysctl set to MAX_SCHEDULE_TIMEOUT, I would be
> definitely fine in
> merging it. That way it won't make any difference by
> default and it can
> help in the corner cases where hacks like this may
> provide benefits as
> you noted.
>
> thanks!
>
> Andrea - If you prefer relying on open source
> software, check these links:
>
>
rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
> http://www.cobite.com/cvsps/
> svn://svn.kernel.org/linux-2.[46]/trunk

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

Attachment: vfs-interval.patch
Description: vfs-interval.patch