Re: userspace pagecache management tool

From: Andrew Morton
Date: Sat Mar 03 2007 - 20:03:02 EST


On Sat, 03 Mar 2007 19:01:01 -0500 Rik van Riel <riel@xxxxxxxxxx> wrote:

> Andrew Morton wrote:
> > On Sat, 03 Mar 2007 17:25:30 -0500 Rik van Riel <riel@xxxxxxxxxx> wrote:
> >
> >> backup program
> >
> > A suitable policy for a backup program would probably be to invalidate any
> > output file(s) and to invalidate those pages of the input files which were
> > not in cache when the backup program first opened those files. That way
> > the backup program will have no effect on the cache state, except for the
> > race situation where someone read an uncached file while the backup program
> > was reading from it too.
>
> The use-once policy we have in the kernel should work
> perfectly fine for backups. All we need to do is
> actually honor the accessed bit on active page cache
> pages, instead of flushing them onto the inactive
> list.
>
> What am I overlooking?

That'll improve backups but will break other things.

To do this effectively we'd need to change the policy so that new pagecache
allocations cause no scanning of used-twice pages at all. So that even
after many gigs of backing up, the working set is still there.

The problem is: what about (for example) the person who has 80% of memory in
the used-twice state and who then reads a file or files totalling 20% or more
of the size of memory, two or more times? It'll be 100% cache misses, every
time. This will happen quite a lot. IOW, once those pages are in the
used-twice state, how does further pagecache activity ever get them _out_ of
that state? Only by other pages joining the used-twice set, and that can't
happen if the used-once-so-far pages got reclaimed.
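
A toy model of that case, as a rough sketch only: it assumes the used-twice
pages are never scanned, so new pagecache competes for a fixed pool of
used-once frames that is recycled oldest-first. count_hits() and its
parameters are made up for illustration and are not kernel code.

```c
#include <stddef.h>

/*
 * Hypothetical model: `free_frames` frames hold used-once pages and are
 * recycled oldest-first; the used-twice pages are assumed untouchable and
 * so are not modelled at all.  A file of `file_pages` pages is read
 * sequentially `passes` times.  Returns the number of cache hits.
 */
size_t count_hits(size_t free_frames, size_t file_pages, size_t passes)
{
    enum { MAX_FRAMES = 1024 };
    size_t fifo[MAX_FRAMES];        /* page numbers currently cached */
    size_t used = 0, head = 0, hits = 0;

    if (free_frames == 0 || free_frames > MAX_FRAMES)
        return 0;

    for (size_t pass = 0; pass < passes; pass++) {
        for (size_t p = 0; p < file_pages; p++) {
            int found = 0;

            for (size_t i = 0; i < used; i++) {
                if (fifo[i] == p) {
                    found = 1;
                    break;
                }
            }
            if (found) {
                hits++;             /* would become used-twice here */
                continue;
            }
            if (used < free_frames) {
                fifo[used++] = p;   /* still room: just cache it */
            } else {
                fifo[head] = p;     /* evict the oldest used-once page */
                head = (head + 1) % free_frames;
            }
        }
    }
    return hits;
}
```

With 20 frames left over, a 25-page file read twice never hits, because each
page is evicted before the second pass reaches it. A 15-page file fits and
hits on every page of the second pass.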

Doing a refault thing would help a bit, but stops working at a certain point.


> > This can be added in an hour or two with no kernel changes (use mincore).
>
> mincore only works for mmapped areas, we'd need an fincore
> to work with file handles.

The LD_PRELOAD code has the fd and can mmap it to perform the pagecache
probe.

fincore() would be a bit neater, but given the rarity with which mincore()
is used it's perhaps hard to justify adding a slightly more efficient and
slightly more convenient subset of mincore().
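
A minimal sketch of that probe, assuming Linux and glibc: map the fd, ask
mincore() which pages are resident, and later drop only the pages that were
not resident beforehand. probe_residency() and drop_new_pages() are
hypothetical helper names; the LD_PRELOAD wrapper around open()/close() is
not shown. Note also that newer kernels may report every page as resident
for files the caller cannot write to, which would turn the drop step into a
no-op.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1           /* for mincore() on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns a malloc'ed vector with one byte per page
 * of the file; bit 0 is set if that page was in pagecache at probe time.
 * Returns NULL on error or for an empty file.
 */
unsigned char *probe_residency(int fd, size_t *npages_out)
{
    struct stat st;
    assert(npages_out != NULL);

    if (fstat(fd, &st) != 0 || st.st_size <= 0)
        return NULL;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + page - 1) / page;

    /* Mapping the file does not fault pages in; it only lets us probe. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return NULL;

    unsigned char *vec = malloc(npages);
    if (vec != NULL && mincore(map, len, vec) != 0) {
        free(vec);
        vec = NULL;
    }
    munmap(map, len);

    if (vec != NULL)
        *npages_out = npages;
    return vec;
}

/*
 * Hypothetical helper: after the backup has read the file, ask the kernel
 * to drop the pages that were NOT resident at probe time.  Returns 0 on
 * success, otherwise the last posix_fadvise() error number.
 */
int drop_new_pages(int fd, const unsigned char *vec, size_t npages)
{
    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    int rc = 0;

    for (size_t i = 0; i < npages; i++) {
        if (vec[i] & 1)
            continue;               /* was cached before us: leave it */
        int err = posix_fadvise(fd, (off_t)i * page, page,
                                POSIX_FADV_DONTNEED);
        if (err != 0)
            rc = err;
    }
    return rc;
}
```

The wrapper would call probe_residency() when the backup program opens an
input file and drop_new_pages() just before it closes it. Calling
posix_fadvise() once per page keeps the sketch short; coalescing adjacent
pages into ranges would cut the syscall count.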
