Re: userspace pagecache management tool

From: Pádraig Brady
Date: Tue Mar 06 2007 - 07:14:03 EST


Andrew Morton wrote:
> Yes. Let's flesh out the backup program policy some more:
>
> - Unconditionally invalidate output files
>
> - on entry to read(), probe pagecache, record which pages in the range are present
>
> - on entry to next read(), shoot down those pages from the previous read
> which weren't in pagecache.
>
> - But we can do better! LRU the file's pages up to a certain number of pages.
>
> - Once that point is exceeded, we need to reclaim some pages. Which
> ones? Well, we've been observing all reads, so we can record which pages
> were referenced once, and which ones were referenced multiple times so we
> can do arbitrarily complex page aging in there.
>
> - On close(), nuke all pages which weren't in core during open(), even if
> this app referenced them multiple times.
>
> - If the backup program decided to read its input files with mmap we're
> rather screwed. We can't intercept pagefaults so the best we can do is
> to restore the file's pagecache to its previous state on close().
>
> Or if it's really a problem, get control in there somehow and
> periodically poll the pagecache occupancy via mincore(), use madvise()
> then fadvise() to trim it back.
>
> That all sounds reasonably doable. It'd be pretty complex to do it
> in-kernel but we could do it there too. Problem is of course that the
> above strategy is explicitly optimised for the backup program and if it's
> in-kernel it becomes applicable to all other workloads.
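
For concreteness, here is a minimal sketch of just the "restore the file's
pagecache to its previous state on close()" part of that policy, assuming
Linux with glibc. The function names (snapshot_residency, next_absent_run,
drop_new_pages) are made up for illustration and are not an existing API.
It ignores the per-read() tracking and the LRU entirely, so it understates
the real bookkeeping.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: record which pages of fd are in pagecache right now.
 * Returns a malloc()ed vector, one byte per page (bit 0 set = resident),
 * or NULL on error or for an empty file. */
unsigned char *snapshot_residency(int fd, size_t *npages)
{
    struct stat st;
    long psz = sysconf(_SC_PAGESIZE);

    if (psz <= 0 || fstat(fd, &st) < 0 || st.st_size == 0)
        return NULL;

    size_t len = (size_t)st.st_size;
    *npages = (len + (size_t)psz - 1) / (size_t)psz;

    /* mincore() works on mappings, so map the file without touching it. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return NULL;

    unsigned char *vec = malloc(*npages);
    if (vec && mincore(map, len, vec) < 0) {
        free(vec);
        vec = NULL;
    }
    munmap(map, len);
    return vec;
}

/* Find the next run of non-resident pages at or after index `from`.
 * Returns 1 and fills *start and *count, or 0 when no such run remains. */
int next_absent_run(const unsigned char *vec, size_t npages, size_t from,
                    size_t *start, size_t *count)
{
    size_t i = from;

    while (i < npages && (vec[i] & 1))
        i++;
    if (i >= npages)
        return 0;
    *start = i;
    while (i < npages && !(vec[i] & 1))
        i++;
    *count = i - *start;
    return 1;
}

/* Ask the kernel to drop every page that was absent in the snapshot.
 * Returns the number of pages covered by successful fadvise calls. */
size_t drop_new_pages(int fd, const unsigned char *vec, size_t npages)
{
    long psz = sysconf(_SC_PAGESIZE);
    size_t start, count, from = 0, dropped = 0;

    while (next_absent_run(vec, npages, from, &start, &count)) {
        if (posix_fadvise(fd, (off_t)start * psz, (off_t)count * psz,
                          POSIX_FADV_DONTNEED) == 0)
            dropped += count;
        from = start + count;
    }
    return dropped;
}
```

The intended use would be snapshot_residency() right after open() and
drop_new_pages() just before close(). Even this small piece needs a mapping,
a residency vector and a range walk per file.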

I can see the above being possible, but I can't see the reason
for exposing that complexity to userspace. If I'm the target
audience for that API then it's broken as I'd mess it up,
or would take too long to get it right.

Can't we just fix the posix_fadvise() implementation to
only evict pages paged in by the current process?
Perhaps one could just evict pages with _mapcount==0?
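
For comparison, this is roughly all a backup program would have to do if
posix_fadvise() behaved that way. It is only a sketch: drop_file_cache is a
made-up name, and the call as it exists today makes no distinction about
which process brought a page into the cache, which is exactly the behaviour
in question.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to drop fd's cached pages.
 * Returns 0 on success or an errno value on failure. */
int drop_file_cache(int fd)
{
    /* Dirty pages are not discarded by POSIX_FADV_DONTNEED, so write
     * them back first. Ignore descriptors that do not support syncing. */
    if (fdatasync(fd) < 0 && errno != EINVAL && errno != EROFS)
        return errno;

    /* offset 0 with len 0 covers the whole file. Unlike most calls,
     * posix_fadvise() returns the error number directly. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

One call per file before close(), with no bookkeeping in the application.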

cheers,
Pádraig.