Re: [PATCH] mm: fadvise: Drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages

From: Mel Gorman
Date: Tue Feb 19 2013 - 06:57:39 EST


On Thu, Feb 14, 2013 at 12:39:26PM -0800, Andrew Morton wrote:
> On Thu, 14 Feb 2013 12:03:49 +0000
> Mel Gorman <mgorman@xxxxxxx> wrote:
>
> > Rob van der Heij reported the following (paraphrased) on private mail.
> >
> > The scenario is that I want to prevent backups from filling up the page
> > cache and purging stuff that is more likely to be used again (this is
> > with s390x Linux on z/VM, so I don't give it so much memory that
> > we don't care anymore). So I have something with LD_PRELOAD that
> > intercepts the close() call (from tar, in this case) and issues
> > a posix_fadvise() just before closing the file.
> >
> > This mostly works, except for small files (fewer than 14 pages)
> > that remain in the page cache after the fact.
>
> Sigh. We've had the "my backups swamp pagecache" thing for 15 years
> and it's still happening.
>

Yes. There have been variations of it too such as applications being pushed
prematurely into swap. I'm not certain how well we currently handle that
because I haven't checked in a few months.
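
For reference, the userspace half of Rob's approach can be sketched as an LD_PRELOAD shim around close(). This is a minimal, hypothetical sketch, not Rob's actual code (which was not posted); the file name fadvise_close.c and the build and usage lines are assumptions. It asks the kernel to drop the file's cached pages with POSIX_FADV_DONTNEED and then calls the real close() found through dlsym(RTLD_NEXT, ...). On its own it does not fix the small-file case above, which is what the kernel patch addresses.

```c
/*
 * fadvise_close.c: hypothetical LD_PRELOAD shim (a sketch, not Rob's code).
 * Assumed build:  gcc -shared -fPIC -o fadvise_close.so fadvise_close.c -ldl
 * Assumed use:    LD_PRELOAD=./fadvise_close.so tar -cf /backup/data.tar /data
 * (-ldl is only needed with glibc older than 2.34.)
 */
#define _GNU_SOURCE		/* for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

typedef int (*close_fn)(int);

int close(int fd)
{
	/* Look up the next close() after this object, normally libc's. */
	close_fn real_close = (close_fn)dlsym(RTLD_NEXT, "close");

	if (real_close == NULL) {
		errno = ENOSYS;
		return -1;
	}

	/*
	 * offset 0 and len 0 cover the whole file. The return value is
	 * ignored on purpose: the advice is best effort (it fails with
	 * ESPIPE on pipes, for example) and must not change what close()
	 * reports to the caller.
	 */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

	return real_close(fd);
}
```

The symbol is looked up on every call rather than cached in a static variable, which keeps the sketch free of data races between threads at the cost of one dlsym() per close().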

> It should be possible nowadays to toss your backup application into a
> container to constrain its pagecache usage. So we can type
>
> run-in-a-memcg -m 200MB /my/backup/program
>
> and voila. Does such a script exist and work?
>

Michal already gave an example. It might run more slowly, though, if the
backup application has to stall in direct reclaim to keep the container
within its limits.

> > --- a/mm/fadvise.c
> > +++ b/mm/fadvise.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/fadvise.h>
> >  #include <linux/writeback.h>
> >  #include <linux/syscalls.h>
> > +#include <linux/swap.h>
> >
> >  #include <asm/unistd.h>
> >
> > @@ -120,9 +121,22 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> >  		start_index = (offset+(PAGE_CACHE_SIZE-1)) >> PAGE_CACHE_SHIFT;
> >  		end_index = (endbyte >> PAGE_CACHE_SHIFT);
> >
> > -		if (end_index >= start_index)
> > -			invalidate_mapping_pages(mapping, start_index,
> > +		if (end_index >= start_index) {
> > +			unsigned long count = invalidate_mapping_pages(mapping,
> > +						start_index, end_index);
> > +
> > +			/*
> > +			 * If fewer pages were invalidated than expected then
> > +			 * it is possible that some of the pages were on
> > +			 * a per-cpu pagevec for a remote CPU. Drain all
> > +			 * pagevecs and try again.
> > +			 */
> > +			if (count < (end_index - start_index + 1)) {
> > +				lru_add_drain_all();
> > +				invalidate_mapping_pages(mapping, start_index,
> >  						end_index);
> > +			}
> > +		}
> >  		break;
> >  	default:
> >  		ret = -EINVAL;
>
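
The check in the hunk above compares the return value of invalidate_mapping_pages() with the number of page indices in [start_index, end_index]. A small userspace model of that arithmetic follows. It assumes 4 KiB pages (a PAGE_CACHE_SHIFT of 12), len > 0, and that endbyte is the inclusive last byte, offset + len - 1, which the hunk does not show. The helper name expected_pages() is hypothetical.

```c
#include <stdint.h>

/* Assumed 4 KiB pages; the kernel code uses PAGE_CACHE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical helper: how many pages the patched code expects
 * invalidate_mapping_pages() to drop for the range [offset, offset + len).
 * Only len > 0 without overflow is handled; the kernel treats len == 0 as
 * "to the end of the file", which this sketch does not model.
 */
static uint64_t expected_pages(uint64_t offset, uint64_t len)
{
	uint64_t endbyte = offset + len - 1;	/* inclusive last byte */
	/* Round the start up: a partially covered first page is skipped. */
	uint64_t start_index = (offset + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	/* The page holding the last byte is included even if only partly covered. */
	uint64_t end_index = endbyte >> SKETCH_PAGE_SHIFT;

	if (end_index < start_index)
		return 0;	/* range ends before the next page boundary */
	return end_index - start_index + 1;
}
```

For example, a 50,000-byte file advised from offset 0 spans indices 0 to 12, so the patched code would expect 13 pages; any shortfall, such as pages still sitting on another CPU's pagevec, triggers lru_add_drain_all() and a second pass.
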
> Those LRU pagevecs are a right pain. They provided useful gains way
> back when I first inflicted them upon Linux, but it would be nice to
> confirm whether they're still worthwhile and if so, whether the
> benefits can be replicated with some less intrusive scheme.
>

I know. Unfortunately I've had "Implement pagevec removal and test" on my
TODO list for the guts of a year now. It's long overdue to actually sit down
and just do it. It's a similar story for the per-cpu lists in front of the
page allocator, which are overdue for a check on whether they can be replaced.
I actually have a prototype replacement for that lying around, but it ran
slower in tests and has bit-rotted since, as it was based on kernel 3.4.
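
To make the failure mode in the patch comment concrete, here is a toy userspace model of per-CPU batching. It is an illustration under stated assumptions, not kernel code. Pages read on a CPU wait in that CPU's batch until it fills; a batch size of 14 is assumed here, which as far as I know matches PAGEVEC_SIZE in kernels of this era. A batched page is treated as not droppable, standing in for the extra reference the pagevec holds. All names (toy_add, toy_invalidate, toy_drain_all and friends) are hypothetical.

```c
#include <stdbool.h>

#define TOY_NR_CPUS  2
#define TOY_BATCH    14		/* assumed per-CPU batch size */
#define TOY_NR_PAGES 64

struct toy_cpu_batch {
	int pages[TOY_BATCH];	/* page indices waiting to reach the LRU */
	int nr;
};

static struct toy_cpu_batch toy_batches[TOY_NR_CPUS];
static bool toy_cached[TOY_NR_PAGES];	/* page is in the page cache */
static bool toy_on_lru[TOY_NR_PAGES];	/* page is on the LRU, i.e. not held by a batch */

/* Move every page in one CPU's batch onto the LRU. */
static void toy_drain_cpu(int cpu)
{
	struct toy_cpu_batch *b = &toy_batches[cpu];

	for (int i = 0; i < b->nr; i++)
		toy_on_lru[b->pages[i]] = true;
	b->nr = 0;
}

/* Stand-in for lru_add_drain_all(): drain the batches of every CPU. */
static void toy_drain_all(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_drain_cpu(cpu);
}

/* A page is read into the cache on @cpu and queued in that CPU's batch. */
static void toy_add(int cpu, int page)
{
	struct toy_cpu_batch *b = &toy_batches[cpu];

	toy_cached[page] = true;
	b->pages[b->nr++] = page;
	if (b->nr == TOY_BATCH)
		toy_drain_cpu(cpu);	/* a full batch is flushed immediately */
}

/*
 * Rough stand-in for invalidate_mapping_pages(): drop cached pages in
 * [start, end] that are on the LRU and return how many were dropped.
 * Pages still sitting in some CPU's batch are skipped.
 */
static unsigned long toy_invalidate(int start, int end)
{
	unsigned long count = 0;

	for (int p = start; p <= end; p++) {
		if (toy_cached[p] && toy_on_lru[p]) {
			toy_cached[p] = false;
			toy_on_lru[p] = false;
			count++;
		}
	}
	return count;
}
```

With 10 pages added on CPU 1 (below the batch size), toy_invalidate(0, 9) drops nothing; after toy_drain_all() a second call drops all 10, which mirrors the drain-and-retry in the patch. A file of 14 or more pages fills the batch and flushes it on its own, which fits the "fewer than 14 pages" symptom Rob reported.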

--
Mel Gorman
SUSE Labs