Re: [patch 1/2] mm: fincore()

From: Andrew Morton
Date: Fri Feb 15 2013 - 18:42:41 EST

On Fri, 15 Feb 2013 18:13:04 -0500
Johannes Weiner <hannes@xxxxxxxxxxx> wrote:

> On Fri, Feb 15, 2013 at 01:27:38PM -0800, Andrew Morton wrote:
> > On Fri, 15 Feb 2013 01:34:50 -0500
> > Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> >
> > > + * The status is returned in a vector of bytes. The least significant
> > > + * bit of each byte is 1 if the referenced page is in memory, otherwise
> > > + * it is zero.
> >
> > Also, this is going to be dreadfully inefficient for some obvious cases.
> >
> > We could address that by returning the info in some more efficient
> > representation. That will be run-length encoded in some fashion.
> >
> > The obvious way would be to populate an array of
> >
> > struct page_status {
> > 	u32 present:1;
> > 	u32 count:31;
> > };
> >
> > or whatever.
>
> I'm having a hard time seeing how this could be extended to more
> status bits without stifling the optimization too much.

See other email: add a syscall arg which specifies the boolean status
which we're searching for.
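
As a rough illustration of the run-length idea, here is a minimal
userspace sketch in C. It is not code from the patch: encode_runs(),
its status_mask argument and the RUN_MAX constant are hypothetical
names, and the input is assumed to be a mincore()-style byte vector
with one status byte per page. The mask stands in for a syscall
argument that selects which boolean status is being searched for, and
max_runs caps how many entries the caller's array can hold.

```c
#include <stddef.h>
#include <stdint.h>

/* Run descriptor as suggested in the thread: one status bit, one length. */
struct page_status {
	uint32_t present:1;
	uint32_t count:31;
};

#define RUN_MAX ((1u << 31) - 1)	/* largest value a 31-bit count holds */

/*
 * Hypothetical helper: collapse a per-page status vector into runs of
 * pages that agree on the status selected by status_mask.
 *
 * Returns the number of runs written to out[].  Encoding stops early
 * when out[] is full; *done reports how many pages were consumed so the
 * caller can resume from there.
 */
static size_t encode_runs(const unsigned char *vec, size_t npages,
			  unsigned char status_mask,
			  struct page_status *out, size_t max_runs,
			  size_t *done)
{
	size_t i = 0, n = 0;

	while (i < npages && n < max_runs) {
		uint32_t present = (vec[i] & status_mask) != 0;
		uint32_t count = 0;

		/* Extend the run while the status matches and count fits. */
		while (i < npages && count < RUN_MAX &&
		       (uint32_t)((vec[i] & status_mask) != 0) == present) {
			count++;
			i++;
		}
		out[n].present = present;
		out[n].count = count;
		n++;
	}
	*done = i;
	return n;
}
```

With the vector 1,1,1,0,0,1,0,0 and a mask of 1 this yields four runs:
3 present, 2 absent, 1 present, 2 absent. If only two entries fit, the
call returns after five pages and the caller has to ask again.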

> If we just
> add more status bits to one page_status, the likelihood of long runs
> where all bits are in agreement decreases. But as the optimization
> becomes less and less effective, we are stuck with an interface that
> is more PITA than just using mmap and mincore again.
> The user has to supply a worst-case-sized vector with one struct
> page_status per page in the range, but the per-page item will be
> bigger than with the byte vector because of the additional run length
> variable.

Yes, we'd need to tell the kernel how much storage is available for the

> However, one struct page_status per run leaves you with a worst case
> of one syscall per page in the range.


> I dunno. The byte vector might not be optimal but its worst cases
> seem more attractive, is just as extensible, and dead simple to use.

But I think "which pages from this 4TB file are in core" will not be an
uncommon usage, and writing a gig of memory to find three pages is just

I wonder what the most common usage would be (one should know this
before merging the syscall :)). I guess "is this relatively-small
range of the file in core" and/or "which pages from this
relatively-small range of the file will I need to read", etc.

The syscall should handle the common usages very well. But it
shouldn't handle uncommon usages very badly!
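
To put numbers on the two cases, a small C sketch of the arithmetic
follows. The 4096-byte page size, the reading of 4TB as 2^42 bytes and
the 4-byte run entry (the u32 bitfield pair quoted above) are
assumptions for illustration; the helper names are hypothetical.

```c
#include <stdint.h>

/* Number of pages needed to cover file_bytes, rounding up. */
static uint64_t pages_in(uint64_t file_bytes, uint64_t page_size)
{
	return (file_bytes + page_size - 1) / page_size;
}

/* Byte vector: one byte per page, however many pages are resident. */
static uint64_t byte_vector_bytes(uint64_t npages)
{
	return npages;
}

/*
 * Run-length output for `resident` isolated in-core pages in an
 * otherwise absent range: each resident page is its own run and is
 * separated from its neighbours by an absent run, so at most
 * 2 * resident + 1 runs are needed.
 */
static uint64_t max_runs_isolated(uint64_t resident)
{
	return 2 * resident + 1;
}

/* Storage for nruns run entries of run_size bytes each. */
static uint64_t rle_bytes(uint64_t nruns, uint64_t run_size)
{
	return nruns * run_size;
}
```

Under those assumptions the byte vector for a 4TB file is 2^30 bytes
(1GiB), while three isolated resident pages need at most 7 runs, i.e.
28 bytes. The flip side is a file whose pages alternate between
resident and absent: one run per page at 4 bytes each would take 4GiB.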