Re: [PATCH v4 0/2] fadvise: move active pages to inactive list with POSIX_FADV_DONTNEED

From: Pádraig Brady
Date: Wed Jun 29 2011 - 07:21:09 EST


On 29/06/11 00:03, Andrew Morton wrote:
> On Wed, 29 Jun 2011 00:56:45 +0200
> Andrea Righi <andrea@xxxxxxxxxxxxxxx> wrote:
>
>>>>
>>>> In this way if the backup was the only user of a page, that page will be
>>>> immediately removed from the page cache by calling POSIX_FADV_DONTNEED. If the
>>>> page was also touched by other processes it'll be moved to the inactive list,
>>>> having another chance of being re-added to the working set, or simply reclaimed
>>>> when memory is needed.
>>>
>>> So if an application touches a page twice and then runs
>>> POSIX_FADV_DONTNEED, that page will now not be freed.
>>>
>>> That's a big behaviour change. For many existing users
>>> POSIX_FADV_DONTNEED simply doesn't work any more!
>>
>> Yes. This is the main concern that was raised by Pádraig.
>>
>>>
>>> I'd have thought that adding a new POSIX_FADV_ANDREA would be safer
>>> than this.
>>
>> Actually Jerry (in cc) proposed
>> POSIX_FADV_IDONTNEEDTHISBUTIFSOMEBODYELSEDOESTHENDONTTOUCHIT in a
>> private email. :)
>
> Sounds good. Needs more underscores though.
>
>>>
>>>
>>> The various POSIX_FADV_foo's are so ill-defined that it was a mistake
>>> to ever use them. We should have done something overtly linux-specific
>>> and given userspace more explicit and direct pagecache control.
>>
>> That would give us the possibility to implement a wide range of
>> different operations (drop, drop if used once, add to the active list,
>> add to the inactive list, etc.). Some users always complain that they
>> would like to have better control over the page cache from userspace.
>
> Well, I'd listen to proposals ;)
>
> One thing we must be careful about is to not expose things like "active
> list" to userspace. linux-4.5 may not _have_ an active list, and its
> implementors would hate us and would have to jump through hoops to
> implement vaguely compatible behaviour in the new scheme.
>
> So any primitives which are exposed should be easily implementable and
> should *make sense* within any future scheme...

Agreed.

In fairness to posix_fadvise(), I think it's designed to
specify hints about the current process's use of data,
so that the process can get at the data more efficiently and
the system can manage the cache more efficiently.
I.e. it's not meant for direct control of the cache.

That being said, the existing implementation has allowed such use,
and it would be nice not to change that without consideration.
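
For illustration, the existing usage pattern under discussion looks
roughly like the sketch below: a backup-style reader goes through a
file once and then issues POSIX_FADV_DONTNEED so its pages need not
stay cached. This is only a sketch under stated assumptions. The
function name read_without_caching and the 1 MiB chunk size are made
up for the example, and what the kernel actually does with the hint
(drop the pages, or move shared ones to the inactive list as the patch
proposes) is exactly the open question in this thread.

```python
import os

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def read_without_caching(path, chunk=CHUNK):
    """Read a file once and hint that its pages are not needed again.

    Hypothetical helper for illustration. Returns the number of bytes read.
    """
    have_fadvise = hasattr(os, "posix_fadvise")  # absent on some platforms
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        if have_fadvise:
            # Hint only: we intend to read the whole file sequentially.
            # offset=0, len=0 covers the file from the start to its end.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            buf = os.read(fd, chunk)
            if not buf:
                break
            total += len(buf)  # a real backup tool would write buf out here
        if have_fadvise:
            # Hint only: this process is done with the data. Whether the
            # pages are freed is up to the kernel, not guaranteed here.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Note that both calls are advisory: the code behaves the same from the
caller's point of view whether or not the kernel acts on them, which is
why a change in DONTNEED semantics can go unnoticed by such a tool
except through its effect on the cache.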

I've mentioned how high level cache control functions
might map to the existing FADV knobs here:

http://marc.info/?l=linux-kernel&m=130917619416123&w=2

cheers,
Pádraig.