Re: Fwd: Control page reclaim granularity

From: Konstantin Khlebnikov
Date: Tue Mar 13 2012 - 01:00:15 EST


Konstantin Khlebnikov wrote:
Minchan Kim wrote:
On Mon, Mar 12, 2012 at 06:18:21PM +0400, Konstantin Khlebnikov wrote:
Minchan Kim wrote:
On Mon, Mar 12, 2012 at 04:14:14PM +0800, Zheng Liu wrote:
On 03/12/2012 02:20 PM, Konstantin Khlebnikov wrote:
Minchan Kim wrote:
On Mon, Mar 12, 2012 at 10:06:09AM +0800, Zheng Liu wrote:
<CUT>

Now the problem is that

1. The user wants to keep pages which are used once in a while in memory.
2. The kernel wants to reclaim them because, from the LRU's point of
view, they are clearly reclaim targets.

The most desirable approach is for the user to use mlock to guarantee
they stay in memory. But mlock has too much overhead, and the user
doesn't want to keep all the pages in memory at once (i.e., he wants
demand paging when he needs a page).
Right?

madvise is just a hint to the kernel, and the kernel doesn't need to
guarantee madvise's behavior.
From that point of view, such inconsistency might not be a big problem.

The big problem, I think, is that the user has to call madvise(WILLNEED)
periodically, because the activation happens only once, when the user
calls madvise. If the user doesn't use the page frequently after calling
it, the page ends up moving to the inactive list and could even be
reclaimed.
It's not good. :-(
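
For reference, the one-shot hint being discussed looks roughly like this
from user space. This is only a minimal sketch, assuming Linux;
hint_willneed() is a made-up helper name, and it maps the whole file for
simplicity. Nothing in it keeps the pages from aging afterwards, which is
exactly the complaint above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a file read-only and pass MADV_WILLNEED for the whole mapping.
 * Returns 0 on success, -1 on any failure.
 * hint_willneed() is a hypothetical helper, not an existing API.
 */
int hint_willneed(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (addr == MAP_FAILED)
        return -1;

    /* The hint applies at call time only; it is not a standing policy. */
    int ret = madvise(addr, len, MADV_WILLNEED);

    munmap(addr, len);
    return ret == 0 ? 0 : -1;
}
```

A caller that wanted the effect to persist would have to repeat the call
on a timer, which is the overhead the thread is trying to avoid.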

Okay. How about adding a new VM_WORKINGSET?
The reclaimer would then give the page one more round trip through the
active/inactive list when reclaim happens, if the page is referenced.
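
The "one more round trip" idea can be sketched as a toy user-space model.
Everything here is hypothetical (struct toy_page, toy_scan() and the
verdict names are invented for illustration), and the real reclaimer has
more cases than this; in particular, the toy simply reclaims any page
without the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the toy reclaimer decides for a page it is scanning. */
enum toy_verdict { TOY_RECLAIM, TOY_ROTATE };

/* Hypothetical stand-in for the per-page state the reclaimer would see. */
struct toy_page {
    bool referenced;  /* accessed since the last scan */
    bool workingset;  /* mapped by a vma carrying the proposed flag */
};

/*
 * Scan one page. A referenced page from a working-set mapping is sent
 * around the list once more; scanning consumes the reference, so the
 * page must be touched again to earn another trip.
 */
enum toy_verdict toy_scan(struct toy_page *p)
{
    assert(p != NULL);

    bool was_referenced = p->referenced;
    p->referenced = false;

    if (p->workingset && was_referenced)
        return TOY_ROTATE;
    return TOY_RECLAIM;
}
```

In this model a working-set page that is touched between scans is never
reclaimed, while one that goes untouched for a full round is.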

Sigh. We have no room for a new VM_FLAG on 32-bit.
It would be nice to mark struct address_space with this flag and export
AS_UNEVICTABLE somehow.
Maybe we can reuse file-locking engine for managing these bits =)

Makes sense to me. We can mark this flag in struct address_space and check
it in page_referenced_file(). If this flag is set, it will be cleared and

The disadvantage is that we could set reclaim granularity only per-inode.
I want to set it per-vma, not per-inode.

But with a per-inode flag we can tune all files, not only memory-mapped ones.

I don't oppose a per-inode setting, but I believe we still need a file range
or mmapped vma. One file may have parts with different characteristics: some
part is working set, some part is streaming.

See the attached patch. Currently I'm thinking about the managing code;
the file-locking engine really fits perfectly =)

File-locking engine?
Do you consider fcntl as the interface for it?
What do you mean?


If we set bits on the inode, we must somehow account for its users and clear AS_WORKINGSET and
AS_UNEVICTABLE at the last file close. We can use the file-locking engine for locking inodes in memory:
a file lock automatically releases the inode at the last fput(). Maybe it's too tricky and we should add
a couple of simple atomic counters to the generic struct inode (like i_writecount/i_readcount), but in
that case we would add new code on the fast path.
So it looks like inventing a new kind of struct file_lock is the best approach.
I don't want to implement range locking for now, but I can do it if somebody really wants it.
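
To make the counter alternative concrete, here is a toy single-threaded
model of "clear the bit at last close". All names (struct toy_inode,
toy_mark(), toy_release()) are invented for illustration; a kernel version
would need atomic counters and proper ordering, which this sketch
deliberately leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-inode state; not the kernel's struct inode. */
struct toy_inode {
    int  workingset_users;  /* open files that asked for the flag */
    bool as_workingset;     /* stands in for the AS_WORKINGSET bit */
};

/* An open file marks the inode as working set. */
void toy_mark(struct toy_inode *ino)
{
    assert(ino != NULL);
    ino->workingset_users++;
    ino->as_workingset = true;
}

/* Called when a file that marked the inode is closed. */
void toy_release(struct toy_inode *ino)
{
    assert(ino != NULL && ino->workingset_users > 0);
    if (--ino->workingset_users == 0)
        ino->as_workingset = false;  /* last user gone: drop the bit */
}
```

The increment and decrement are the "new code on the fast path" mentioned
above; the file_lock approach would avoid touching the open/close path for
files that never use the feature.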

Yes, we can use fcntl(), but fadvise() is much better.
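
For comparison, this is roughly what the existing fadvise interface looks
like from user space. It is a sketch using only the standard
posix_fadvise() call with POSIX_FADV_WILLNEED; the working-set advice
discussed in this thread does not exist and would be a new advice value.
advise_file_willneed() is a made-up helper name.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Apply POSIX_FADV_WILLNEED to a whole file, without mapping it.
 * Returns 0 on success, -1 on failure.
 * advise_file_willneed() is a hypothetical helper, not an existing API.
 */
int advise_file_willneed(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0, len 0: from the start to the end of the file.
     * posix_fadvise() returns an error number rather than setting errno. */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);

    close(fd);
    return err == 0 ? 0 : -1;
}
```

Because the call takes an offset and a length, an fadvise-style interface
would also leave room for per-range hints later, unlike a plain per-inode
bit.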

Another mad idea: if we mark the vma, then we can add a fake vma (belonging to init_mm, for example)
to the inode's rmap to lock a range of the inode's pages in memory without actually mapping the file.
In page_referenced_one() we would have to handle this fake vma differently,
because page_check_address() will always fail for it.
Thus we can effectively implement AS_WORKINGSET and AS_UNEVICTABLE for arbitrary page ranges.