Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())

From: Benjamin LaHaise
Date: Mon May 20 2013 - 22:27:38 EST

On Tue, May 21, 2013 at 10:07:52AM +0800, Tang Chen wrote:
> I'm not saying using two callbacks before and after migration is better.
> I don't want to use address_space_operations is because there is no such
> member for anonymous pages.

That depends on the nature of the pinning. For the general case of
get_user_pages(), you're correct that it won't work for anonymous memory.

> In your idea, using a file mapping will create a address_space_operations. But
> I really don't think we can modify the way of memory allocation for all the
> subsystems who has this problem. Maybe not just aio and cma. That means if
> you want to pin pages in memory, you have to use a file mapping. This makes
> the memory allocation more complicated. And the idea should be known by all
> the subsystem developers. Is that going to happen ?

Different subsystems will need to use different approaches to fixing the
issue. I doubt any single approach will work for everything.

> I also thought about reuse one field of struct page. But as you said, there
> may not be many users of this functionality. Reusing a field of struct page
> will make things more complicated and lead to high coupling.

What happens when more than one subsystem tries to pin a particular page?
What if it's a shared page rather than an anonymous page?

> So, how about the other idea that Mel mentioned ?
> We create a 1-1 mapping of pinned page ranges and the pinner (subsystem
> callbacks and data), maybe a global list or a hash table. And then, we can
> find the callbacks.

Maybe that is the simplest approach, but it's going to make get_user_pages()
slower and more complicated (as if it wasn't already). Maybe with all the
bells and whistles of per-cpu data structures and such you can make it work,
but I'm pretty sure someone running the large unmentionable benchmark will
complain about the performance regressions you're going to introduce. At
least in the case of the AIO ring buffer, using the address_space approach
doesn't introduce any new performance issues. There's also the bigger
question of whether you can exclude get_user_pages_fast() from this.
In short: you've got a lot more work on your hands.
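
To make the cost of the registry idea concrete, here is a rough userspace
model of a "pinned page -> pinner callbacks" hash table. It is only a sketch:
none of these names (pin_ops, pin_register(), pin_migrate(), demo_ring, ...)
exist in the kernel, a page is modelled as a bare pfn, and there is no locking
at all, which is exactly the part that would hurt in get_user_pages().

```c
/* Userspace model only; every identifier here is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PIN_HASH_SIZE 64u

/* Callbacks a pinning subsystem would hand to the registry. */
struct pin_ops {
	int  (*release)(unsigned long pfn, void *data);   /* 0 = ok to migrate */
	void (*repin)(unsigned long old_pfn, unsigned long new_pfn, void *data);
};

struct pin_entry {
	unsigned long pfn;
	const struct pin_ops *ops;
	void *data;
	struct pin_entry *next;
};

static struct pin_entry *pin_hash[PIN_HASH_SIZE];

static unsigned int pin_hashfn(unsigned long pfn)
{
	return (unsigned int)(pfn * 2654435761u) % PIN_HASH_SIZE;
}

/* Pin side: one allocation and one hash insert per pinned page. */
static int pin_register(unsigned long pfn, const struct pin_ops *ops, void *data)
{
	struct pin_entry *e;
	unsigned int b = pin_hashfn(pfn);

	assert(ops != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->pfn = pfn;
	e->ops = ops;
	e->data = data;
	e->next = pin_hash[b];
	pin_hash[b] = e;
	return 0;
}

/* Unpin side: drop one registration; -1 if it was never registered. */
static int pin_unregister(unsigned long pfn, const struct pin_ops *ops, void *data)
{
	struct pin_entry **pp = &pin_hash[pin_hashfn(pfn)], *e;

	for (; (e = *pp) != NULL; pp = &e->next) {
		if (e->pfn == pfn && e->ops == ops && e->data == data) {
			*pp = e->next;
			free(e);
			return 0;
		}
	}
	return -1;
}

/* Number of pinners currently registered for pfn (a page can have several). */
static int pin_count(unsigned long pfn)
{
	struct pin_entry *e;
	int n = 0;

	for (e = pin_hash[pin_hashfn(pfn)]; e; e = e->next)
		if (e->pfn == pfn)
			n++;
	return n;
}

/*
 * Migration side: ask every pinner to let go, then move the entries to
 * new_pfn and tell each pinner. Returns the number of pinners notified,
 * or -1 if one refused (a real version would have to undo the earlier
 * release calls; this model does not).
 */
static int pin_migrate(unsigned long old_pfn, unsigned long new_pfn)
{
	struct pin_entry **pp = &pin_hash[pin_hashfn(old_pfn)];
	struct pin_entry *moved = NULL, *e;
	unsigned int b = pin_hashfn(new_pfn);
	int n = 0;

	for (e = *pp; e; e = e->next)
		if (e->pfn == old_pfn && e->ops->release &&
		    e->ops->release(old_pfn, e->data) != 0)
			return -1;

	while ((e = *pp) != NULL) {          /* unlink all matching entries */
		if (e->pfn != old_pfn) {
			pp = &e->next;
			continue;
		}
		*pp = e->next;
		e->next = moved;
		moved = e;
	}
	while ((e = moved) != NULL) {        /* re-hash under the new pfn */
		moved = e->next;
		e->pfn = new_pfn;
		e->next = pin_hash[b];
		pin_hash[b] = e;
		if (e->ops->repin)
			e->ops->repin(old_pfn, new_pfn, e->data);
		n++;
	}
	return n;
}

/* Example pinner, loosely standing in for something like a ring buffer. */
struct demo_ring { unsigned long pfn; int quiesced; };

static int demo_release(unsigned long pfn, void *data)
{
	(void)pfn;
	((struct demo_ring *)data)->quiesced = 1;
	return 0;
}

static void demo_repin(unsigned long old_pfn, unsigned long new_pfn, void *data)
{
	struct demo_ring *r = data;

	(void)old_pfn;
	r->pfn = new_pfn;
	r->quiesced = 0;
}

static const struct pin_ops demo_ops = { demo_release, demo_repin };
```

Even in this toy form, every pin and unpin touches a shared table, and two
pinners of the same page both have to be found and called back. With real
locking added, that work lands on the pinning fast path.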

> Thanks. :)


"Thought is the essence of where you are now."