Re: [PATCH] mm, zone_device: replace {get, put}_zone_device_page() with a single reference

From: Logan Gunthorpe
Date: Thu Apr 27 2017 - 12:12:05 EST




On 26/04/17 06:55 PM, Dan Williams wrote:
> @@ -277,7 +269,10 @@ struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
> *
> * Notes:
> * 1/ @ref must be 'live' on entry and 'dead' before devm_memunmap_pages() time
> - * (or devm release event).
> + * (or devm release event). The expected order of events is that @ref has
> + * been through percpu_ref_kill() before devm_memremap_pages_release(). The
> + * wait for the completion of kill and percpu_ref_exit() must occur after
> + * devm_memremap_pages_release().
> *
> * 2/ @res is expected to be a host memory range that could feasibly be
> * treated as a "System RAM" range, i.e. not a device mmio range, but
> @@ -379,6 +374,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
> */
> list_del(&page->lru);
> page->pgmap = pgmap;
> + percpu_ref_get(ref);
> }
> devres_add(dev, page_map);
> return __va(res->start);
> diff --git a/mm/swap.c b/mm/swap.c
> index 5dabf444d724..01267dda6668 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -97,6 +97,16 @@ static void __put_compound_page(struct page *page)
>
> void __put_page(struct page *page)
> {
> + if (is_zone_device_page(page)) {
> + put_dev_pagemap(page->pgmap);
> +
> + /*
> + * The page belong to device, do not return it to
> + * page allocator.
> + */
> + return;
> + }
> +
> if (unlikely(PageCompound(page)))
> __put_compound_page(page);
> else
>

Forgive me if I'm missing something but this doesn't make sense to me.
We are taking a reference once when the region is initialized and
releasing it every time a page within the region's reference count drops
to zero. That does not seem to be symmetric and I don't see how it
tracks that pages are in use. Shouldn't get_dev_pagemap be called when
any page is allocated or something like that (ie. the inverse of
__put_page)?

Thanks,

Logan