Re: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages

From: Dan Williams
Date: Tue Jun 04 2019 - 16:15:49 EST


On Tue, Jun 4, 2019 at 12:48 PM John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>
> On 6/4/19 9:48 AM, ira.weiny@xxxxxxxxx wrote:
> > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> >
> > release_pages() is an optimized version of a loop around put_page().
> > Unfortunately for devmap pages the logic is not entirely correct in
> > release_pages(). This is because device pages can be more than type
> > MEMORY_DEVICE_PUBLIC. There are in fact 4 types, private, public, FS
> > DAX, and PCI P2PDMA. Some of these have specific needs to "put" the
> > page while others do not.
> >
> > This logic to handle any special needs is contained in
> > put_devmap_managed_page(). Therefore all devmap pages should be
> > processed by this function where we can contain the correct logic for a
> > page put.
> >
> > Handle all device type pages within release_pages() by calling
> > put_devmap_managed_page() on all devmap pages. If
> > put_devmap_managed_page() returns true the page has been put and we
> > continue with the next page. A false return of
> > put_devmap_managed_page() means the page did not require special
> > processing and should fall to "normal" processing.
> >
> > This was found via code inspection while determining if release_pages()
> > and the new put_user_pages() could be interchangeable.[1]
> >
> > [1] https://lore.kernel.org/lkml/20190523172852.GA27175@xxxxxxxxxxxxxxxxxxxxxxxxx/
> >
> > Cc: JÃrÃme Glisse <jglisse@xxxxxxxxxx>
> > Cc: Michal Hocko <mhocko@xxxxxxxx>
> > Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> > Reviewed-by: John Hubbard <jhubbard@xxxxxxxxxx>
> > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> >
> > ---
> > Changes from V2:
> > Update changelog for more clarity as requested by Michal
> > Update comment WRT "failing" of put_devmap_managed_page()
> >
> > Changes from V1:
> > Add comment clarifying that put_devmap_managed_page() can still
> > fail.
> > Add Reviewed-by tags.
> >
> > mm/swap.c | 13 +++++++++----
> > 1 file changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 7ede3eddc12a..6d153ce4cb8c 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
> > if (is_huge_zero_page(page))
> > continue;
> >
> > - /* Device public page can not be huge page */
> > - if (is_device_public_page(page)) {
> > + if (is_zone_device_page(page)) {
> > if (locked_pgdat) {
> > spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> > flags);
> > locked_pgdat = NULL;
> > }
> > - put_devmap_managed_page(page);
> > - continue;
> > + /*
> > + * Not all zone-device-pages require special
> > + * processing. Those pages return 'false' from
> > + * put_devmap_managed_page() expecting a call to
> > + * put_page_testzero()
> > + */
>
> Just a documentation tweak: how about:
>
> /*
> * ZONE_DEVICE pages that return 'false' from
> * put_devmap_managed_page() do not require special
> * processing, and instead, expect a call to
> * put_page_testzero().
> */

Looks better to me, but maybe just go ahead and list those
expectations explicitly. Something like:

/*
* put_devmap_managed_page() only handles
* ZONE_DEVICE (struct dev_pagemap managed)
* pages when the hosting dev_pagemap has the
* ->free() or ->fault() callback handlers
* implemented as indicated by
* dev_pagemap.type. Otherwise the expectation
* is to fall back to a plain decrement /
* put_page_testzero().
*/