Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

From: John Hubbard
Date: Thu Dec 06 2018 - 21:45:55 EST


On 12/4/18 5:57 PM, John Hubbard wrote:
> On 12/4/18 5:44 PM, Jerome Glisse wrote:
>> On Tue, Dec 04, 2018 at 05:15:19PM -0800, Matthew Wilcox wrote:
>>> On Tue, Dec 04, 2018 at 04:58:01PM -0800, John Hubbard wrote:
>>>> On 12/4/18 3:03 PM, Dan Williams wrote:
>>>>> Except the LRU fields are already in use for ZONE_DEVICE pages... how
>>>>> does this proposal interact with those?
>>>>
>>>> Very badly: page->pgmap and page->hmm_data both get corrupted. Is there an entire
>>>> use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages? Said another
>>>> way: is it reasonable to disallow calling get_user_pages() on ZONE_DEVICE pages?
>>>>
>>>> If we have to support get_user_pages() on ZONE_DEVICE pages, then the whole
>>>> LRU field approach is unusable.
>>>
>>> We just need to rearrange ZONE_DEVICE pages. Please excuse the whitespace
>>> damage:
>>>
>>> +++ b/include/linux/mm_types.h
>>> @@ -151,10 +151,12 @@ struct page {
>>> #endif
>>> };
>>> struct { /* ZONE_DEVICE pages */
>>> + unsigned long _zd_pad_2; /* LRU */
>>> + unsigned long _zd_pad_3; /* LRU */
>>> + unsigned long _zd_pad_1; /* uses mapping */
>>> /** @pgmap: Points to the hosting device page map. */
>>> struct dev_pagemap *pgmap;
>>> unsigned long hmm_data;
>>> - unsigned long _zd_pad_1; /* uses mapping */
>>> };
>>>
>>> /** @rcu_head: You can use this to free a page by RCU. */
>>>
>>> You don't use page->private or page->index, do you Dan?
>>
>> page->private and page->index are used by HMM DEVICE pages.
>>
>
> OK, so for the ZONE_DEVICE + HMM case, that leaves just one field remaining for
> dma-pinned information. Which might work. To recap, we need:
>
> -- 1 bit for PageDmaPinned
> -- 1 bit, if using LRU field(s), for PageDmaPinnedWasLru.
> -- N bits for a reference count
>
> Those *could* be packed into a single 64-bit field, if really necessary.
>

...actually, this needs to work on 32-bit systems as well. And HMM is using a lot of the
struct page fields. However, it is still possible for this to work.

Matthew, can I have that bit now please? I'm about out of options, and now it will actually
solve the problem here.

Given:

1) It's cheap to know if a page is ZONE_DEVICE, and ZONE_DEVICE means not on the LRU.
That, in turn, means that only 1 bit, instead of 2 bits (in addition to a counter), is
required for that case.

2) There is an independent bit available (according to Matthew).

3) HMM uses 4 of the 5 struct page fields, so only one field is available for a counter
in that case.

4) get_user_pages() must work on ZONE_DEVICE and HMM pages.

5) For a proper atomic counter on both 32- and 64-bit systems, we really do need a complete
unsigned long field.
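Fact 1) can be illustrated with a small user-space model. This is a sketch under stated assumptions: `struct mock_page`, `mock_page_zonenum()` and `mock_is_zone_device_page()` are hypothetical stand-ins loosely modeled on the kernel's `page_zonenum()` and `is_zone_device_page()`, and the zone field's width, position and numbering are made up. It shows why the ZONE_DEVICE test is cheap (a shift, a mask and a compare on page->flags) and how its result cuts the pin-related flag bits from 2 to 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical model: the zone number lives in the top bits of
 * page->flags. Width, position and zone numbering are invented here.
 */
#define MOCK_ZONES_WIDTH   3
#define MOCK_ZONES_PGSHIFT (sizeof(unsigned long) * CHAR_BIT - MOCK_ZONES_WIDTH)
#define MOCK_ZONES_MASK    ((1UL << MOCK_ZONES_WIDTH) - 1)

enum mock_zone {
	MOCK_ZONE_DMA,
	MOCK_ZONE_NORMAL,
	MOCK_ZONE_MOVABLE,
	MOCK_ZONE_DEVICE,
};

static_assert(MOCK_ZONE_DEVICE <= MOCK_ZONES_MASK,
	      "zone number must fit in the zone field");

struct mock_page {
	unsigned long flags;
};

/* Store a zone number without disturbing the other flag bits. */
static void mock_set_page_zone(struct mock_page *page, enum mock_zone zone)
{
	page->flags &= ~(MOCK_ZONES_MASK << MOCK_ZONES_PGSHIFT);
	page->flags |= ((unsigned long)zone & MOCK_ZONES_MASK) << MOCK_ZONES_PGSHIFT;
}

static unsigned int mock_page_zonenum(const struct mock_page *page)
{
	return (page->flags >> MOCK_ZONES_PGSHIFT) & MOCK_ZONES_MASK;
}

/* Cheap: one shift, one mask and one compare on page->flags. */
static bool mock_is_zone_device_page(const struct mock_page *page)
{
	return mock_page_zonenum(page) == MOCK_ZONE_DEVICE;
}

/*
 * Flag bits needed beside the pin counter: ZONE_DEVICE pages are never on
 * the LRU, so they need only PageDmaPinned; other pages also need
 * PageDmaPinnedWasLru.
 */
static int mock_pin_flag_bits(const struct mock_page *page)
{
	return mock_is_zone_device_page(page) ? 1 : 2;
}
```

In this model a caller would make the ZONE_DEVICE decision once per page and skip the LRU bookkeeping entirely for device pages.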

So that leads to the following approach:

-- Use a single unsigned long field as an atomic DMA pinned count.
For normal pages, this will be the *second* field of the LRU (in order to avoid the PageTail
bit, which is bit 0 of the first field).

For ZONE_DEVICE pages, we can also line up the fields so that the second LRU field is
available and reserved for this DMA pinned count. Basically, _zd_pad_1 gets moved up and
optionally renamed:

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 017ab82e36ca..b5dcd9398cae 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -90,8 +90,8 @@ struct page {
* are in use.
*/
struct {
- unsigned long dma_pinned_flags;
- atomic_t dma_pinned_count;
+ unsigned long dma_pinned_flags; /* LRU.next */
+ atomic_t dma_pinned_count; /* LRU.prev */
};
};
/* See page-flags.h for PAGE_MAPPING_FLAGS */
@@ -161,9 +161,9 @@ struct page {
};
struct { /* ZONE_DEVICE pages */
/** @pgmap: Points to the hosting device page map. */
- struct dev_pagemap *pgmap;
- unsigned long hmm_data;
- unsigned long _zd_pad_1; /* uses mapping */
+ struct dev_pagemap *pgmap; /* LRU.next */
+ unsigned long _zd_pad_1; /* LRU.prev or dma_pinned_count */
+ unsigned long hmm_data; /* uses mapping */
};

/** @rcu_head: You can use this to free a page by RCU. */



-- Use an additional, fully independent page bit (from Matthew) for PageDmaPinned.
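The proposed layout plus the independent flag bit can be sanity-checked with another user-space sketch. Everything in it is hypothetical: `struct mock_page` models only page->flags and the first three words of the union, `MOCK_PG_dma_pinned` stands in for the bit requested from Matthew, the counter is an `atomic_ulong` as point 5) suggests (rather than the `atomic_t` still shown in the diff), and C11 `<stdatomic.h>` replaces the kernel atomics. The static assertions check that the normal-page count and the ZONE_DEVICE `_zd_pad_1` share the LRU.prev word, and that the count stays clear of the first word, whose bit 0 is the PageTail bit. The pin/unpin helpers are single-threaded illustrations: they do not handle the race between updating the counter and the flag bit, and they do not model taking a normal page off the LRU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit number for the independent PageDmaPinned flag. */
#define MOCK_PG_dma_pinned 20

/* Hypothetical, heavily simplified model of struct page. */
struct mock_page {
	atomic_ulong flags;
	union {
		struct {				/* normal pages */
			unsigned long lru_next;		/* bit 0 aliases PageTail */
			atomic_ulong dma_pinned_count;	/* LRU.prev */
			void *mapping;
		};
		struct {				/* ZONE_DEVICE pages */
			void *pgmap;			/* LRU.next */
			unsigned long _zd_pad_1;	/* LRU.prev or dma_pinned_count */
			unsigned long hmm_data;		/* uses mapping */
		};
	};
};

/* Both page kinds keep the pin count in the same (LRU.prev) word. */
static_assert(offsetof(struct mock_page, dma_pinned_count) ==
	      offsetof(struct mock_page, _zd_pad_1),
	      "pin count and _zd_pad_1 must share the LRU.prev word");

/* The count must not overlap the first word, which carries PageTail. */
static_assert(offsetof(struct mock_page, dma_pinned_count) >=
	      offsetof(struct mock_page, lru_next) + sizeof(unsigned long),
	      "pin count must not overlap the PageTail word");

/* Same offset for both kinds, so one accessor serves normal and ZONE_DEVICE pages. */
static atomic_ulong *mock_pin_count(struct mock_page *page)
{
	return &page->dma_pinned_count;
}

/* First pin sets the flag bit (single-threaded illustration only). */
static void mock_dma_pin(struct mock_page *page)
{
	if (atomic_fetch_add(mock_pin_count(page), 1) == 0)
		atomic_fetch_or(&page->flags, 1UL << MOCK_PG_dma_pinned);
}

/* Last unpin clears the flag bit (single-threaded illustration only). */
static void mock_dma_unpin(struct mock_page *page)
{
	if (atomic_fetch_sub(mock_pin_count(page), 1) == 1)
		atomic_fetch_and(&page->flags, ~(1UL << MOCK_PG_dma_pinned));
}

static bool mock_page_dma_pinned(struct mock_page *page)
{
	return atomic_load(&page->flags) & (1UL << MOCK_PG_dma_pinned);
}
```

Because the count never touches the first word, in this model pinning a tail page leaves its PageTail bit intact.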


thanks,
--
John Hubbard
NVIDIA