Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory

From: anthony . yznaga
Date: Fri Nov 09 2018 - 19:05:39 EST

On 11/09/2018 04:13 AM, Kirill A. Shutemov wrote:
> On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote:
>> The basic idea as outlined by Mel Gorman in [2] is:
>>
>> 1) On first fault in a sufficiently sized range, allocate a huge page
>> sized and aligned block of base pages. Map the base page
>> corresponding to the fault address and hold the rest of the pages in
>> reserve.
>> 2) On subsequent faults in the range, map the pages from the reservation.
>> 3) When enough pages have been mapped, promote the mapped pages and
>> remaining pages in the reservation to a huge page.
>> 4) When there is memory pressure, release the unused pages from their
>> reservations.
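
To make the four steps above concrete, here is a toy userspace model of the
state one reservation goes through. It is only a sketch, not code from the
patch: the names (struct reservation, res_fault(), res_release()) and the
PROMOTE_THRESHOLD value are made up for illustration, and 512 base pages per
huge page assumes 4 KiB pages with 2 MiB huge pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants, not taken from the patch. */
#define PAGES_PER_HUGE    512   /* 2 MiB huge page / 4 KiB base page */
#define PROMOTE_THRESHOLD 256   /* hypothetical "enough pages mapped" cutoff */

enum res_state {
    RES_NONE,       /* no fault seen in the range yet */
    RES_RESERVED,   /* block allocated, unmapped pages held in reserve */
    RES_PROMOTED,   /* range is covered by a single huge page */
    RES_RELEASED    /* unused pages were given back under memory pressure */
};

/* Must be zero-initialized before first use. */
struct reservation {
    enum res_state state;
    bool mapped[PAGES_PER_HUGE];    /* which base pages have been mapped */
    int nr_mapped;
};

/* Steps 1-3: handle a fault on base page idx within the range. */
static enum res_state res_fault(struct reservation *r, int idx)
{
    if (idx < 0 || idx >= PAGES_PER_HUGE || r->state == RES_PROMOTED)
        return r->state;

    /* Step 1: the first fault allocates the aligned block. */
    if (r->state == RES_NONE)
        r->state = RES_RESERVED;

    /*
     * Steps 1 and 2: map the page for the faulting address. After a
     * release (RES_RELEASED) this models an ordinary base page fault.
     */
    if (!r->mapped[idx]) {
        r->mapped[idx] = true;
        r->nr_mapped++;
    }

    /* Step 3: promote once enough pages are mapped. */
    if (r->state == RES_RESERVED && r->nr_mapped >= PROMOTE_THRESHOLD)
        r->state = RES_PROMOTED;

    return r->state;
}

/* Step 4: under memory pressure, return the number of unused pages freed. */
static int res_release(struct reservation *r)
{
    int freed;

    if (r->state != RES_RESERVED)
        return 0;

    freed = PAGES_PER_HUGE - r->nr_mapped;
    r->state = RES_RELEASED;
    return freed;
}
```

The model ignores locking, page table updates and TLB flushes entirely, so
it says nothing about the cost of the promotion step.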
> I haven't yet read the patch in detail, but I'm skeptical about the
> approach in general for a few reasons:
>
> - Retracting the PTE page table to replace it with a huge PMD entry
> requires down_write(mmap_sem). That makes the approach impractical for
> many multi-threaded workloads.
>
> I don't see a way to avoid the exclusive lock here. I will be glad to
> be proved otherwise.
>
> - The promotion will also require a TLB flush, which might be
> prohibitively slow on big machines.
>
> - Short-lived processes will fail to benefit from THP with the policy,
> even with plenty of free memory in the system: no time to promote to THP
> or, with synchronous promotion, the cost will outweigh the benefit.
>
> The goal of reducing the memory overhead of THP is admirable, but we need
> to be careful not to kill the THP benefit itself. The approach will reduce
> the number of THPs mapped in the system and/or shift their allocation to a
> later stage of process lifetime.
>
> The only way I can see it being useful is if the policy can be applied on
> a per-VMA basis. It would be very useful for malloc() implementations,
> for instance. But as a global policy it's a no-go for me.
I agree that this should not be a global policy. For example, it seems to me
that a VMA where MADV_HUGEPAGE has been applied should get huge
pages on first faults (I need to fix that in my implementation).
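
For reference, this is how userspace marks a VMA with MADV_HUGEPAGE today.
A minimal sketch only: the helper name map_with_thp_hint() is made up for
illustration, and the #ifdef is there because the flag is not defined on
every system.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory and ask for huge pages on the range.
 * Returns the mapping, or NULL if mmap() fails. The madvise() result is
 * stored in *advised (0 on success, -1 on failure) instead of being treated
 * as fatal: the hint is optional, and kernels built without THP reject it.
 */
static void *map_with_thp_hint(size_t len, int *advised)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;

#ifdef MADV_HUGEPAGE
    *advised = madvise(p, len, MADV_HUGEPAGE);
#else
    *advised = -1;      /* flag not available on this system */
#endif
    return p;
}
```

A malloc() implementation could call something like this on the arenas where
it wants huge pages immediately and leave other mappings to the reservation
policy.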
>
> Prove me wrong with performance data. :)
I'll try. :-)

Thanks for the comments!

Anthony