Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory

From: Kirill A. Shutemov
Date: Fri Nov 09 2018 - 07:13:26 EST


On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote:
> The basic idea as outlined by Mel Gorman in [2] is:
>
> 1) On first fault in a sufficiently sized range, allocate a huge page
> sized and aligned block of base pages. Map the base page
> corresponding to the fault address and hold the rest of the pages in
> reserve.
> 2) On subsequent faults in the range, map the pages from the reservation.
> 3) When enough pages have been mapped, promote the mapped pages and
> remaining pages in the reservation to a huge page.
> 4) When there is memory pressure, release the unused pages from their
> reservations.
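
For reference while reading the objections below, the four quoted steps can
be modelled as a small state machine. This is only a userspace toy sketch,
not code from the patch: struct thp_resv, resv_fault(), resv_shrink() and
the PROMOTE_THRESHOLD value are all hypothetical names chosen for
illustration, and 512 base pages per huge page assumes 4 KiB pages with
2 MiB huge pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BASE_PAGES_PER_HUGE 512  /* assumption: 2 MiB huge / 4 KiB base pages */
#define PROMOTE_THRESHOLD   256  /* hypothetical value for "enough pages mapped" */

enum resv_state { RESV_NONE, RESV_ACTIVE, RESV_PROMOTED, RESV_RELEASED };

/* Toy model of one huge-page-sized, huge-page-aligned reservation. */
struct thp_resv {
    enum resv_state state;
    bool mapped[BASE_PAGES_PER_HUGE];
    size_t nr_mapped;
};

/* Steps 1-3: handle a fault on base page `idx` of the aligned range. */
static enum resv_state resv_fault(struct thp_resv *r, size_t idx)
{
    assert(idx < BASE_PAGES_PER_HUGE);

    if (r->state == RESV_NONE)
        r->state = RESV_ACTIVE;      /* step 1: reserve the whole block */

    if (r->state == RESV_PROMOTED)
        return r->state;             /* range is already one huge mapping */

    if (!r->mapped[idx]) {           /* steps 1-2: map the faulting page */
        r->mapped[idx] = true;
        r->nr_mapped++;
    }

    /* Step 3: promote only while the unused pages are still held. */
    if (r->state == RESV_ACTIVE && r->nr_mapped >= PROMOTE_THRESHOLD)
        r->state = RESV_PROMOTED;

    return r->state;
}

/* Step 4: under memory pressure, give back the pages never mapped.
 * Returns the number of base pages released. */
static size_t resv_shrink(struct thp_resv *r)
{
    if (r->state != RESV_ACTIVE)
        return 0;
    r->state = RESV_RELEASED;
    return BASE_PAGES_PER_HUGE - r->nr_mapped;
}
```

In this model the RESV_ACTIVE to RESV_PROMOTED transition is the point where
a real implementation would have to retract the PTE page table and install
the huge PMD entry.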

I haven't read the patch in detail yet, but I'm skeptical about the
approach in general, for a few reasons:

- Retracting the PTE page table to replace it with a huge PMD entry
requires down_write(mmap_sem). That makes the approach impractical for
many multi-threaded workloads.

I don't see a way to avoid the exclusive lock here. I would be glad to
be proven wrong.

- Promotion will also require a TLB flush, which might be prohibitively
slow on big machines.

- Short-lived processes will fail to benefit from THP under this policy,
even with plenty of free memory in the system: there is no time to promote
to THP or, with synchronous promotion, the cost will outweigh the benefit.

The goal of reducing the memory overhead of THP is admirable, but we need
to be careful not to kill the benefit of THP itself. The approach will
reduce the number of THPs mapped in the system and/or shift their
allocation to a later stage of the process lifetime.

The only way I see this being useful is if the policy can be applied on a
per-VMA basis. That would be very useful for malloc() implementations, for
instance. But as a global policy it's a no-go for me.
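
To illustrate what per-VMA control looks like from the allocator side, here
is a minimal sketch using the existing per-range THP knob,
madvise(MADV_HUGEPAGE). It is an analogy only: the RFC does not define a
per-VMA reservation interface, so a reservation policy would presumably
need its own advice value. arena_alloc_thp() and HUGE_SZ are hypothetical
names, and a 2 MiB huge page size is assumed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* for MADV_HUGEPAGE and MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HUGE_SZ (2UL << 20)      /* assumption: 2 MiB PMD-sized huge pages */

/* Map an anonymous arena aligned to HUGE_SZ and opt only that range in
 * to THP. *advised is set to 1 if madvise() succeeded, 0 otherwise (for
 * example, on a kernel built without THP support). */
static void *arena_alloc_thp(size_t len, int *advised)
{
    assert(len > 0 && len % HUGE_SZ == 0);

    /* Over-allocate by one huge page so an aligned window always fits. */
    size_t span = len + HUGE_SZ;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = ((uintptr_t)raw + HUGE_SZ - 1) & ~(HUGE_SZ - 1);
    char *aligned = (char *)start;

    /* Trim the unaligned head and the leftover tail. */
    size_t head = (size_t)(aligned - raw);
    if (head)
        munmap(raw, head);
    size_t tail = span - head - len;
    if (tail)
        munmap(aligned + len, tail);

    /* The policy applies to this range only, not system-wide. */
    *advised = (madvise(aligned, len, MADV_HUGEPAGE) == 0);
    return aligned;
}
```

A malloc() implementation could call something like this for its large
arenas and leave small or short-lived mappings on the default policy.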

Prove me wrong with performance data. :)

--
Kirill A. Shutemov