On Tue, Sep 08, 2020 at 11:09:25AM -0400, Zi Yan wrote:
On 7 Sep 2020, at 3:20, Michal Hocko wrote:
On Fri 04-09-20 14:10:45, Roman Gushchin wrote:
On Fri, Sep 04, 2020 at 09:42:07AM +0200, Michal Hocko wrote:
[...]
Something like MADV_HUGEPAGE_SYNC? It would be useful, since users have
better and clearer control of getting huge pages from the kernel and
know when they will pay the cost of getting the huge pages.
I would think the suggestion is more that the huge page control options
currently provided by the kernel do not have a predictable performance
outcome, since MADV_HUGEPAGE is a best-effort option and does not tell
users whether the marked virtual address range is backed by huge pages
or not when the madvise returns. MADV_HUGEPAGE_SYNC would provide a
deterministic result to users on whether the huge page(s) are formed
or not.
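
To illustrate the best-effort behaviour described above, here is a minimal
sketch using only the existing mmap(2)/madvise(2) calls. The helper name
map_with_thp_hint is made up for this example. The point is that a return
value of 0 from madvise() only means the hint was accepted; it says nothing
about whether the range is actually backed by huge pages.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous memory and mark the
 * range with MADV_HUGEPAGE. Returns the mapping, or NULL if mmap() fails.
 * The madvise() result (0 or -1) is stored in *advice_rc.
 */
static void *map_with_thp_hint(size_t len, int *advice_rc)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /*
     * Best effort: 0 means the advice was recorded, not that a huge page
     * was allocated. -1 is possible, e.g. on kernels built without THP.
     */
    *advice_rc = madvise(p, len, MADV_HUGEPAGE);
    return p;
}
```

With the current interface, a caller that wants to know the outcome has to
look elsewhere afterwards, for example at the AnonHugePages field in
/proc/self/smaps.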
Yeah, I agree with Michal here, we need a more straightforward interface.
The hard question here is how hard the kernel should try to allocate
a gigantic page and how fast it should give up and return an error?
I'd say to try really hard if there are some chances to succeed,
so that if an error is returned, there are no more reasons to retry.
Any objections/better ideas here?

Given that we need to pass a page size, we probably need to either introduce
a new syscall (madvise2?) with an additional argument, or add a bunch
of new madvise flags, like MADV_HUGEPAGE_SYNC + encoded 2MB, 1GB etc.
Idk what is better long-term, but new madvise flags are probably slightly
easier to deal with in the development process.
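
As a rough sketch of the "flag + encoded size" option, the page size could be
carried as log2(size) in the upper bits of the advice value, similar to how
mmap(2) encodes MAP_HUGE_2MB and MAP_HUGE_1GB via MAP_HUGE_SHIFT. Everything
prefixed with HYP_/hyp_ below is hypothetical: MADV_HUGEPAGE_SYNC does not
exist, and the base value, shift and mask are assumptions picked only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL constants; none of these are defined by the kernel. */
#define HYP_MADV_HUGEPAGE_SYNC 100u   /* assumed base advice value */
#define HYP_MADV_HUGE_SHIFT    26     /* same shift mmap() uses for MAP_HUGE_* */
#define HYP_MADV_HUGE_MASK     0x3fu  /* 6 bits for log2(page size) */

/* Combine the base advice with log2 of the requested page size. */
static unsigned int hyp_encode_advice(unsigned int log2_size)
{
    assert(log2_size <= HYP_MADV_HUGE_MASK);
    return HYP_MADV_HUGEPAGE_SYNC | (log2_size << HYP_MADV_HUGE_SHIFT);
}

/* Recover log2(page size) from an encoded advice value. */
static unsigned int hyp_decode_log2(unsigned int advice)
{
    return (advice >> HYP_MADV_HUGE_SHIFT) & HYP_MADV_HUGE_MASK;
}

/* Recover the base advice (the low bits below the size field). */
static unsigned int hyp_decode_base(unsigned int advice)
{
    return advice & ((1u << HYP_MADV_HUGE_SHIFT) - 1);
}

/* Requested page size in bytes (assumes a 64-bit size_t for large shifts). */
static size_t hyp_page_size(unsigned int advice)
{
    return (size_t)1 << hyp_decode_log2(advice);
}
```

Under these assumptions, hyp_encode_advice(21) would request 2MB pages and
hyp_encode_advice(30) would request 1GB pages. The sketch uses unsigned int
to keep the shifts well defined, whereas madvise() takes a plain int, so a
real interface would have to deal with the sign bit.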