Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps

From: Michal Hocko
Date: Tue Sep 25 2018 - 16:30:07 EST


On Tue 25-09-18 12:52:09, David Rientjes wrote:
> On Mon, 24 Sep 2018, Vlastimil Babka wrote:
>
> > On 9/24/18 10:02 PM, Michal Hocko wrote:
> > > On Mon 24-09-18 21:56:03, Michal Hocko wrote:
> > >> On Mon 24-09-18 12:30:07, David Rientjes wrote:
> > >>> Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
> > >>> introduced a regression in that userspace cannot always determine the set
> > >>> of vmas where thp is ineligible.
> > >>>
> > >>> Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
> > >>> to determine if a vma is eligible to be backed by hugepages.
> > >>
> > >> I was under impression that nh resp hg flags only tell about the madvise
> > >> status. How do you exactly use these flags in an application?
> > >>
>
> This is used to identify heap mappings that should be able to fault thp
> but do not, and they normally point to a low-on-memory or fragmentation
> issue. After commit 1860033237d4, our users of PR_SET_THP_DISABLE no
> longer show "nh" for their heap mappings so they get reported as having a
> low thp ratio when in reality it is disabled.

I am still not sure I understand the issue completely. How are
PR_SET_THP_DISABLE users any different from the global THP disabled
case? Is this only about the scope? E.g. the one who checks the state
cannot check the PR_SET_THP_DISABLE state? Besides that, what are the
consequences of the low ratio? Is this an example of somebody using the
prctl and still complaining, or an external observer trying to do
something useful which ends up doing the contrary?
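
For concreteness, an external observer of the kind described above would
presumably read the flags along these lines. This is only a minimal
sketch, assuming the usual "VmFlags:" line format of /proc/pid/smaps; the
helper names and the sample text are made up for illustration and are not
taken from any real tool.

```python
# Hypothetical sketch: extract the hg/nh flags per mapping from smaps text.
# SAMPLE is invented data that only mimics the /proc/<pid>/smaps layout.
SAMPLE = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example
Size:                328 kB
AnonHugePages:         0 kB
VmFlags: rd ex mr mw me dw
01000000-01200000 rw-p 00000000 00:00 0 [heap]
Size:               2048 kB
AnonHugePages:         0 kB
VmFlags: rd wr mr mw me ac nh
"""


def parse_vmflags(smaps_text):
    """Return a list of (mapping_header, set_of_flags), one per mapping."""
    result = []
    header = None
    for line in smaps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "VmFlags:":
            result.append((header, set(fields[1:])))
        elif "-" in fields[0] and not fields[0].endswith(":"):
            # Mapping header line: "start-end perms offset dev inode [path]"
            header = line.strip()
    return result


def madvise_thp_flag(flags):
    """Return "nh", "hg", or None if neither flag is present."""
    if "nh" in flags:
        return "nh"
    if "hg" in flags:
        return "hg"
    return None


if __name__ == "__main__":
    for header, flags in parse_vmflags(SAMPLE):
        print(madvise_thp_flag(flags), header)
```

Note that such a reader only sees per-vma flags; whether it can also
learn the process-wide prctl state is exactly the open question.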

> It is also used in
> automated testing to ensure that vmas get disabled for thp appropriately
> and we used "nh" since that is how PR_SET_THP_DISABLE previously enforced
> this, and those tests now break.

This sounds like a bit of an abuse to me. It shows how an internal
implementation detail leaks out to userspace, which is something we
should try to avoid.

> > >> Your eligible rules as defined here:
> > >>
> > >>> + [*] A process mapping is eligible to be backed by transparent hugepages (thp)
> > >>> + depending on system-wide settings and the mapping itself. See
> > >>> + Documentation/admin-guide/mm/transhuge.rst for default behavior. If a
> > >>> + mapping has a flag of "nh", it is not eligible to be backed by hugepages
> > >>> + in any condition, either because of prctl(PR_SET_THP_DISABLE) or
> > >>> + madvise(MADV_NOHUGEPAGE). PR_SET_THP_DISABLE takes precedence over any
> > >>> + MADV_HUGEPAGE.
> > >>
> > >> doesn't seem to match the reality. I do not see all the file backed
> > >> mappings to be nh marked. So is this really about eligibility rather
> > >> than the madvise status? Maybe it is just the above documentation that
> > >> needs to be updated.
> >
> > Yeah the change from madvise to eligibility in the doc seems to go too far.
> >
>
> I'll reword this to explicitly state that "hg" and "nh" mappings either
> allow or disallow thp backing.

How are you going to distinguish a regular THP-able mapping then? I am
still not sure how this is supposed to work. Could you be more specific?
Let's say I have a THP-able mapping (shmem resp. anon for the current
implementation). What is the matrix for hg/nh wrt. madvise/no madvise,
PR_SET_THP_DISABLE and global THP enabled/disabled?
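
A rough sketch of the matrix in question, only to enumerate the input
combinations. The split of the global setting into always/madvise/never
is an assumption based on the values of
/sys/kernel/mm/transparent_hugepage/enabled, and the reported flag is
deliberately left as "?" because that column is what needs to be
specified.

```python
from itertools import product

# Inputs of the matrix. The names are illustrative labels, not kernel API.
MADVISE = ("none", "MADV_HUGEPAGE", "MADV_NOHUGEPAGE")
PRCTL = ("unset", "PR_SET_THP_DISABLE")
GLOBAL = ("always", "madvise", "never")  # assumed global THP settings


def matrix_rows():
    """Enumerate every input combination; the smaps flag is left open."""
    return [
        {"madvise": m, "prctl": p, "global": g, "vmflag": "?"}
        for m, p, g in product(MADVISE, PRCTL, GLOBAL)
    ]


if __name__ == "__main__":
    print(f"{'madvise':<16} {'prctl':<19} {'global':<8} hg/nh")
    for row in matrix_rows():
        print(f"{row['madvise']:<16} {row['prctl']:<19} "
              f"{row['global']:<8} {row['vmflag']}")
```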

--
Michal Hocko
SUSE Labs