On Fri, 27 Feb 2015, Rik van Riel wrote:
I think we do need to change the default.
Why? See this bug, which was reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=93111
Now, there may be a better value than HPAGE_PMD_NR/8, but
I am not sure what it would be, or why.
I do know that HPAGE_PMD_NR-1 results in undesired behaviour,
as seen in the bug above...
I know that a value of 64 would also be undesirable for Google: since we
tightly constrain memory usage, we have used max_ptes_none == 0 ever since
it was introduced. We can get away with that because our malloc() is
modified to periodically give large contiguous ranges of memory back to
the system, also using madvise(MADV_DONTNEED), and it tries to avoid
splitting thp memory.
The value is determined by how the system will be used: do you tightly
constrain memory usage and not allow any unmapped memory to be collapsed
into a hugepage, or do you have an abundance of memory and really want an
aggressive value like HPAGE_PMD_NR-1? Depending on the properties of the
system, you can tune this to anything you want, just as we do in our
initscripts.
I'm only concerned here about changing a default that has been around for
four years and the possibly negative implications that will have on users
who never touch this value. They undoubtedly get less memory backed by
thp, and that can lead to a performance regression. So if this patch is
merged and we get a bug report for the 4.1 kernel, do we tell that user
that we changed behavior out from under them and to adjust the tunable
back to HPAGE_PMD_NR-1?
Meanwhile, the bug report you cite has a workaround that has always been
available for thp kernels:
# echo 64 > /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none