Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

From: Michal Hocko
Date: Tue Nov 27 2018 - 13:17:34 EST


On Tue 27-11-18 09:08:50, Linus Torvalds wrote:
> On Mon, Nov 26, 2018 at 10:24 PM kernel test robot
> <rong.a.chen@xxxxxxxxx> wrote:
> >
> > FYI, we noticed a -61.3% regression of vm-scalability.throughput due
> > to commit ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for
> > MADV_HUGEPAGE mappings")
>
> Well, that's certainly noticeable and not good.
>
> Andrea, I suspect it might be causing fights with auto numa migration..
>
> Lots more system time, but also look at this:
>
> > 1122389 ± 9% +17.2% 1315380 ± 4% proc-vmstat.numa_hit
> > 214722 ± 5% +21.6% 261076 ± 3% proc-vmstat.numa_huge_pte_updates
> > 1108142 ± 9% +17.4% 1300857 ± 4% proc-vmstat.numa_local
> > 145368 ± 48% +63.1% 237050 ± 17% proc-vmstat.numa_miss
> > 159615 ± 44% +57.6% 251573 ± 16% proc-vmstat.numa_other
> > 185.50 ± 81% +8278.6% 15542 ± 40% proc-vmstat.numa_pages_migrated
>
> Should the commit be reverted? Or perhaps at least modified?

Well, the commit is trying to revert to the behavior before
5265047ac301 because there are real use cases that suffered from that
change, and we have received bug reports as a result.

will-it-scale is certainly worth considering but it is an artificial
testcase. A higher NUMA miss rate is an expected side effect of the
patch because the fallback to a different NUMA node is more likely. The
__GFP_THISNODE side effect is basically introducing node-reclaim
behavior for THPs. Another thing is that there is no behavior that is
good for everybody. Whether to reclaim locally or to place the THP on a
remote node is hard to decide by default. We have discussed that at
length and there were some conclusions. One of them is that we need a
NUMA policy to tell whether an expensive locality is preferred over
remote allocation. Also we definitely need better proactive
defragmentation to allow larger pages on a local node. This is a work
in progress and this patch is a stopgap fix.
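
To put a rough number on the miss rate mentioned above, here is a
minimal sketch that derives it from the numa_hit and numa_miss means
quoted from the report. The formula miss / (hit + miss) and the helper
name numa_miss_rate are assumptions made for illustration, not anything
the robot computes, and the inputs are averages with a large run-to-run
variance, so treat the result as indicative only.

```python
def numa_miss_rate(numa_hit, numa_miss):
    """Fraction of counted allocations that missed the intended node.

    Assumption: miss / (hit + miss) is a reasonable approximation of the
    miss rate; the report itself only lists the raw counters.
    """
    total = numa_hit + numa_miss
    if total == 0:
        return 0.0
    return numa_miss / total


# Mean counter values copied from the quoted proc-vmstat lines.
BEFORE = {"numa_hit": 1122389, "numa_miss": 145368}
AFTER = {"numa_hit": 1315380, "numa_miss": 237050}

if __name__ == "__main__":
    for label, counters in (("before", BEFORE), ("after", AFTER)):
        rate = numa_miss_rate(counters["numa_hit"], counters["numa_miss"])
        print(f"{label}: {rate:.1%}")
```

With those inputs the rate goes from roughly 11.5% to roughly 15.3%,
which is consistent with fallback to a remote node becoming more
likely.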

--
Michal Hocko
SUSE Labs