Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

From: Xing Zhengjun
Date: Fri Aug 21 2020 - 04:39:39 EST

On 6/26/2020 5:33 AM, Mike Kravetz wrote:
> On 6/22/20 3:01 PM, Mike Kravetz wrote:
>> On 6/21/20 5:55 PM, kernel test robot wrote:
>>> Greetings,
>>>
>>> FYI, we noticed a -33.4% regression of vm-scalability.throughput due to commit:
>>>
>>> commit: c0d0381ade79885c04a04c303284b040616b116e ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> in testcase: vm-scalability
>>> on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
>>> with following parameters:
>>>
>>>     runtime: 300s
>>>     size: 8T
>>>     test: anon-cow-seq-hugetlb
>>>     cpufreq_governor: performance
>>>     ucode: 0x11
>>
>> Some performance regression is not surprising as the change includes acquiring
>> and holding the i_mmap_rwsem (in read mode) during hugetlb page faults. 33.4%
>> seems a bit high, but the test primarily exercises the hugetlb page fault path
>> and little else.
>>
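For context, the pattern that c0d0381ade adds to the fault path looks
roughly like the sketch below. This is a simplified sketch only, not
the actual kernel code: handle_fault_locked() is a hypothetical
stand-in for the rest of hugetlb_fault(), and error and reservation
handling is omitted.

static vm_fault_t hugetlb_fault_sketch(struct mm_struct *mm,
				       struct vm_area_struct *vma,
				       unsigned long address,
				       unsigned int flags)
{
	struct address_space *mapping = vma->vm_file->f_mapping;
	struct hstate *h = hstate_vma(vma);
	vm_fault_t ret;
	pte_t *ptep;

	/*
	 * Take i_mmap_rwsem in read mode for the duration of the fault.
	 * A concurrent huge_pmd_unshare() needs the semaphore in write
	 * mode, so it can no longer invalidate the pmd being operated
	 * on here.  Every fault pays this cost, which is where the
	 * regression comes from on a fault-heavy workload like this
	 * test.
	 */
	i_mmap_lock_read(mapping);
	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
	if (!ptep) {
		i_mmap_unlock_read(mapping);
		return VM_FAULT_OOM;
	}

	/* hypothetical stand-in for the remainder of the fault path */
	ret = handle_fault_locked(mm, vma, address, ptep, flags);

	i_mmap_unlock_read(mapping);
	return ret;
}
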
>> The reason for taking the i_mmap_rwsem is to prevent PMD unsharing from
>> invalidating the pmd we are operating on. This specific test case operates
>> on anonymous private mappings, so PMD sharing is not possible and we can
>> eliminate acquiring the mutex in this case. In fact, we should check all
>> mappings (even shareable ones) for the possibility of PMD sharing and only
>> take the mutex if necessary. It will make the code a bit uglier, but it
>> will take care of some of these regressions. We still need to take the
>> mutex in the case of PMD sharing; I'm afraid a regression is unavoidable
>> there.
>>
>> I'll put together a patch.
>
> Not acquiring the mutex on faults when sharing is not possible is quite
> straightforward. We can even use the existing routine vma_shareable()
> to easily check. However, the next patch in the series, 87bf91d39bb5
> ("hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race"),
> depends on always acquiring the mutex. If we break this assumption, then
> the code to back out hugetlb reservations needs to be written. A high-level
> view of what needs to be done is in the commit message for 87bf91d39bb5.
>
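Skipping the semaphore when sharing is impossible could look roughly
like the sketch below. fault_needs_i_mmap_lock() is a hypothetical
helper name; the check mirrors what the existing vma_shareable() in
mm/hugetlb.c already does.

/*
 * Sketch only: PMD sharing requires a VM_MAYSHARE mapping covering a
 * full, PUD-aligned range.  Private mappings (as in the
 * anon-cow-seq-hugetlb test) can never share PMDs, so their faults
 * would not need i_mmap_rwsem for this purpose.
 */
static bool fault_needs_i_mmap_lock(struct vm_area_struct *vma,
				    unsigned long addr)
{
	unsigned long base = addr & PUD_MASK;
	unsigned long end = base + PUD_SIZE;

	if (!(vma->vm_flags & VM_MAYSHARE))
		return false;

	return vma->vm_start <= base && end <= vma->vm_end;
}

With such a check, the anon-cow-seq-hugetlb case (all private mappings)
would avoid the semaphore on every fault; the cost remains only when
sharing is actually possible.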
> I'm working on the code to back out reservations.


I find that commit 34ae204f18519f0920bd50a644abd6fefc8dbfcf ("hugetlbfs:
remove call to huge_pte_alloc without i_mmap_rwsem") addresses this
regression. I tested with the patch, and the regression is reduced to
10.1%. Do you have plans to improve it further? Thanks.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/size/test/cpufreq_governor/ucode:

lkp-knm01/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/8T/anon-cow-seq-hugetlb/performance/0x11

commit:
49aef7175cc6eb703a9280a7b830e675fe8f2704
c0d0381ade79885c04a04c303284b040616b116e
v5.8
34ae204f18519f0920bd50a644abd6fefc8dbfcf
v5.9-rc1

49aef7175cc6eb70 c0d0381ade79885c04a04c30328                        v5.8 34ae204f18519f0920bd50a644a                    v5.9-rc1
---------------- --------------------------- --------------------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev    %change          %stddev    %change          %stddev
           38084     -31.1%       26231 ± 2%     -26.6%       27944 ± 5%      -7.0%            35405      -7.5%            35244  vm-scalability.median
            9.92 ± 9% +12.0       21.95 ± 4%      +3.9        13.87 ±30%      -5.3         4.66 ± 9%      -6.6         3.36 ± 7%  vm-scalability.median_stddev%
        12827311     -35.0%     8340256 ± 2%     -30.9%     8865669 ± 5%     -10.1%         11532087     -10.2%    11513595 ± 2%  vm-scalability.throughput
       2.507e+09     -22.7%        1.938e+09     -15.3%   2.122e+09 ± 6%      +8.0%        2.707e+09      +8.0%   2.707e+09 ± 2%  vm-scalability.workload

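For reference, the %change columns above are the relative difference
against the base column (49aef7175cc6eb70). A quick standalone check of
the vm-scalability.throughput change for the 34ae204f kernel (just the
arithmetic, not LKP tooling):

#include <stdio.h>

int main(void)
{
	double base  = 12827311.0;	/* throughput, 49aef7175cc6eb70 */
	double fixed = 11532087.0;	/* throughput, 34ae204f18519f09 */

	/* prints -10.1%, matching the %change column */
	printf("%+.1f%%\n", (fixed - base) / base * 100.0);
	return 0;
}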


--
Zhengjun Xing