Re: [PATCH] mm/vmalloc: lock contention optimization under multi-threading

From: rulinhuang
Date: Fri Feb 09 2024 - 06:49:52 EST


Hi Rezki, thanks so much for your review. We agree that your
suggestions would improve the maintainability and readability of both
this code change and the vmalloc implementation as a whole. To avoid
the partial initialization issue, we are refining this patch by
separating insert_map_area(), which inserts the VA into the tree, from
alloc_map_area(), so that setup_vmalloc_vm() can be invoked between
them. However, our initial attempt ran into a boot-time error, which we
are still debugging. This may take a little longer than expected, as
the coming week is the Lunar New Year public holiday in China. We will
share the latest version of the patch once it is ready for your
review.

For the performance test, we first built stress-ng following the
instructions at https://github.com/ColinIanKing/stress-ng, and then
ran the pthread stressor (--pthread) for 30 seconds (-t 30) with the
following command:

    ./stress-ng -t 30 --metrics-brief --pthread -1 --no-rand-seed

The aggregate number of threads spawned per second (Bogo ops/s) is
taken as the score of this workload. We evaluated the performance
impact of this patch on an Ice Lake server with 40, 80, 120 and 160
online cores. As shown in the table below, as the number of online
cores grows, this patch relieves the increasingly severe lock
contention and improves performance by around 5.5% at 160 cores.

vcpu number         40      80      120     160
patched/original    100.5%  100.8%  105.2%  105.5%
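
For reference, the arithmetic behind the patched/original row can be
sketched as below. This is only an illustration: the function names are
hypothetical and not part of stress-ng or the patch, and the raw
Bogo ops/s values in the first call are made-up placeholders, since
only the ratios are reported here.

```python
def relative_score(patched_ops: float, original_ops: float) -> float:
    """Return the patched score as a percentage of the original score."""
    if original_ops <= 0:
        raise ValueError("original_ops must be positive")
    return 100.0 * patched_ops / original_ops


def gain_percent(ratio_percent: float) -> float:
    """Convert a patched/original percentage into a relative gain."""
    return ratio_percent - 100.0


# Placeholder Bogo ops/s values (hypothetical, not measured data).
print(relative_score(211.0, 200.0))

# Ratios as reported in the table above, keyed by vcpu number.
reported = {40: 100.5, 80: 100.8, 120: 105.2, 160: 105.5}
for vcpus, ratio in reported.items():
    print(f"{vcpus} vcpus: {gain_percent(ratio):+.1f}%")
```

With the reported ratios, the gain at 160 vcpus is 105.5% - 100% = 5.5%,
which is the figure quoted in the text.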

Thanks again for your help and please let us know if more details
are needed.