Re: [PATCH v2 0/4] Mitigate inconsistent NUMA imbalance behaviour

From: Mel Gorman
Date: Wed May 25 2022 - 08:49:47 EST


On Tue, May 24, 2022 at 06:01:07PM +0200, Vincent Guittot wrote:
> > This is the min, max and range of run time for mg.D parallelised with ~25%
> > of the CPUs parallelised by MPICH running on a 2-socket machine (80 CPUs,
> > 16 active for mg.D due to limitations of mg.D).
> >
> > v5.3                                     Min  95.84 Max  96.55 Range   0.71 Mean  96.16
> > v5.7                                     Min  95.44 Max  96.51 Range   1.07 Mean  96.14
> > v5.8                                     Min  96.02 Max 197.08 Range 101.06 Mean 154.70
> > v5.12                                    Min 104.45 Max 111.03 Range   6.58 Mean 105.94
> > v5.13                                    Min 104.38 Max 170.37 Range  65.99 Mean 117.35
> > v5.13-revert-c6f886546cb8                Min 104.40 Max 110.70 Range   6.30 Mean 105.68
> > v5.18rc4-baseline                        Min 110.78 Max 169.84 Range  59.06 Mean 131.22
> > v5.18rc4-revert-c6f886546cb8             Min 113.98 Max 117.29 Range   3.31 Mean 114.71
> > v5.18rc4-this_series                     Min  95.56 Max 163.97 Range  68.41 Mean 105.39
> > v5.18rc4-this_series-revert-c6f886546cb8 Min  95.56 Max 104.86 Range   9.30 Mean  97.00
>
> I'm interested to understand why such instability can be introduced by
> c6f886546cb8 as it aims to do the opposite by not waking up a random
> idle cpu but using the current cpu which is becoming idle, instead. I
> haven't been able to reproduce your problem with my current setup but
> I assume this is specific to some use cases so I will try to reproduce
> the mg.D test above. If you have more details on the setup to ease the
> reproduction of the problem I'm interested.
>

Thanks Vincent,

The most straightforward way to reproduce is via mmtests.

# git clone https://github.com/gormanm/mmtests/
# cd mmtests
# ./bin/generate-generic-configs
# ./run-mmtests.sh --run-monitor --config configs/config-hpc-nas-mpich-quarter-mgD-many test-mgD-many
# cd work/log
# ../../compare-kernels.sh

nas-mpich-mg NAS Time
                           test
                       mgD-many
Min       mg.D       95.80 (   0.00%)
Amean     mg.D      110.77 (   0.00%)
Stddev    mg.D       21.55 (   0.00%)
CoeffVar  mg.D       19.46 (   0.00%)
Max       mg.D      155.35 (   0.00%)
BAmean-50 mg.D       96.05 (   0.00%)
BAmean-95 mg.D      107.83 (   0.00%)
BAmean-99 mg.D      109.23 (   0.00%)

Note the min of 95.80 seconds, the max of 155.35 seconds and the high
stddev (21.55 seconds, a CoeffVar of 19.46%), which indicate that the
results are not stable.
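
As a rough cross-check of the CoeffVar figure, the sketch below assumes
CoeffVar is simply the standard deviation expressed as a percentage of
the arithmetic mean. coeff_var is a hypothetical helper, not part of
mmtests. Fed the rounded Amean and Stddev from the report, it can differ
from the reported value in the last digit because the report works from
unrounded samples.

```shell
# coeff_var MEAN STDDEV
# Hypothetical helper (not part of mmtests): print the standard deviation
# as a percentage of the mean, to two decimal places.
coeff_var() {
    awk -v mean="$1" -v sd="$2" 'BEGIN { printf "%.2f\n", sd / mean * 100 }'
}

# Rounded Amean and Stddev taken from the report above.
coeff_var 110.77 21.55
```

Anything in the high teens here is far too noisy for a workload whose
best runs complete in about 96 seconds.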

The generated config is for openSUSE, so it may not work for you. After
installing the mpich package, you'll need to adjust these lines:

export NAS_MPICH_PATH=/usr/$MMTESTS_LIBDIR/mpi/gcc/$NAS_MPICH_VERSION/bin
export NAS_MPICH_LIBPATH=/usr/$MMTESTS_LIBDIR/mpi/gcc/$NAS_MPICH_VERSION/$MMTESTS_LIBDIR

NAS_MPICH_PATH and NAS_MPICH_LIBPATH need to point to the bin and lib
paths of the mpich package your distribution ships.
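
Before editing the config, it may help to sanity-check the two
directories you intend to use. The sketch below is a hypothetical helper,
not part of mmtests. It assumes the MPICH bin directory contains an
executable mpicc, which is the case for a normal MPICH install, and only
checks that the library directory exists.

```shell
# check_mpich_paths BIN_DIR LIB_DIR
# Hypothetical helper (not part of mmtests): succeed only if BIN_DIR holds
# an executable mpicc and LIB_DIR is an existing directory.
check_mpich_paths() {
    bin_dir=$1
    lib_dir=$2
    [ -x "$bin_dir/mpicc" ] || { echo "no executable mpicc in $bin_dir" >&2; return 1; }
    [ -d "$lib_dir" ] || { echo "no such directory: $lib_dir" >&2; return 1; }
    echo "ok"
}

# Usage, once the two variables are set for your distribution:
#   check_mpich_paths "$NAS_MPICH_PATH" "$NAS_MPICH_LIBPATH"
```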

--
Mel Gorman
SUSE Labs