Re: [PATCH 1/1] mm: vmscan: Reduce throttling due to a failure to make progress

From: Alexey Avramov
Date: Sat Nov 27 2021 - 14:29:05 EST


I will present the results of the new tests here.

TLDR;
=====
No one Mel's patch doesn't prevent stalls in my tests.

Test case:
Running firefox with youtube tab (basic desktop workload) and starting
$ for i in {1..10}; do tail /dev/zero; done
-- 1. with noswap and 2. with SwapTotal > MemTotal and swappiness=0.

I will log PSI metrics using latest psi2log (it's part of nohang [1]
package, src: [2]) and log some meminfo metrics using mem2log [3].

psi2log cmd:
$ psi2log -m 2 -i 1 -l /tmpfs/psi

mem2log cmd:
$ mem2log -i 1 -l /tmpfs/mem

The OS is Debian 9 x86_64 on HDD with ext4+LUKS, MemTotal: 11.5 GiB.

Repo with logs:
https://github.com/hakavlad/cache-tests/tree/main/516-reclaim-throttle

5.15
====

- With noswap, psi2log summary:

2021-11-27 20:51:14,018: Stall times for the last 141.6s:
2021-11-27 20:51:14,018: -----------
2021-11-27 20:51:14,018: some cpu 1.3s, avg 0.9%
2021-11-27 20:51:14,018: -----------
2021-11-27 20:51:14,018: some io 99.1s, avg 70.0%
2021-11-27 20:51:14,018: full io 93.4s, avg 66.0%
2021-11-27 20:51:14,018: -----------
2021-11-27 20:51:14,019: some memory 4.3s, avg 3.1%
2021-11-27 20:51:14,019: full memory 4.2s, avg 3.0%

-- average full memory stall is 0.4s per one `tail /dev/zero`.

full psi2log log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/515/noswap/psi

full mem2log log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/515/noswap/mem

- With swappiness=0 and SwapTotal > 0:

2021-11-27 20:57:30,217: Stall times for the last 141.6s:
2021-11-27 20:57:30,217: -----------
2021-11-27 20:57:30,217: some cpu 1.2s, avg 0.8%
2021-11-27 20:57:30,217: -----------
2021-11-27 20:57:30,217: some io 100.7s, avg 71.1%
2021-11-27 20:57:30,218: full io 94.8s, avg 67.0%
2021-11-27 20:57:30,218: -----------
2021-11-27 20:57:30,218: some memory 3.9s, avg 2.8%
2021-11-27 20:57:30,218: full memory 3.9s, avg 2.7%

-- 0.4s memory stall per one `tail /dev/zero`.

full psi2log log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/515/swappiness0/psi

full mem2log log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/515/swappiness0/mem


5.16-rc2 vanilla
================

- with noswap

psi2log summary:

2021-11-27 21:45:09,307: Stall times for the last 1086.5s:
2021-11-27 21:45:09,307: -----------
2021-11-27 21:45:09,307: some cpu 4.2s, avg 0.4%
2021-11-27 21:45:09,307: -----------
2021-11-27 21:45:09,307: some io 337.6s, avg 31.1%
2021-11-27 21:45:09,307: full io 325.2s, avg 29.9%
2021-11-27 21:45:09,307: -----------
2021-11-27 21:45:09,307: some memory 888.7s, avg 81.8%
2021-11-27 21:45:09,307: full memory 886.2s, avg 81.6%

average full memory stall is 87s per one `tail /dev/zero`.

full psi2log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/vanilla/noswap/psi

full mem2log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/vanilla/noswap/mem

- And even worst with swappiness=0:

2021-11-27 22:33:19,207: Stall times for the last 2677.7s:
2021-11-27 22:33:19,208: -----------
2021-11-27 22:33:19,208: some cpu 14.0s, avg 0.5%
2021-11-27 22:33:19,208: -----------
2021-11-27 22:33:19,208: some io 306.7s, avg 11.5%
2021-11-27 22:33:19,208: full io 292.4s, avg 10.9%
2021-11-27 22:33:19,208: -----------
2021-11-27 22:33:19,208: some memory 2504.8s, avg 93.5%
2021-11-27 22:33:19,209: full memory 2498.2s, avg 93.3%

What a horror!

In dmesg: a lot of
[] Purging GPU memory, 0 pages freed, 0 pages still pinned, 2112 pages left available.
(and dying journald)

psi2log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/vanilla/swappiness0/psi

mem2log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/vanilla/swappiness0/mem

dmesg:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/vanilla/dmesg

5.16-rc2 and 1st Mel's patch [4]
================================

- with noswap

Summary:

2021-11-27 23:28:23,707: Stall times for the last 1035.1s:
2021-11-27 23:28:23,707: -----------
2021-11-27 23:28:23,707: some cpu 5.6s, avg 0.5%
2021-11-27 23:28:23,708: -----------
2021-11-27 23:28:23,708: some io 336.5s, avg 32.5%
2021-11-27 23:28:23,708: full io 322.9s, avg 31.2%
2021-11-27 23:28:23,708: -----------
2021-11-27 23:28:23,708: some memory 851.1s, avg 82.2%
2021-11-27 23:28:23,708: full memory 848.3s, avg 82.0%

full psi2log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/patch1/noswap/psi

mem2log log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/patch1/noswap/mem

- with swappiness=0

Summary:

2021-11-27 23:50:25,722: Stall times for the last 1076.9s:
2021-11-27 23:50:25,722: -----------
2021-11-27 23:50:25,722: some cpu 12.2s, avg 1.1%
2021-11-27 23:50:25,722: -----------
2021-11-27 23:50:25,723: some io 232.1s, avg 21.6%
2021-11-27 23:50:25,723: full io 218.5s, avg 20.3%
2021-11-27 23:50:25,723: -----------
2021-11-27 23:50:25,723: some memory 957.5s, avg 88.9%
2021-11-27 23:50:25,723: full memory 951.2s, avg 88.3%

It's better than vanilla.

5.16-rc2 and 2nd Mel's patch [5]
================================

- with noswap

Summary:

2021-11-28 00:29:38,558: Stall times for the last 598.7s:
2021-11-28 00:29:38,558: -----------
2021-11-28 00:29:38,558: some cpu 5.2s, avg 0.9%
2021-11-28 00:29:38,558: -----------
2021-11-28 00:29:38,559: some io 235.0s, avg 39.3%
2021-11-28 00:29:38,559: full io 220.2s, avg 36.8%
2021-11-28 00:29:38,559: -----------
2021-11-28 00:29:38,559: some memory 439.0s, avg 73.3%
2021-11-28 00:29:38,559: full memory 437.5s, avg 73.1%

It's significant improvement, but far from the expected result.

mem:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/patch2/noswap/mem

psi:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/patch2/noswap/psi

In some cases the stall was quite short.

- with swappiness=0

Summary:

2021-11-28 00:53:41,659: Stall times for the last 1239.0s:
2021-11-28 00:53:41,659: -----------
2021-11-28 00:53:41,660: some cpu 9.4s, avg 0.8%
2021-11-28 00:53:41,660: -----------
2021-11-28 00:53:41,660: some io 206.5s, avg 16.7%
2021-11-28 00:53:41,660: full io 195.0s, avg 15.7%
2021-11-28 00:53:41,660: -----------
2021-11-28 00:53:41,660: some memory 1113.0s, avg 89.8%
2021-11-28 00:53:41,661: full memory 1107.1s, avg 89.3%

mem:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/patch2/swappiness0/mem

psi:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/patch2/swappiness0/psi


5.16-rc2 and 3rd Mel's patch [6] on top of 2nd
==============================================

- with noswap

Summary:

2021-11-28 01:35:26,077: Stall times for the last 949.4s:
2021-11-28 01:35:26,077: -----------
2021-11-28 01:35:26,078: some cpu 5.4s, avg 0.6%
2021-11-28 01:35:26,078: -----------
2021-11-28 01:35:26,078: some io 269.0s, avg 28.3%
2021-11-28 01:35:26,078: full io 259.4s, avg 27.3%
2021-11-28 01:35:26,078: -----------
2021-11-28 01:35:26,079: some memory 759.9s, avg 80.0%
2021-11-28 01:35:26,079: full memory 757.5s, avg 79.8%

mem:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/patch3/noswap/mem

psi:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/patch3/noswap/psi

- with swappiness=0

Summary:

2021-11-28 01:59:40,253: Stall times for the last 1127.8s:
2021-11-28 01:59:40,253: -----------
2021-11-28 01:59:40,253: some cpu 6.8s, avg 0.6%
2021-11-28 01:59:40,253: -----------
2021-11-28 01:59:40,253: some io 228.3s, avg 20.2%
2021-11-28 01:59:40,253: full io 218.7s, avg 19.4%
2021-11-28 01:59:40,253: -----------
2021-11-28 01:59:40,253: some memory 987.4s, avg 87.6%
2021-11-28 01:59:40,253: full memory 982.3s, avg 87.1%

mem log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/patch3/swappiness0/mem

psi log:
https://raw.githubusercontent.com/hakavlad/cache-tests/main/516-reclaim-throttle/516-rc2/patch3/swappiness0/psi

¯\_(ツ)_/¯

Should I test anything else?

[1] https://github.com/hakavlad/nohang
[2] https://raw.githubusercontent.com/hakavlad/nohang/master/src/psi2log
[3] https://github.com/hakavlad/mem2log
[4] https://lore.kernel.org/lkml/20211124115007.GG3366@xxxxxxxxxxxxxxxxxxx/
[5] https://lore.kernel.org/lkml/20211124143303.GH3366@xxxxxxxxxxxxxxxxxxx/
[6] https://lore.kernel.org/lkml/20211126162416.GK3366@xxxxxxxxxxxxxxxxxxx/