mm: kswapd struggles reclaiming the pages on 64GB server

From: Andriy Tkachuk
Date: Fri Aug 12 2016 - 16:52:26 EST


Hi,

our user-space application uses a large amount of anon pages (a private
mapping of a large file, larger than the 64GB of RAM available in the
system) which are rarely accessed and are supposed to be swapped out.
Instead, we see that most of these pages are kept in memory while the
system suffers from a lack of free memory and poor overall performance,
especially disk I/O (vm.swappiness=100 does not help). kswapd scans
millions of pages per second but reclaims only hundreds per second.
Here are snapshots of some counters, taken at 5-second intervals:

$ egrep 'Cached|nr_.*active_anon|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact' \
    proc-*-0616-1605[345]* | sed 's/:/ /' | sort -sk 2,2
proc-meminfo-0616-160539.txt Cached: 347936 kB
proc-meminfo-0616-160549.txt Cached: 316316 kB
proc-meminfo-0616-160559.txt Cached: 322264 kB
proc-meminfo-0616-160539.txt SwapCached: 2853064 kB
proc-meminfo-0616-160549.txt SwapCached: 2853168 kB
proc-meminfo-0616-160559.txt SwapCached: 2853280 kB
proc-vmstat-0616-160535.txt nr_active_anon 14508616
proc-vmstat-0616-160545.txt nr_active_anon 14513725
proc-vmstat-0616-160555.txt nr_active_anon 14515197
proc-vmstat-0616-160535.txt nr_inactive_anon 747407
proc-vmstat-0616-160545.txt nr_inactive_anon 744846
proc-vmstat-0616-160555.txt nr_inactive_anon 744509
proc-vmstat-0616-160535.txt nr_vmscan_write 5589095
proc-vmstat-0616-160545.txt nr_vmscan_write 5589097
proc-vmstat-0616-160555.txt nr_vmscan_write 5589097
proc-vmstat-0616-160535.txt pgactivate 246016824
proc-vmstat-0616-160545.txt pgactivate 246033242
proc-vmstat-0616-160555.txt pgactivate 246042064
proc-vmstat-0616-160535.txt pgrefill_normal 22763262
proc-vmstat-0616-160545.txt pgrefill_normal 22768020
proc-vmstat-0616-160555.txt pgrefill_normal 22768178
proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
proc-vmstat-0616-160535.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160545.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160555.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637

The pgrefill_normal and pgactivate counters show that only a few
hundred pages per second move between the active and inactive lists,
which is comparable with what was reclaimed. So it looks like kswapd
mostly scans the pages on the inactive list in a kind of loop and does
not even get a chance to look at the pages on the active list (where
most of the application's anon pages are located).
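
To make the comparison easier to follow, here is a small sketch that
turns the cumulative counters above into per-interval deltas and a
scan-to-reclaim ratio. The values are copied from the snapshots; the
helper names (deltas, reclaim_efficiency) are only illustrative, and
the results are per snapshot interval, not per second.

```python
# Cumulative counter values copied from the three vmstat snapshots above.
SNAPSHOTS = {
    "pgscan_kswapd_normal": [111985367420, 111996845554, 112028276639],
    "pgsteal_kswapd_normal": [53817848, 53818626, 53818637],
    "pgrefill_normal": [22763262, 22768020, 22768178],
    "pgactivate": [246016824, 246033242, 246042064],
}


def deltas(values):
    """Return the differences between consecutive snapshot values."""
    return [b - a for a, b in zip(values, values[1:])]


def reclaim_efficiency(scanned, stolen):
    """Fraction of scanned pages that were reclaimed (0.0 if none scanned)."""
    return stolen / scanned if scanned else 0.0


if __name__ == "__main__":
    per_interval = {name: deltas(vals) for name, vals in SNAPSHOTS.items()}
    for name, vals in per_interval.items():
        print(f"{name:24s} {vals}")
    pairs = zip(per_interval["pgscan_kswapd_normal"],
                per_interval["pgsteal_kswapd_normal"])
    for scanned, stolen in pairs:
        print(f"reclaimed/scanned: {reclaim_efficiency(scanned, stolen):.6%}")
```

In the first interval this gives 11478134 pages scanned by kswapd
against 778 reclaimed, and in the second 31431085 scanned against 11.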

The kernel version: linux-3.10.0-229.14.1.el7.

Any ideas? Would it be useful to change inactive_ratio dynamically in
such cases so that more pages could be moved from the active to the
inactive list and get a chance to be reclaimed? (Note: when the
application is restarted, the problem disappears for a while (days)
until a corresponding number of privately mapped pages are dirtied
again.)
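
For context on the inactive_ratio question, below is a rough sketch of
how I understand the mainline 3.10 check to work: inactive_ratio is
int_sqrt(10 * zone size in GB), and the active anon list is only
considered for deactivation when inactive * inactive_ratio < active.
This is an assumption about this kernel (the el7 build may differ).
The counters above are system-wide rather than per-zone, so the single
64GB zone used here is hypothetical.

```python
import math

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def inactive_ratio(zone_pages):
    """Mimic int_sqrt(10 * gb), falling back to 1 for zones under 1 GB."""
    gb = zone_pages >> (30 - PAGE_SHIFT)
    return math.isqrt(10 * gb) if gb else 1


def inactive_anon_is_low(active, inactive, ratio):
    """True when the inactive anon list counts as too small."""
    return inactive * ratio < active


if __name__ == "__main__":
    # Hypothetical: treat the whole 64GB as one zone.
    zone_pages = 64 << (30 - PAGE_SHIFT)
    ratio = inactive_ratio(zone_pages)
    # nr_active_anon and nr_inactive_anon from the first snapshot.
    low = inactive_anon_is_low(14508616, 747407, ratio)
    print(f"inactive_ratio={ratio} inactive_anon_is_low={low}")
```

Under these assumptions the ratio is 25 and 747407 * 25 = 18685175 is
larger than 14508616, so the inactive list would not count as low and
the active list would not be refilled from, which seems to match what
we observe.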

Thank you,
Andriy