RE: [PATCH] mm/vmscan: add vm_swappiness configuration knobs

From: Ivan Teterevkov
Date: Thu Mar 12 2020 - 08:55:00 EST


On Thurs, 12 Mar 2020, Michal Hocko wrote:

> On Wed 11-03-20 17:45:58, Ivan Teterevkov wrote:
> > This patch adds a couple of knobs:
> >
> > - The configuration option (CONFIG_VM_SWAPPINESS).
> > - The command line parameter (vm_swappiness).
> >
> > The default value is preserved, but now defined by CONFIG_VM_SWAPPINESS.
> >
> > Historically, the default swappiness is set to the well-known value
> > 60, and this works well for the majority of cases. The vm_swappiness
> > is also exposed as a kernel parameter that can be changed at runtime too,
> > e.g. with sysctl.
> >
> > This approach might not suit some configurations well, e.g.
> > systemd-based distros, where systemd is put in charge of the cgroup
> > controllers, including the memory one. In such cases, the default
> > swappiness 60 is copied across the cgroup subtrees early at startup,
> > when systemd is arranging the slices for its services, before the
> > sysctl.conf or tmpfiles.d/*.conf changes are applied.
> >
> > One could run a script to traverse the cgroup trees later and set the
> > desired memory.swappiness individually in each occurrence when the
> > runtime is set up, but this would require some amount of work to
> > implement properly. Instead, why not set the default swappiness as early
> > as possible?
>
> I have to say I am not a great fan of more tuning for swappiness, as this has been
> quite a poor tunable for many years already. It essentially does nothing in many
> cases because the reclaim process ignores the value in many cases (have a look at
> get_scan_count). I have seen quite a few reports that setting a specific value for
> vm_swappiness didn't make any difference. The knob itself has terrible semantics to
> begin with because there is no way to express "I really prefer to swap rather than
> reclaim page cache".
>
> This all makes me think that swappiness is a historical mistake that we should
> rather make obsolete than promote even further.

Absolutely agree, the semantics of vm_swappiness are perplexing.
Moreover, the same get_scan_count treats vm_swappiness and the cgroup
memory.swappiness differently; in particular, 0 disables swap for the memcg.
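
To make the asymmetry concrete, here is a rough Python model of the two
points above. It is only a sketch of my reading of get_scan_count in kernels
from around the time of this thread, not the kernel code itself; the function
names are made up for illustration, and the anon/file weighting
(swappiness vs. 200 - swappiness) is an assumption about those kernels.

```python
def reclaim_priorities(swappiness):
    """Hypothetical helper: relative anon/file reclaim weights.

    Assumes the 0..100 range and the 'file = 200 - anon' weighting.
    """
    if not 0 <= swappiness <= 100:
        raise ValueError("swappiness must be in 0..100")
    anon_prio = swappiness
    file_prio = 200 - swappiness
    return anon_prio, file_prio


def swappiness_zero_effect(swappiness, cgroup_reclaim):
    """Hypothetical helper: what a value of 0 means in each context."""
    if swappiness == 0 and cgroup_reclaim:
        # memcg reclaim: 0 is a hard "file pages only" switch.
        return "SCAN_FILE"
    # Global reclaim (or a non-zero value): no hard switch; later
    # heuristics still decide and may scan anonymous pages.
    return "other heuristics decide"


# Even the maximum value only reaches parity with page cache, so there
# is no way to say "prefer swapping".
print(reclaim_priorities(100))
print(swappiness_zero_effect(0, cgroup_reclaim=True))
print(swappiness_zero_effect(0, cgroup_reclaim=False))
```

Under these assumptions the default of 60 weights anon against file as
60:140, and 100 gives 100:100 at best.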

Certainly, the patch adds some additional exposure to a parameter that
is not trivial to tackle, but it is already initialised with a magic
number, which is also confusing. Is there any harm done by the patch,
considering that the sysctl interface to that knob already exists?
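
For comparison, the traversal workaround mentioned in the patch description
could look roughly like the sketch below. It assumes a cgroup v1 memory
hierarchy (where memory.swappiness exists); the function name and the mount
point in the usage note are assumptions, and it does not handle cgroups
created after it has run, which is part of why doing this properly takes
more work.

```python
import os
from typing import List


def set_swappiness(root: str, value: int) -> List[str]:
    """Write `value` to every memory.swappiness file found under `root`.

    Hypothetical helper: returns the list of files that were updated.
    """
    if not 0 <= value <= 100:
        raise ValueError("swappiness must be in 0..100")
    updated = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "memory.swappiness" not in filenames:
            continue
        path = os.path.join(dirpath, "memory.swappiness")
        with open(path, "w") as knob:
            knob.write(f"{value}\n")
        updated.append(path)
    return updated
```

It would be invoked as root after the slices are set up, e.g.
set_swappiness("/sys/fs/cgroup/memory", 10), assuming that is where the
memory controller is mounted.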

> --
> Michal Hocko
> SUSE Labs