Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure

From: Daniel Drake
Date: Tue Aug 20 2019 - 02:46:28 EST


Hi,

Artem S. Tashkinov wrote:
> Once you hit a situation when opening a new tab requires more RAM than
> is currently available, the system will stall hard. You will barely be
> able to move the mouse pointer. Your disk LED will be flashing
> incessantly (I'm not entirely sure why). You will not be able to run new
> applications or close currently running ones.
>
> This little crisis may continue for minutes or even longer. I think
> that's not how the system should behave in this situation. I believe
> something must be done about that to avoid this stall.

Thanks for reviving this discussion. Indeed, this is a real pain point in
the Linux experience.

For Endless, we sank some time into this and found psi to be the best
solution available. Its time-based accounting seems very appropriate
when what we're ultimately interested in is maintaining desktop UI interactivity.
With psi enabled in the kernel, we add a small userspace daemon that kills a
process when psi reports that *all* userspace tasks have been blocked on kernel
memory management work for at least 1 second in a 10 second period.

https://github.com/endlessm/eos-boot-helper/blob/master/psi-monitor/psi-monitor.c
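
As a rough illustration of the threshold described above (this is not a copy
of psi-monitor.c), here is a minimal Python sketch. It assumes the documented
format of /proc/pressure/memory, and it approximates "1 second in a 10 second
period" as the "full" avg10 value reaching 10%. The names parse_psi,
should_kill and FULL_AVG10_THRESHOLD are made up for this example, and the
kill action itself is deliberately left out.

```python
# Hypothetical threshold: "full" stall time of 1s out of 10s is 10%.
# avg10 is a running average, so this is only an approximation.
FULL_AVG10_THRESHOLD = 10.0


def parse_psi(text):
    """Parse PSI file contents into {'some': {...}, 'full': {...}}."""
    result = {}
    for line in text.strip().splitlines():
        # Each line looks like:
        #   full avg10=0.00 avg60=0.00 avg300=0.00 total=0
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, value = field.partition("=")
            # total is a stall time in microseconds; the avg* fields
            # are percentages.
            values[key] = int(value) if key == "total" else float(value)
        result[kind] = values
    return result


def should_kill(psi):
    """Return True when all non-idle tasks were stalled long enough."""
    return psi["full"]["avg10"] >= FULL_AVG10_THRESHOLD


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read and parse the memory pressure file (needs a psi-enabled kernel)."""
    with open(path) as f:
        return parse_psi(f.read())
```

A daemon built on this would call read_memory_pressure() periodically and
act once should_kill() returns True. The kernel's psi trigger interface
(poll() on the pressure file) is an alternative to periodic reads.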

To share our results so far: although this daemon is a quick initial
implementation, it is giving excellent results, with no more memory
pressure hangs. The system recovers in less than 30 seconds, usually in more
like 10-15 seconds. Sadly a process gets killed along the way, but that's a lot
better than the user having no option other than pulling the plug.
The system may not always recover to a totally smooth state, but the
responsiveness to mouse movements and clicks is still decent, so at that point
the user can close some more windows to restore full UI performance again.

There's just one issue we've seen so far: a single report from a desktop
system with 4GB RAM which is only running the normal desktop components plus
a single Gmail tab in the web browser. psi occasionally reports high memory
pressure there, so psi-monitor steps in and kills the browser tab, which seems
erroneous. We haven't had a chance to look at this in detail yet. Here's a log
from the kernel OOM killer showing the memory and process state at this point.
https://gist.github.com/dsd/b338bab0206dcce78263f6bb87de7d4a

> I'm almost sure some sysctl parameters could be changed to avoid this
> situation but something tells me this could be done for everyone and
> made default because some non tech-savvy users will just give up on
> Linux if they ever get in a situation like this and they won't be keen
> or even be able to Google for solutions.

As you anticipated, others and I have already jumped in with solutions
appropriate for tech-savvy users. Getting solutions widely deployed is indeed
another important aspect to tackle.

If you're curious to see how this can look from a "just works" standpoint, you
might be interested in downloading Endless (www.endlessos.com) and running your
tests there; we have the above solution running and active out of the box.

Bastien Nocera has recently adapted and extended our solution, presumably
with an eye towards getting this more widely deployed as a standard part
of the Linux desktop.
https://gitlab.freedesktop.org/hadess/low-memory-monitor/

And if there is a meaningful way to make the kernel behave better, that would
obviously be of huge value too.

Thanks
Daniel