Re: [PATCH 0/7] psi: pressure stall information for CPU, memory, and IO

From: Johannes Weiner
Date: Tue May 29 2018 - 14:14:17 EST


Hi Suren,

On Fri, May 25, 2018 at 05:29:30PM -0700, Suren Baghdasaryan wrote:
> Hi Johannes,
> I tried your previous memdelay patches before this new set was posted
> and results were promising for predicting when Android system is close
> to OOM. I'm definitely going to try this one after I backport it to
> 4.9.

I'm happy to hear that!

> Would it make sense to split CONFIG_PSI into CONFIG_PSI_CPU,
> CONFIG_PSI_MEM and CONFIG_PSI_IO since one might need only specific
> subset of this feature?

Yes, that should be doable. I'll split them out in the next version.

> > The total= value gives the absolute stall time in microseconds. This
> > allows detecting latency spikes that might be too short to sway the
> > running averages. It also allows custom time averaging in case the
> > 10s/1m/5m windows aren't adequate for the usecase (or are too coarse
> > with future hardware).
>
> Any reasons these specific windows were chosen (empirical
> data/historical reasons)? I'm worried that with the smallest window
> being 10s the signal might be too inert to detect fast memory pressure
> buildup before OOM kill happens. I'll have to experiment with that
> first, however if you have some insights into this already please
> share them.

They were chosen empirically. We started out with the loadavg window
sizes, but had to reduce them for exactly the reason you mention -
they're way too coarse to detect acute pressure buildup.
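
To illustrate the effect of window size, here is a rough sketch (not
the kernel code). It assumes a loadavg-style exponentially decaying
average, a hypothetical 2s sampling period, and pressure that jumps
from 0% to 100% at t=0. The names ema_step() and seconds_to_reach()
are made up for this example.

```python
import math


def ema_step(avg, sample, period, window):
    """One update of an exponentially decaying average.

    avg and sample are stall percentages; period is the time between
    samples and window is the averaging window, both in seconds.
    """
    decay = math.exp(-period / window)
    return avg * decay + sample * (1.0 - decay)


def seconds_to_reach(threshold, window, period=2.0, pressure=100.0):
    """Seconds until the average reaches threshold after pressure
    jumps from 0 to `pressure` (assumed step input)."""
    avg = 0.0
    elapsed = 0.0
    while avg < threshold:
        avg = ema_step(avg, pressure, period, window)
        elapsed += period
    return elapsed


if __name__ == "__main__":
    # Compare a 10s window against the 1m loadavg-style window.
    for window in (10, 60):
        print(window, seconds_to_reach(50.0, window))
```

Under these assumptions the 10s average crosses 50% after 8s, while a
1m average needs 42s, which is a long time when an OOM kill is
imminent.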

10s has been working well for us. We could make it smaller, but the
worry is that we wouldn't have enough samples then and the average
would become too erratic. Monitoring total= directly, on the other
hand, lets you detect acute spikes and handle that volatility
explicitly.
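
A minimal userspace sketch of that idea, assuming you sample total= at
a fixed interval of your own choosing. The helper names and the sample
line format below are illustrative assumptions, not part of the patch
set.

```python
def parse_total(line):
    """Extract the total= field (microseconds) from one pressure line."""
    for field in line.split():
        if field.startswith("total="):
            return int(field[len("total="):])
    raise ValueError("no total= field in: %r" % line)


def stall_percent(prev_total_us, cur_total_us, interval_s):
    """Share of the interval spent stalled, from two total= readings."""
    delta_us = cur_total_us - prev_total_us
    return 100.0 * delta_us / (interval_s * 1_000_000)


def find_spikes(totals_us, interval_s, threshold_pct):
    """Indices of samples whose preceding interval exceeded the threshold."""
    return [
        i
        for i in range(1, len(totals_us))
        if stall_percent(totals_us[i - 1], totals_us[i], interval_s) > threshold_pct
    ]


if __name__ == "__main__":
    # Assumed line format; the real file layout may differ.
    line = "some avg10=0.22 avg60=0.17 avg300=1.11 total=58761459"
    print(parse_total(line))

    # Hypothetical readings taken once per second.
    readings = [0, 10_000, 20_000, 620_000, 630_000]
    print(find_spikes(readings, interval_s=1.0, threshold_pct=25.0))
```

Since each interval is judged on its own delta, a short burst shows up
right away instead of being smoothed into the 10s average, and the
caller decides how much noise to tolerate via the interval and
threshold.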

Let me know how it works out in your tests.

Thanks for your feedback.