Re: [PATCH] mm,oom: Bring OOM notifier callbacks to outside of OOM killer.

From: Paul E. McKenney
Date: Sat Jun 30 2018 - 13:17:09 EST


On Fri, Jun 29, 2018 at 11:35:48PM +0900, Tetsuo Handa wrote:
> On 2018/06/29 21:52, Paul E. McKenney wrote:
> > The effect of RCU's current OOM code is to speed up callback invocation
> > by at most a few seconds (assuming no stalled CPUs, in which case
> > it is not possible to speed up callback invocation).
> >
> > Given that, I should just remove RCU's OOM code entirely?
>
> out_of_memory() will start selecting an OOM victim without calling
> get_page_from_freelist() since rcu_oom_notify() does not set non-zero
> value to "freed" field.
>
> I think that rcu_oom_notify() needs to wait for completion of callback
> invocations (possibly with timeout in case there are stalling CPUs) and
> set non-zero value to "freed" field if pending callbacks did release memory.

Waiting for the callbacks is easy. Timeouts would be a bit harder, but
still doable. I have no idea how to tell which callbacks freed memory
and how much -- all RCU does is invoke a function, and that function
can do whatever its developer wants.
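
To illustrate the point, here is a rough userspace model, not kernel code.
The names cb_head, queue_cb(), and drain_cbs() are hypothetical stand-ins
rather than RCU's real API, and the timeout is a simple wall-clock deadline.
The drain loop can count how many callbacks it invoked, but it cannot tell
whether any of them freed memory, because each callback is an opaque
function pointer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a queued callback; not the kernel's rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

static struct cb_head *pending;	/* callbacks waiting to be invoked */

static void queue_cb(struct cb_head *head, void (*func)(struct cb_head *))
{
	assert(func != NULL);
	head->func = func;
	head->next = pending;
	pending = head;
}

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Invoke pending callbacks until the list is empty or the deadline passes.
 * Returns the number of callbacks invoked.  Nothing here can report bytes
 * freed: the loop only sees a function pointer.
 */
static unsigned long drain_cbs(long long timeout_ms)
{
	long long deadline = now_ms() + timeout_ms;
	unsigned long invoked = 0;

	while (pending && now_ms() < deadline) {
		struct cb_head *head = pending;

		pending = head->next;
		head->func(head);
		invoked++;
	}
	return invoked;
}

/* One callback that frees memory ... */
struct obj {
	int payload;
	struct cb_head head;
};

static void free_obj_cb(struct cb_head *head)
{
	free((char *)head - offsetof(struct obj, head));
}

/* ... and one that frees nothing at all. */
static int stat_events;

static void count_cb(struct cb_head *head)
{
	(void)head;
	stat_events++;
}
```

Queuing one free_obj_cb() and one count_cb() and then calling drain_cbs()
yields a count of two in both the "freed something" and "freed nothing"
cases, which is all the information the caller gets back.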

> However, what will be difficult to tell is whether invocation of pending callbacks
> did release memory. Lack of last second get_page_from_freelist() call after
> blocking_notifier_call_chain(&oom_notify_list, 0, &freed) forces rcu_oom_notify()
> to set appropriate value (i.e. zero or non-zero) to "freed" field.
>
> We have tried to move really last second get_page_from_freelist() call to inside
> out_of_memory() after blocking_notifier_call_chain(&oom_notify_list, 0, &freed).
> But that proposal was not accepted...
>
> We could move blocking_notifier_call_chain(&oom_notify_list, 0, &freed) to
> before last second get_page_from_freelist() call (and this is what this patch
> is trying to do) which would allow rcu_oom_notify() to always return 0...
> or update rcu_oom_notify() to use shrinker API...

Would it be possible to tell RCU that memory was starting to get tight
with one call, and then tell it that things are OK with another call?
That would make much more sense from an RCU perspective.
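
A minimal sketch of what such a two-call interface might look like, again
as a userspace model.  Every name here (model_mem_pressure_begin(),
model_mem_pressure_end(), model_batch_limit()) and both batch-limit values
are assumptions made up for illustration.  Nothing like this exists in RCU
as of this thread, and how RCU would actually react to the hint is left
open.

```c
#include <assert.h>

/* Illustrative values only; not taken from any kernel configuration. */
enum {
	BATCH_NORMAL = 10,	/* callbacks invoked per pass normally */
	BATCH_TIGHT  = 10000,	/* callbacks invoked per pass under pressure */
};

/* Number of outstanding "memory is tight" notifications. */
static int pressure_depth;

/* First call: memory is starting to get tight. */
static void model_mem_pressure_begin(void)
{
	pressure_depth++;
}

/* Second call: things are OK again. */
static void model_mem_pressure_end(void)
{
	assert(pressure_depth > 0);	/* every end must pair with a begin */
	pressure_depth--;
}

/* The callback-invocation path consults the hint to size its batches. */
static int model_batch_limit(void)
{
	return pressure_depth > 0 ? BATCH_TIGHT : BATCH_NORMAL;
}
```

With this shape the caller never needs a "freed" count back.  It only
brackets the period of memory pressure, and the callback machinery decides
for itself how hard to work while the hint is in effect.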

Thanx, Paul