Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and `mem_cgroup_shrink_node`

From: Peter Zijlstra
Date: Thu Dec 01 2016 - 13:10:05 EST


On Thu, Dec 01, 2016 at 08:59:18AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 01, 2016 at 05:36:14PM +0100, Peter Zijlstra wrote:
> > Well, with the above change cond_resched() is already sufficient, no?
>
> Maybe. Right now, cond_resched_rcu_qs() gets a quiescent state to
> the RCU core in less than one jiffy, with my other change, this becomes
> a handful of jiffies depending on HZ and NR_CPUS. I expect this
> increase to a handful of jiffies to be a non-event.
>
> After my upcoming patch, cond_resched() will get a quiescent state to
> the RCU core in about ten seconds. While I am am not all that nervous
> about the increase from less than a jiffy to a handful of jiffies,
> increasing to ten seconds via cond_resched() does make me quite nervous.
> Past experience indicates that someone's kernel will likely be fatally
> inconvenienced by this magnitude of change.
>
> Or am I misunderstanding what you are proposing?

No, that is indeed what I was proposing. Hurm.. OK let me ponder that a
bit. There might be a few games we can play with !PREEMPT to avoid IPIs.

Thing is, I'm slightly uncomfortable with de-coupling rcu-sched from
actual schedule() calls.

> > In fact, by doing the IPI thing we get the entire cond_resched*()
> > family, and we could add the should_resched() guard to
> > cond_resched_rcu().
>
> So that cond_resched_rcu_qs() looks something like this, in order
> to avoid the function call in the case where the scheduler has nothing
> to do?

I was actually thinking of this:

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2d0c82e1d348..2dc7d8056b2a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3374,9 +3374,11 @@ static inline int signal_pending_state(long state, struct task_struct *p)
static inline void cond_resched_rcu(void)
{
#if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || !defined(CONFIG_PREEMPT_RCU)
- rcu_read_unlock();
- cond_resched();
- rcu_read_lock();
+ if (should_resched(1)) {
+ rcu_read_unlock();
+ cond_resched();
+ rcu_read_lock();
+ }
#endif
}