Re: [RFC PATCH v2 15/17] sched: Trivial forced-newidle balancer

From: Vineeth Remanan Pillai
Date: Wed Apr 24 2019 - 10:06:06 EST


> try_steal_cookie() is in the loop of for_each_cpu_wrap().
> The root domain could be large and we should avoid
> stealing cookie if source rq has only one task or dst is really busy.
>
> The following patch eliminated a deadlock issue on my side if I didn't
> miss anything in v1. I'll double check with v2, but it at least avoids
> unnecessary irq off/on and double rq lock. Especially, it avoids lock
> contention that the idle cpu which is holding rq lock in the process
> of load_balance() and tries to lock rq here. I think it might be worth
> picking up.
>

dst->nr_running is already checked in queue_core_balance() with the
lock held. try_steal_cookie() also checks whether dst is running idle,
again under the lock. Checking whether src is empty makes sense, but
shouldn't that check be done under the rq lock? A couple of safety and
performance checks are done before calling try_steal_cookie(), so I
hope the double lock will not cause a major performance issue.
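
To make the point about checking under the lock concrete, here is a
rough userspace sketch, not the patch itself. It assumes a toy runqueue
protected by a pthread mutex. struct toy_rq, toy_double_lock(),
toy_double_unlock() and toy_try_steal() are hypothetical names made up
for illustration, and the "src has more than one task" test is my
reading of the suggestion above rather than code from the series.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a runqueue; NOT the kernel's struct rq. */
struct toy_rq {
	pthread_mutex_t lock;
	int id;                  /* used only to fix a global lock order */
	unsigned int nr_running; /* tasks queued on this toy runqueue */
	bool running_idle;       /* true if the toy cpu is running idle */
};

/* Take both locks in a fixed order (lowest id first) to avoid ABBA deadlock. */
static void toy_double_lock(struct toy_rq *a, struct toy_rq *b)
{
	if (a->id > b->id) {
		struct toy_rq *tmp = a;
		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->lock);
	pthread_mutex_lock(&b->lock);
}

static void toy_double_unlock(struct toy_rq *a, struct toy_rq *b)
{
	pthread_mutex_unlock(&a->lock);
	pthread_mutex_unlock(&b->lock);
}

/*
 * Move one task from src to dst. Both conditions are tested only after
 * both locks are held, so neither queue can change between the check
 * and the move. Returns true if a task was moved.
 */
static bool toy_try_steal(struct toy_rq *dst, struct toy_rq *src)
{
	bool success = false;

	if (dst == src)
		return false;

	toy_double_lock(dst, src);

	if (!dst->running_idle)   /* dst found work in the meantime */
		goto unlock;
	if (src->nr_running <= 1) /* assumed rule: nothing spare to steal */
		goto unlock;

	src->nr_running--;
	dst->nr_running++;
	dst->running_idle = false;
	success = true;

unlock:
	toy_double_unlock(dst, src);
	return success;
}
```

An unlocked peek at src->nr_running before taking the locks could still
be used as a cheap early-out, but the result would have to be
re-checked under the lock as above, since it can change in between.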

If the hard lockup is reproducible with v2, could you please share more
details about the lockup?

Thanks