Re: [RFC] Make need_resched() return true when rcu_urgent_qs requested

From: Peter Zijlstra
Date: Mon Jul 09 2018 - 07:32:09 EST


On Mon, Jul 09, 2018 at 12:12:15PM +0100, David Woodhouse wrote:
> On Mon, 2018-07-09 at 13:06 +0200, Peter Zijlstra wrote:
> > On Mon, Jul 09, 2018 at 11:56:41AM +0100, David Woodhouse wrote:

> > > > But either proposal is exactly the same in this respect. The whole
> > > > rcu_urgent_qs thing won't be set any earlier either.

> > > Er... Marius, our latencies in expand_fdtable() definitely went from
> > > ~10s to well below one second when we just added the rcu_all_qs() into
> > > the loop, didn't they? And that does nothing if !rcu_urgent_qs.
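The following is a minimal userspace sketch of the gated behaviour described above. It assumes C11 atomics and is only loosely modeled on the kernel's rcu_all_qs(). The hot loop pays for one cheap flag check per iteration and reports a quiescent state only after the grace-period side has set the flag with a release store. The names cpu_state, request_urgent_qs(), maybe_report_qs() and run_loop() are hypothetical stand-ins, not kernel APIs. The remote request is simulated at a fixed iteration rather than coming from another CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; names are illustrative, not kernel APIs. */
struct cpu_state {
    atomic_bool urgent_qs;       /* plays the role of rcu_urgent_qs */
    unsigned long qs_reported;   /* quiescent states reported so far */
};

void cpu_state_init(struct cpu_state *cs)
{
    atomic_init(&cs->urgent_qs, false);
    cs->qs_reported = 0;
}

/* Grace-period side: ask this CPU for an urgent quiescent state.
 * The release store pairs with the acquire load in maybe_report_qs(). */
void request_urgent_qs(struct cpu_state *cs)
{
    atomic_store_explicit(&cs->urgent_qs, true, memory_order_release);
}

/* Hot-loop side: a no-op unless a quiescent state was requested,
 * so calling it on every iteration costs one load in the common case. */
bool maybe_report_qs(struct cpu_state *cs)
{
    if (!atomic_load_explicit(&cs->urgent_qs, memory_order_acquire))
        return false;
    atomic_store_explicit(&cs->urgent_qs, false, memory_order_relaxed);
    cs->qs_reported++;
    return true;
}

/* Stand-in for a long-running loop such as expand_fdtable() or vcpu_run().
 * The request arrives at iteration request_at (pass -1 for "never").
 * Returns the iteration at which the quiescent state was reported, or -1. */
long run_loop(struct cpu_state *cs, long iterations, long request_at)
{
    long reported_at = -1;

    assert(iterations >= 0);
    for (long i = 0; i < iterations; i++) {
        if (i == request_at)
            request_urgent_qs(cs);   /* simulated remote request */
        if (maybe_report_qs(cs) && reported_at < 0)
            reported_at = i;
    }
    return reported_at;
}
```

For example, run_loop(&cs, 100, 40) reports its single quiescent state at iteration 40. With request_at = -1 it never reports one, which corresponds to the "does nothing if !rcu_urgent_qs" case.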

> > Argh I never found that, because obfuscation:
> >
> > ruqp = per_cpu_ptr(&rcu_dynticks.rcu_urgent_qs, rdp->cpu);
> > ...
> > smp_store_release(ruqp, true);
> >
> > Using git grep "rcu_urgent_qs.*true", I only found
> > rcu_request_urgent_qs_task() and sync_sched_exp_handler().
> >
> > But how come KVM even triggers that case? rcu_implicit_dynticks_qs() is
> > for NOHZ and offline CPUs.
>
> I don't know that it is; I'm merely going by the empirical observation
> that with a check for rcu_urgent_qs in the vcpu_run() loop, KVM is no
> longer screwing over synchronize_sched() for 10 seconds at a time. Or
> even 1 second at a time.

It would be good to know what exactly sets that variable in your case.

> I'm all for considering a CPU in guest mode to be quiescent, and not
> waiting for it at all. But we don't do that without full NOHZ even for
> CPUs in userspace.

Doing it for guests should be easier than for userspace, since
vmenter/vmexit are (AFAIK) _much_ more expensive than sysenter/sysexit.
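A CPU in guest mode could, in principle, be handled the way RCU handles idle and nohz_full CPUs. It would be treated as being in an extended quiescent state, tracked by a per-CPU counter that the grace-period side samples remotely instead of interrupting the CPU. The following is a minimal sketch of that idea under several assumptions. It uses C11 atomics. An odd counter value means "in guest mode", which is a convention chosen here and not the kernel's rcu_dynticks encoding. The hooks guest_enter_eqs() and guest_exit_eqs() are hypothetical and would be called around vmenter/vmexit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU counter. Convention for this sketch only:
 * even = running host code (not quiescent), odd = in guest mode
 * (extended quiescent state). */
struct vcpu_eqs {
    atomic_ulong ctr;
};

void eqs_init(struct vcpu_eqs *e)
{
    atomic_init(&e->ctr, 0);
}

/* Hypothetical hook just before vmenter: counter becomes odd.
 * atomic_fetch_add() defaults to seq_cst, i.e. it is fully ordered. */
void guest_enter_eqs(struct vcpu_eqs *e)
{
    unsigned long old = atomic_fetch_add(&e->ctr, 1);
    assert((old & 1) == 0);   /* must not already be in guest mode */
    (void)old;
}

/* Hypothetical hook just after vmexit: counter becomes even again. */
void guest_exit_eqs(struct vcpu_eqs *e)
{
    unsigned long old = atomic_fetch_add(&e->ctr, 1);
    assert((old & 1) == 1);   /* must have been in guest mode */
    (void)old;
}

/* Grace-period side: sample the counter when the grace period starts. */
unsigned long eqs_snapshot(struct vcpu_eqs *e)
{
    return atomic_load(&e->ctr);
}

/* The CPU has passed through a quiescent state if it was in guest mode
 * when sampled, or if it has entered or left guest mode since then. */
bool eqs_passed_qs(struct vcpu_eqs *e, unsigned long snap)
{
    return (snap & 1) || atomic_load(&e->ctr) != snap;
}
```

The grace-period side takes a snapshot and later calls eqs_passed_qs(), and it never has to kick the vCPU. The cost is two fully ordered atomic increments per guest round trip. This is the kind of overhead that the argument above expects to be small relative to vmenter/vmexit itself.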