Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

From: Peter Zijlstra
Date: Wed May 27 2020 - 05:56:59 EST


On Tue, May 26, 2020 at 06:11:01PM +0200, Peter Zijlstra wrote:
> Just like the ttwu_queue_remote() IPI, make use of _TIF_POLLING_NRFLAG
> to avoid sending IPIs to idle CPUs.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
> kernel/sched/core.c | 10 ++++++++++
> kernel/sched/idle.c | 1 +
> kernel/sched/sched.h | 2 ++
> kernel/smp.c | 16 +++++++++++++++-
> 4 files changed, 28 insertions(+), 1 deletion(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2296,6 +2296,16 @@ static void wake_csd_func(void *info)
> sched_ttwu_pending();
> }
>
> +void send_call_function_single_ipi(int cpu)
> +{
> + struct rq *rq = cpu_rq(cpu);
> +
> + if (!set_nr_if_polling(rq->idle))
> + arch_send_call_function_single_ipi(cpu);
> + else
> + trace_sched_wake_idle_without_ipi(cpu);
> +}
> +
> /*
> * Queue a task on the target CPUs wake_list and wake the CPU via IPI if
> * necessary. The wakee CPU on receipt of the IPI will queue the task
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -289,6 +289,7 @@ static void do_idle(void)
> */
> smp_mb__after_atomic();
>
> + flush_smp_call_function_from_idle();
> sched_ttwu_pending();
> schedule_idle();
>

Paul; the above patch basically allows smp_call_function_single() to run
from the idle context (with IRQs disabled, obviously) instead of from an
actual IRQ context.

This makes RCU unhappy (as reported by mingo):

[ ] ------------[ cut here ]------------
[ ] Not in hardirq as expected
[ ] WARNING: CPU: 4 PID: 0 at kernel/rcu/tree.c:430 rcu_is_cpu_rrupt_from_idle+0xed/0x110
[ ] Modules linked in:
[ ] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.7.0-rc7-00840-ga61d572-dirty #1
[ ] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 2.0b 03/01/2012
[ ] RIP: 0010:rcu_is_cpu_rrupt_from_idle+0xed/0x110
[ ] Call Trace:
[ ] rcu_exp_handler+0x38/0x90
[ ] flush_smp_call_function_queue+0xce/0x230
[ ] flush_smp_call_function_from_idle+0x2f/0x60
[ ] do_idle+0x163/0x260
[ ] cpu_startup_entry+0x19/0x20
[ ] start_secondary+0x14f/0x1a0
[ ] irq event stamp: 189300
[ ] hardirqs last enabled at (189299): [<ffffffff811d3e25>] tick_nohz_idle_exit+0x55/0xb0
[ ] hardirqs last disabled at (189300): [<ffffffff811da5f5>] flush_smp_call_function_from_idle+0x25/0x60
[ ] softirqs last enabled at (189284): [<ffffffff811280a0>] irq_enter_rcu+0x70/0x80
[ ] softirqs last disabled at (189283): [<ffffffff81128085>] irq_enter_rcu+0x55/0x80

This is rcu_is_cpu_rrupt_from_idle()'s lockdep_assert_in_irq() tripping
up (its comment is obviously a bit antiquated).

Now, if I read that code correctly, it actually relies on
rcu_irq_enter() and thus really wants to be in an interrupt. Is there
any way this code can be made to work from the new context too?

After all, the only real difference is that we did not go through the
bother of setting up the IRQ context, nothing else changed -- it just so
happens you actually relied on that ;/
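
For anyone following along, here is a minimal userspace sketch of the
polling trick the quoted patch relies on. It is not the kernel code: the
MODEL_* bit values, struct model_cpu and the model_* function names are
made up for illustration, and C11 atomics stand in for the kernel's
cmpxchg(). The idea it tries to show: if the idle task advertises that it
is polling its flags, setting the need-resched bit is enough to get its
attention, so the IPI can be skipped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values; the real _TIF_* bits are arch-specific. */
#define MODEL_NEED_RESCHED   (1u << 0)
#define MODEL_POLLING_NRFLAG (1u << 1)

/* Hypothetical stand-in for a CPU's runqueue and its idle task. */
struct model_cpu {
	atomic_uint idle_flags;   /* models the idle task's thread flags */
	unsigned int ipis_sent;   /* simulated IPIs actually sent */
	unsigned int ipis_elided; /* wakeups done by flag write only */
};

/*
 * Set NEED_RESCHED only if the target is polling its flags.
 * Returns true when no IPI is needed, false when one must be sent.
 */
static bool model_set_nr_if_polling(atomic_uint *flags)
{
	unsigned int val = atomic_load(flags);

	for (;;) {
		if (!(val & MODEL_POLLING_NRFLAG))
			return false;	/* not polling: needs a real IPI */
		if (val & MODEL_NEED_RESCHED)
			return true;	/* already flagged: nothing to do */
		/* On failure, val is refreshed with the current flags. */
		if (atomic_compare_exchange_weak(flags, &val,
						 val | MODEL_NEED_RESCHED))
			return true;
	}
}

/* Mirrors the shape of send_call_function_single_ipi() in the patch. */
static void model_send_call_function_single_ipi(struct model_cpu *cpu)
{
	assert(cpu);

	if (!model_set_nr_if_polling(&cpu->idle_flags))
		cpu->ipis_sent++;	/* arch IPI would go here */
	else
		cpu->ipis_elided++;	/* tracepoint would go here */
}
```

In the kernel, the idle loop then notices the flag and runs
flush_smp_call_function_from_idle() with IRQs disabled, which is exactly
the non-IRQ context that trips the lockdep assertion in the splat above.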