Re: [PATCH V2] smp: Give WARN()ing when calling smp_call_function_many()/single()in serving irq

From: Max Filippov
Date: Thu Dec 05 2013 - 20:29:34 EST


Hi Thomas,

On Fri, Jul 5, 2013 at 6:37 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Fri, 5 Jul 2013, Thomas Gleixner wrote:
>> On Sat, 16 Feb 2013, Chuansheng Liu wrote:
>> > Currently the functions smp_call_function_many()/single() will
>> > give a WARN()ing only in the case of irqs_disabled(), but that
>> > check is not enough to guarantee execution of the SMP
>> > cross-calls.
>> >
>> > In many other cases such as softirq handling/interrupt handling,
>> > the two APIs still can not be called, just as the
>> > smp_call_function_many() comments say:
>> >
>> > * You must not call this function with disabled interrupts or from a
>> > * hardware interrupt handler or from a bottom half handler. Preemption
>> > * must be disabled when calling this function.
>> >
>> > There is a real case for softirq DEADLOCK case:
>> >
>> > CPUA CPUB
>> > spin_lock(&spinlock)
>> > Any irq coming, call the irq handler
>> > irq_exit()
>> > spin_lock_irq(&spinlock)
>> > <== Blocking here due to
>> > CPUB hold it
>> > __do_softirq()
>> > run_timer_softirq()
>> > timer_cb()
>> > call smp_call_function_many()
>> > send IPI interrupt to CPUA
>> > wait_csd()
>> >
>> > Then both CPUA and CPUB will be deadlocked here.
>>
>> That's not true if called with wait = 0 as we won't wait for the csd
>> in that case. The function will be invoked on cpuA after it reenables
>> interrupt. So for callers who don't care about synchronous execution
>> it should not warn in softirq context.
>
> Hmm, even there it matters, because of the following scenario:
>
> CPU 0
> smp_call_function_single(CPU 1)
> csd_lock(CPU 1)
> irq_enter()
> irq_exit()
> __do_softirq()
> smp_call_function_many()
> setup csd (CPU 1)
> csd_lock(CPU 1) ==> CPU 0 deadlocked itself.
>
> And this is even more likely to happen than the lock issue.

I've observed similar deadlock in a real system which has network
driver that uses smp_call_function_single in the softirq context.

The proposed fix below keeps IRQs disabled on the sending CPU
during the period between marking csd locked and sending IPI,
making it possible to use smp_call_function_single from the softirq
context. What do you think?

--->8---