Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in the debug code

From: Sergey Senozhatsky
Date: Thu Jan 28 2016 - 05:52:47 EST


two small corrections.

On (01/28/16 19:41), Sergey Senozhatsky wrote:
[..]
> > Unfortunately, it's not reproduced anymore.
> >
> > If it's clearly a spinlock caller's bug as you said, modifying the
> > spinlock debug code does not help it at all. But I found there's a
> > possiblity in the debug code *itself* to cause a lockup. So I tried
> > to fix it. What do you think about it?
>
> ah... silly me... you mean the first CPU that triggers the spin_dump() will
^^^ this, of course, is true for
console_sem->lock and logbuf_lock
only.

> deadlock itself, so the rest of CPUs will see endless recursive
> spin_lock()->spin_dump()->spin_lock()->spin_dump() calls?
>
> like the one below?
>
>
> CPUZ is doing vprintk_emit()->spin_lock(), CPUA is the spin_lock's owner
>
> CPUZ -> vprintk_emit()
> __spin_lock_debug()
> for (i = 0; i < `loops_per_jiffy * HZ'; i++) { << wait for the lock
> if (arch_spin_trylock())
> return;
> __delay(1);
> }
> spin_dump() << lock is still owned by CPUA
> { -> vprintk_emit()
> __spin_lock_debug()
> for (...) {
> if (arch_spin_trylock())
> return;
> __delay(1);
> }
- << CPUA unlocked the lock
> spin_dump()
> { -> vprintk_emit()
> __spin_lock_debug()
the "<< CPUA unlocked the lock" line better be here. to make it correct.

+ << CPUA unlocked the lock
> for (...) {
> if (arch_spin_trylock()) << success!!
> /* CPUZ now owns the lock */
> return;
> }
> }
>
> << we return here with the spin_lock being owned by this CPUZ
>
> trigger_all_cpu_backtrace()
>
> << and... now it does the arch_spin_lock()
> /*
> * The trylock above was causing a livelock. Give the lower level arch
> * specific lock code a chance to acquire the lock. We have already
> * printed a warning/backtrace at this point. The non-debug arch
> * specific code might actually succeed in acquiring the lock. If it is
> * not successful, the end-result is the same - there is no forward
> * progress.
> */
> arch_spin_lock(&lock->raw_lock);
>
> << which obviously dealocks this CPU...
> }
>
> trigger_all_cpu_backtrace()
>
> arch_spin_lock()
>
>
>
>
> so
> "the CPUZ is now keeping the lock forever, and not going to release it"
> and
> "CPUA-CPUX will do vprintk_emit()->spin_lock()->spin_dump()->vprintk_emit()->..."
>
>
>
> My apologies for not getting it right the first time. Sorry!
>
> Can you please update your bug description in the commit message?
> It's the deadlock that is causing the recursion on other CPUs in the
> first place.

-ss