Re: [ANNOUNCE] 3.12.6-rt9

From: Mike Galbraith
Date: Fri Jan 17 2014 - 22:15:50 EST


On Fri, 2014-01-17 at 18:00 +0100, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2013-12-24 16:47:47 [+0100]:
>
> >I built this kernel with Paul's patch and NO_HZ_FULL enabled again on 64
> >core box. I haven't seen RCU grip yet, but I just checked on it after
> >3.5 hours into this boot/beat (after fixing crash+kdump setup), and
> >found it in the process of dumping.
>
> So you also have the timers-do-not-raise-softirq-unconditionally.patch?

Oh dear, there's holidays, vacation, and massive turkey overdose between
then and now, but I'm almost positive that the tree was virgin $subject,
with only Paul's patch enabled, that being what I wanted to beat on.

> I have a small problem with understanding this...
>
> |#24 [ffff880273a03cd0] run_timer_softirq at ffffffff81069002
>
> Here we obtain wait_lock from tvec_base of _this_ CPU. And we get to
> init_lists() before the apic timer kicks in. So we have the wait_lock.

gdb fibs a little, we're acquiring.

> >--- <IRQ stack> ---
> >#21 [ffff880273a03b28] apic_timer_interrupt at ffffffff815cbf9d
> > [exception RIP: _raw_spin_lock+50]

> In the hard interrupt triggered by the apic timer we get to
> get_next_timer_interrupt() and go again for the same wait_lock. Here we
> have the try_lock so we avoid this deadlock.
> The odd part: we get the lock. It should be the same lock because both use
> | struct tvec_base *base = __this_cpu_read(tvec_bases);
> to get it. And we shouldn't get it because the lock is already held.
> We get into trouble in the unlock path where we spin forever:
>
> |#14 [ffff880276803e50] rt_spin_unlock_after_trylock_in_irq at ffffffff815c3425
> |#12 [ffff880276803e28] _raw_spin_trylock at ffffffff815c3790
>
> which releases the lock with a trylock in order to keep lockdep happy.
> My understanding was that we should be able to obtain the wait_lock here
> since we were able to obtain it in the lock path and in irq off context
> there is nothing that could take the lock in the meantime.

IIRC, we were endlessly trying, but with an un-punched ticket under us,
and no Xen like evilness to save the day.
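
To illustrate the "un-punched ticket" remark, here is a minimal
single-threaded userspace sketch of a ticket lock. It assumes the remark
refers to ticket spinlock semantics. The struct and every function name
below are hypothetical, and this is not the kernel's arch_spinlock_t code
(no atomics, no barriers). It only shows why a trylock loop cannot make
progress while an outstanding ticket is never served.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a ticket lock; NOT the kernel implementation. */
struct ticket_lock {
    uint16_t head;  /* ticket currently being served */
    uint16_t tail;  /* next ticket to hand out */
};

/* Take a ticket. A real locker would then spin until head reaches it. */
static uint16_t ticket_take(struct ticket_lock *l)
{
    return l->tail++;
}

/* True once the given ticket is the one being served. */
static bool ticket_is_served(const struct ticket_lock *l, uint16_t t)
{
    return l->head == t;
}

/* trylock only succeeds when no ticket is outstanding (head == tail). */
static bool ticket_trylock(struct ticket_lock *l)
{
    if (l->head != l->tail)
        return false;
    l->tail++;
    return true;
}

/* Unlock "punches" the current ticket by advancing head. */
static void ticket_unlock(struct ticket_lock *l)
{
    assert(l->head != l->tail);  /* must be held */
    l->head++;
}

/* Bounded stand-in for an endless trylock loop; true if acquired. */
static bool ticket_trylock_spin(struct ticket_lock *l, unsigned int max_tries)
{
    while (max_tries--)
        if (ticket_trylock(l))
            return true;
    return false;
}
```

In this model, once a ticket has been handed out and its owner cannot run
to call ticket_unlock() (for example because the trylock loop runs in an
interrupt on top of it), ticket_trylock_spin() fails for any retry count.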

I've since cleaned out my crashdump directory and moved on to frolicking
with hotplug gremlins, so I don't have that one to revisit, but the
timers-do-not-raise-softirq-unconditionally patch is the bad guy.

-Mike
