Re: [GIT pull] timer updates for 4.9

From: Thomas Gleixner
Date: Mon Oct 24 2016 - 15:13:01 EST


On Mon, 24 Oct 2016, Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:51 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >
> > Can you please check in the disassembly whether gcc really reloads
> > timer->flags? Mine does not...
>
> No, me neither. The code generation for lock_timer_base() looks
> reasonable, although not pretty (it needs one spill for the
> complexities in get_timer_cpu_base(), and the "*flags" games result
> in some unnecessary indirection too).
>
> I will try your patch, but also stare at my code some more.
>
> I'm starting to think that the problem could be due to the timer code
> being triggered _way_ too early (printk() ends up being obviously used
> long before most things end up using timers), and that the problem I
> see is just later fallout from that.
>
> Sergey (added to participants) tried an earlier version of my patch,
> and had more debug options enabled, and got
>
> BUG: spinlock bad magic on CPU#0
>
> from mod_timer() doing _raw_spin_unlock_irqrestore(), when the

Weird, that should have triggered in raw_spin_lock() already.

Can you bounce me the patch you are currently testing?

> printk() callchain happens very early in setup_arch ->
> setup_memory_map -> e820_print_map().
>
> So I think the timer bugs I found were _potentially_ true bugs, but
> likely not the cause of this all.
>
> init_timers() happens early, but we do printk's even earlier.

These are the things which are not initialized:

1) base->spinlock

That's a non-issue for !debug kernels, as the lock initializer is 0
(unlocked).

2) base->clk

That makes the timer get queued in some random array bucket.

3) base->cpu

That's a non-issue, as base->cpu is 0, at this point you are on CPU 0,
and the stupid NOHZ remote queueing is not yet possible.

The hlist_head is not touched by init_timers(), as it's already NULL
initialized, so we do not scribble over an already queued timer.

So anything you queue _before_ init_timers() will just be queued to some
random bucket, but that does not explain the wreckage you are seeing.

Thanks,

tglx